linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/8] shared subtree
From: Ram @ 2005-07-08 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

I am enclosing 8 patches that implement the shared subtree functionality
detailed in Al Viro's RFC at http://lwn.net/Articles/119232/.

The incremental patches provide the following functionality:

1) shared_private_slave.patch: provides the ability to mark a subtree as
shared, private, or slave.

2) unclone.patch: provides the ability to mark a subtree as unclonable.
NOTE: this feature is an addition to Al Viro's RFC; it solves the vfsmount
explosion problem detailed here:
	http://www.ussg.iu.edu/hypermail/linux/kernel/0502.0/0468.html

3) rbind.patch: adds the ability to propagate binds/rbinds across
vfsmounts (see the sketch after this list).

4) move.patch: provides the ability to move a shared/private/slave/unclonable
subtree to some other mountpoint. It also provides the same feature to
pivot_root().

5) umount.patch: provides the ability to propagate unmounts.

6) namespace.patch: provides the ability to clone a namespace, with
propagation set up on the vfsmounts in the new namespace.

7) automount.patch: provides automatic propagation of mounts/unmounts
done through the automounter.

8) pnode_opt.patch: optimizes redundant code in pnode.c.
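
To make the intended userspace view concrete, here is a minimal, untested
sketch (not part of the patches; assumes root). The paths /mnt, /tmp/mirror
and /mnt/data, and the use of "ext2", are hypothetical placeholders;
MS_SHARED is the value patch 1 adds to include/linux/fs.h, and
MS_BIND/MS_REC are the existing mount(2) flags.

#include <stdio.h>
#include <sys/mount.h>

#ifndef MS_BIND
#define MS_BIND		4096
#endif
#ifndef MS_REC
#define MS_REC		16384
#endif
#ifndef MS_SHARED
#define MS_SHARED	1048576		/* value added by patch 1 */
#endif

int main(void)
{
	/* 1. mark the subtree mounted at /mnt as shared (patch 1) */
	if (mount("", "/mnt", "ext2", MS_SHARED, "") == -1)
		perror("mark shared");

	/* 2. rbind the shared subtree to a second location (patch 3);
	 * the new mount becomes a peer of /mnt in the same pnode */
	if (mount("/mnt", "/tmp/mirror", NULL, MS_BIND | MS_REC, NULL) == -1)
		perror("rbind");

	/* 3. a bind done under one peer now propagates to the other:
	 * /etc also becomes visible at /tmp/mirror/data, and unmounting
	 * it from either place propagates as well (patch 5) */
	if (mount("/etc", "/mnt/data", NULL, MS_BIND, NULL) == -1)
		perror("bind under shared tree");

	return 0;
}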


I have unit-tested the code and it mostly works; however, I am still
debugging some race conditions.

I have also enclosed the source of a small mount command (initially written
by Miklos) that helps create shared/private/unclone/slave trees; it follows
below.


Thanks to Mike Waychison, Miklos Szeredi, and J. Bruce Fields for helping
out with the various scenarios and for their initial comments on the code,
and to Al Viro for silently listening to our conversation.

Looking forward to lots of feedback and comments,
RP

----------------------------------------------------------------------------------------

//
// this code was developed by Miklos Szeredi <miklos@szeredi.hu>
// and modified by Ram Pai <linuxram@us.ibm.com>
// sample usage:
//		newmount /tmp shared
//
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mount.h>
#include <sys/fsuid.h>

int main(int argc, char *argv[])
{
	int type;

	if (argc != 3) {
		fprintf(stderr, "usage: %s dir <shared|slave|private|unclone>\n",
			argv[0]);
		return 1;
	}

	fprintf(stdout, "%s %s %s\n", argv[0], argv[1], argv[2]);

	if (strcmp(argv[2], "shared") == 0)
		type = 1048576;		/* MS_SHARED */
	else if (strcmp(argv[2], "slave") == 0)
		type = 524288;		/* MS_SLAVE */
	else if (strcmp(argv[2], "private") == 0)
		type = 262144;		/* MS_PRIVATE */
	else if (strcmp(argv[2], "unclone") == 0)
		type = 131072;		/* MS_UNCLONE */
	else {
		fprintf(stderr, "invalid operation: %s\n", argv[2]);
		return 1;
	}

	setfsuid(getuid());
	if (mount("", argv[1], "ext2", type, "") == -1) {
		perror("mount");
		return 1;
	}
	return 0;
}
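
/*
 * Build and try it with something like:
 *	gcc -o newmount newmount.c
 *	./newmount /mnt shared
 * where /mnt is any directory (a mountpoint is created there if needed).
 */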

----------------------------------------------------------------------------------------




* [RFC PATCH 1/8] share/private/slave a subtree
From: Ram @ 2005-07-08 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 69 bytes --]


This patch adds the shared/private/slave support for VFS trees.
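
The type-change rules implemented by do_change_type() in the attached patch
can be exercised from userspace roughly as in the untested sketch below;
/mnt is a hypothetical mounted directory and the MS_* values are the ones
this patch adds to include/linux/fs.h.

#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>

#ifndef MS_PRIVATE
#define MS_PRIVATE	262144
#endif
#ifndef MS_SLAVE
#define MS_SLAVE	524288
#endif
#ifndef MS_SHARED
#define MS_SHARED	1048576
#endif

int main(void)
{
	/* a plain mount is private by default, and only shared mounts
	 * can be made slave, so this request is refused */
	if (mount("", "/mnt", "ext2", MS_SLAVE, "") == -1 && errno == EINVAL)
		fprintf(stderr, "slave refused on a non-shared mount\n");

	/* shared, then slave, then back to private all succeed */
	mount("", "/mnt", "ext2", MS_SHARED, "");
	mount("", "/mnt", "ext2", MS_SLAVE, "");
	mount("", "/mnt", "ext2", MS_PRIVATE, "");
	return 0;
}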

RP

[-- Attachment #2: shared_private_slave.patch --]
[-- Type: text/x-patch, Size: 24162 bytes --]

This patch adds the shared/private/slave support for VFS trees.

Signed by Ram Pai (linuxram@us.ibm.com)

 fs/Makefile            |    2 
 fs/dcache.c            |    2 
 fs/namei.c             |    4 
 fs/namespace.c         |  170 ++++++++++++++++++++++++
 fs/pnode.c             |  335 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |    2 
 include/linux/fs.h     |    5 
 include/linux/mount.h  |   48 ++++++-
 include/linux/pnode.h  |   82 +++++++++++
 9 files changed, 641 insertions(+), 9 deletions(-)

Index: 2.6.12/fs/namespace.c
===================================================================
--- 2.6.12.orig/fs/namespace.c
+++ 2.6.12/fs/namespace.c
@@ -22,6 +22,7 @@
 #include <linux/namei.h>
 #include <linux/security.h>
 #include <linux/mount.h>
+#include <linux/pnode.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 
@@ -62,6 +63,7 @@ struct vfsmount *alloc_vfsmnt(const char
 		INIT_LIST_HEAD(&mnt->mnt_mounts);
 		INIT_LIST_HEAD(&mnt->mnt_list);
 		INIT_LIST_HEAD(&mnt->mnt_fslink);
+		INIT_LIST_HEAD(&mnt->mnt_pnode_mntlist);
 		if (name) {
 			int size = strlen(name)+1;
 			char *newname = kmalloc(size, GFP_KERNEL);
@@ -84,7 +86,7 @@ void free_vfsmnt(struct vfsmount *mnt)
  * Now, lookup_mnt increments the ref count before returning
  * the vfsmount struct.
  */
-struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
+struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry, struct dentry *root)
 {
 	struct list_head * head = mount_hashtable + hash(mnt, dentry);
 	struct list_head * tmp = head;
@@ -97,7 +99,8 @@ struct vfsmount *lookup_mnt(struct vfsmo
 		if (tmp == head)
 			break;
 		p = list_entry(tmp, struct vfsmount, mnt_hash);
-		if (p->mnt_parent == mnt && p->mnt_mountpoint == dentry) {
+		if (p->mnt_parent == mnt && p->mnt_mountpoint == dentry &&
+				(root == NULL || p->mnt_root == root )) {
 			found = mntget(p);
 			break;
 		}
@@ -119,6 +122,7 @@ static void detach_mnt(struct vfsmount *
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	list_del_init(&mnt->mnt_child);
 	list_del_init(&mnt->mnt_hash);
+	list_del_init(&mnt->mnt_pnode_mntlist);
 	old_nd->dentry->d_mounted--;
 }
 
@@ -161,6 +165,7 @@ clone_mnt(struct vfsmount *old, struct d
 		mnt->mnt_mountpoint = mnt->mnt_root;
 		mnt->mnt_parent = mnt;
 		mnt->mnt_namespace = old->mnt_namespace;
+		mnt->mnt_pnode = get_pnode(old->mnt_pnode);
 
 		/* stick the duplicate mount on the same expiry list
 		 * as the original if that was on one */
@@ -176,6 +181,7 @@ void __mntput(struct vfsmount *mnt)
 {
 	struct super_block *sb = mnt->mnt_sb;
 	dput(mnt->mnt_root);
+	put_pnode(mnt->mnt_pnode);
 	free_vfsmnt(mnt);
 	deactivate_super(sb);
 }
@@ -615,6 +621,148 @@ out_unlock:
 	return err;
 }
 
+struct vfsmount *do_make_mounted(struct vfsmount *mnt, struct dentry *dentry)
+{
+	struct vfsmount *child_mnt, *next;
+	struct nameidata nd;
+	struct vfsmount *newmnt = clone_mnt(mnt, dentry);
+	LIST_HEAD(move);
+
+	if (newmnt) {
+		/* put in code for mount expiry */
+		/* TOBEDONE */
+
+		/*
+		 * walk through the mount list of mnt and move
+		 * them under the new mount
+		 */
+		spin_lock(&vfsmount_lock);
+		list_for_each_entry_safe(child_mnt, next,
+				&mnt->mnt_mounts, mnt_child) {
+
+			if(child_mnt->mnt_mountpoint == dentry)
+				continue;
+
+			if(!is_subdir(child_mnt->mnt_mountpoint, dentry))
+				continue;
+
+			detach_mnt(child_mnt, &nd);
+			nd.mnt = newmnt;
+			attach_mnt(child_mnt, &nd);
+		}
+
+		nd.mnt = mnt;
+		nd.dentry = dentry;
+		attach_mnt(newmnt, &nd);
+ 		spin_unlock(&vfsmount_lock);
+ 	}
+	return newmnt;
+}
+
+int
+_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)
+{
+	struct vfsmount *parent_mnt;
+	struct dentry *parent_dentry;
+	int err = mount_is_safe(nd);
+	if (err)
+		return err;
+	parent_dentry = nd->dentry;
+	parent_mnt = nd->mnt;
+ 	/*
+	 * check if dentry already has a vfsmount
+	 * if it does not, create a vfsmount there.
+	 * which means you need to propogate it
+	 * across all vfsmounts.
+ 	 */
+	if(parent_dentry == parent_mnt->mnt_root) {
+		*mnt = parent_mnt;
+	} else {
+		*mnt = IS_MNT_SHARED(parent_mnt) ?
+			 pnode_make_mounted(parent_mnt->mnt_pnode,
+					 parent_mnt, parent_dentry) :
+			 do_make_mounted(parent_mnt, parent_dentry);
+		if (!*mnt)
+			err = -ENOMEM;
+ 	}
+	return err;
+}
+
+/*
+ * recursively change the type of the mountpoint.
+ */
+static int do_change_type(struct nameidata *nd, int flag)
+{
+	struct vfsmount *m, *mnt;
+	struct vfspnode *old_pnode = NULL;
+	int err;
+
+	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
+			&& !(flag & MS_SLAVE))
+		return -EINVAL;
+
+	if ((err = _do_make_mounted(nd, &mnt)))
+		return err;
+
+	spin_lock(&vfsmount_lock);
+	for (m = mnt; m; m = next_mnt(m, mnt)) {
+		switch (flag) {
+		case MS_SHARED:
+			/*
+			 * if the mount is already a slave mount,
+			 * allocated a new pnode and make it
+			 * a slave pnode of the original pnode.
+			 */
+			if (IS_MNT_SLAVE(m)) {
+				old_pnode = m->mnt_pnode;
+				pnode_del_slave_mnt(m);
+			}
+			if(!IS_MNT_SHARED(m)) {
+				m->mnt_pnode = pnode_alloc();
+				if(!m->mnt_pnode) {
+					pnode_add_slave_mnt(old_pnode, m);
+					err = -ENOMEM;
+					break;
+				}
+				pnode_add_member_mnt(m->mnt_pnode, m);
+			}
+			if(old_pnode) {
+				pnode_add_slave_pnode(old_pnode,
+						m->mnt_pnode);
+			}
+			SET_MNT_SHARED(m);
+			break;
+
+		case MS_SLAVE:
+			if (IS_MNT_SLAVE(m)) {
+				break;
+			}
+			/*
+			 * only shared mounts can
+			 * be made slave
+			 */
+			if (!IS_MNT_SHARED(m)) {
+				err = -EINVAL;
+				break;
+			}
+			old_pnode = m->mnt_pnode;
+			pnode_del_member_mnt(m);
+			pnode_add_slave_mnt(old_pnode, m);
+			SET_MNT_SLAVE(m);
+			break;
+
+		case MS_PRIVATE:
+			if(m->mnt_pnode)
+				pnode_disassociate_mnt(m);
+			SET_MNT_PRIVATE(m);
+			break;
+
+		}
+	}
+	spin_unlock(&vfsmount_lock);
+	return err;
+}
+
 /*
  * do loopback mount.
  */
@@ -1049,6 +1197,8 @@ long do_mount(char * dev_name, char * di
 				    data_page);
 	else if (flags & MS_BIND)
 		retval = do_loopback(&nd, dev_name, flags & MS_REC);
+	else if (flags & MS_SHARED || flags & MS_PRIVATE || flags & MS_SLAVE)
+		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
 		retval = do_move_mount(&nd, dev_name);
 	else
Index: 2.6.12/fs/pnode.c
===================================================================
--- /dev/null
+++ 2.6.12/fs/pnode.c
@@ -0,0 +1,362 @@
+/*
+ *  linux/fs/pnode.c
+ *
+ * (C) Copyright IBM Corporation 2005.
+ *	Released under GPL v2.
+ *	Author : Ram Pai (linuxram@us.ibm.com)
+ *
+ */
+
+#include <linux/config.h>
+#include <linux/syscalls.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/smp_lock.h>
+#include <linux/init.h>
+#include <linux/quotaops.h>
+#include <linux/acct.h>
+#include <linux/module.h>
+#include <linux/seq_file.h>
+#include <linux/namespace.h>
+#include <linux/namei.h>
+#include <linux/security.h>
+#include <linux/mount.h>
+#include <linux/pnode.h>
+#include <asm/uaccess.h>
+#include <asm/unistd.h>
+#include <stdarg.h>
+
+
+#define PNODE_MEMBER_VFS  0x01
+#define PNODE_SLAVE_VFS   0x02
+
+static kmem_cache_t * pnode_cachep;
+
+/* spinlock for pnode related operations */
+ __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfspnode_lock);
+
+
+static void
+pnode_init_fn(void *data, kmem_cache_t *cachep, unsigned long flags)
+{
+	struct vfspnode *pnode = (struct vfspnode *)data;
+	INIT_LIST_HEAD(&pnode->pnode_vfs);
+	INIT_LIST_HEAD(&pnode->pnode_slavevfs);
+	INIT_LIST_HEAD(&pnode->pnode_slavepnode);
+	INIT_LIST_HEAD(&pnode->pnode_peer_slave);
+	pnode->pnode_master = NULL;
+	pnode->pnode_flags = 0;
+	atomic_set(&pnode->pnode_count,0);
+}
+
+void __init
+pnode_init(unsigned long mempages)
+{
+	pnode_cachep = kmem_cache_create("pnode_cache",
+                       sizeof(struct vfspnode), 0,
+                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, pnode_init_fn, NULL);
+}
+
+
+struct vfspnode *
+pnode_alloc(void)
+{
+	struct vfspnode *pnode =  (struct vfspnode *)kmem_cache_alloc(
+			pnode_cachep, GFP_KERNEL);
+	return pnode;
+}
+
+void
+pnode_free(struct vfspnode *pnode)
+{
+	kmem_cache_free(pnode_cachep, pnode);
+}
+
+/*
+ * __put_pnode() should be called with vfspnode_lock held
+ */
+void
+__put_pnode(struct vfspnode *pnode)
+{
+	struct vfspnode *tmp_pnode;
+	do {
+		tmp_pnode = pnode->pnode_master;
+		list_del_init(&pnode->pnode_peer_slave);
+		BUG_ON(!list_empty(&pnode->pnode_vfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavevfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavepnode));
+		pnode_free(pnode);
+		pnode = tmp_pnode;
+		if (!pnode || !atomic_dec_and_test(&pnode->pnode_count))
+			break;
+	} while(pnode);
+}
+
+struct inoutdata {
+	void *my_data; /* produced and consumed by me */
+	void *in_data; /* produced by master, consumed by slave */
+	void *out_data; /* produced by slave, comsume by master */
+};
+
+struct pcontext {
+	struct vfspnode *start;
+	int 	flag;
+	int 	traversal;
+	int	level;
+	struct vfspnode *master_pnode;
+	struct vfspnode *pnode;
+	struct vfspnode *slave_pnode;
+};
+
+
+#define PNODE_UP 1
+#define PNODE_DOWN 2
+#define PNODE_MID 3
+
+/*
+ * Walk the pnode tree for each pnode encountered.  A given pnode in the tree
+ * can be returned a minimum of 2 times.  First time the pnode is encountered,
+ * it is returned with the flag PNODE_DOWN. Everytime the pnode is encountered
+ * after having traversed through each of its children, it is returned with the
+ * flag PNODE_MID.  And finally when the pnode is encountered after having
+ * walked all of its children, it is returned with the flag PNODE_UP.
+ *
+ * @context: provides context on the state of the last walk in the pnode
+ * 		tree.
+ */
+static int inline
+pnode_next(struct pcontext *context)
+{
+	int traversal = context->traversal;
+	int ret=0;
+	struct vfspnode *pnode = context->pnode,
+			*slave_pnode=context->slave_pnode,
+			*master_pnode=context->master_pnode;
+	struct list_head *next;
+
+	spin_lock(&vfspnode_lock);
+	/*
+	 * the first traversal will be on the root pnode
+	 * with flag PNODE_DOWN
+	 */
+	if (!pnode) {
+		context->pnode = get_pnode(context->start);
+		context->master_pnode = NULL;
+		context->traversal = PNODE_DOWN;
+		context->slave_pnode = NULL;
+		context->level = 0;
+		ret = 1;
+		goto out;
+	}
+
+	/*
+	 * if the last traversal was PNODE_UP, than the current
+	 * traversal is PNODE_MID on the master pnode.
+	 */
+	if (traversal == PNODE_UP) {
+		if (!master_pnode) {
+			/* DONE. return */
+			put_pnode(pnode);
+			ret = 0;
+		} else {
+			context->traversal = PNODE_MID;
+			context->level--;
+			context->pnode = master_pnode;
+			put_pnode(slave_pnode);
+			context->slave_pnode = pnode;
+			context->master_pnode = (master_pnode ?
+				master_pnode->pnode_master : NULL);
+			ret = 1;
+		}
+	} else {
+		if(traversal == PNODE_MID) {
+			next = slave_pnode->pnode_peer_slave.next;
+		} else {
+			next = pnode->pnode_slavepnode.next;
+		}
+		put_pnode(slave_pnode);
+		context->slave_pnode = NULL;
+		/*
+		 * if the last traversal was PNODE_MID or PNODE_DOWN, and the
+		 * master pnode has some slaves to traverse, the current
+		 * traversal will be PNODE_DOWN on the slave pnode.
+		 */
+		if ((next != &pnode->pnode_slavepnode) &&
+			(traversal == PNODE_DOWN || traversal == PNODE_MID)) {
+			context->traversal = PNODE_DOWN;
+			context->level++;
+			context->pnode = get_pnode(list_entry(next,
+				struct vfspnode, pnode_peer_slave));
+			context->master_pnode = pnode;
+			ret = 1;
+		} else {
+			/*
+			 * since there are no more children, the current traversal
+			 * is PNODE_UP on the same pnode
+			 */
+			context->traversal = PNODE_UP;
+			ret = 1;
+		}
+	}
+out:
+	spin_unlock(&vfspnode_lock);
+	return ret;
+}
+
+
+static void inline
+pnode_add_mnt(struct vfspnode *pnode, struct vfsmount *mnt, int slave)
+{
+	if (!pnode || !mnt)
+		return;
+	spin_lock(&vfspnode_lock);
+	mnt->mnt_pnode = pnode;
+	if (slave) {
+		SET_MNT_SLAVE(mnt);
+		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_slavevfs);
+	} else {
+		SET_MNT_SHARED(mnt);
+		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_vfs);
+	}
+	get_pnode(pnode);
+	spin_unlock(&vfspnode_lock);
+}
+
+void
+pnode_add_member_mnt(struct vfspnode *pnode, struct vfsmount *mnt)
+{
+	pnode_add_mnt(pnode, mnt, 0);
+}
+
+void
+pnode_add_slave_mnt(struct vfspnode *pnode, struct vfsmount *mnt)
+{
+	pnode_add_mnt(pnode, mnt, 1);
+}
+
+
+void
+pnode_add_slave_pnode(struct vfspnode *pnode, struct vfspnode *slave_pnode)
+{
+	if (!pnode || !slave_pnode)
+		return;
+	spin_lock(&vfspnode_lock);
+	slave_pnode->pnode_master = pnode;
+	slave_pnode->pnode_flags = 0;
+	list_add(&slave_pnode->pnode_peer_slave, &pnode->pnode_slavepnode);
+	get_pnode(pnode);
+	spin_unlock(&vfspnode_lock);
+}
+
+static void
+_pnode_disassociate_mnt(struct vfsmount *mnt)
+{
+	spin_lock(&vfspnode_lock);
+	list_del_init(&mnt->mnt_pnode_mntlist);
+	spin_unlock(&vfspnode_lock);
+	put_pnode(mnt->mnt_pnode);
+	mnt->mnt_pnode = NULL;
+}
+
+void
+pnode_del_slave_mnt(struct vfsmount *mnt)
+{
+	if (!mnt)
+		return;
+ 	_pnode_disassociate_mnt(mnt);
+	mnt->mnt_flags &= ~MNT_SLAVE;
+}
+
+void
+pnode_del_member_mnt(struct vfsmount *mnt)
+{
+	if (!mnt)
+		return;
+ 	_pnode_disassociate_mnt(mnt);
+	mnt->mnt_flags &= ~MNT_SHARED;
+}
+
+
+void
+pnode_disassociate_mnt(struct vfsmount *mnt)
+{
+	if (!mnt)
+		return;
+ 	_pnode_disassociate_mnt(mnt);
+	mnt->mnt_flags &= ~MNT_SLAVE;
+	mnt->mnt_flags &= ~MNT_SHARED;
+}
+
+struct vfsmount *
+pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
+{
+	struct vfsmount *child_mnt;
+	int ret=0,traversal,level;
+     	struct vfspnode *slave_pnode, *master_pnode, *child_pnode, *slave_child_pnode;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+	static struct inoutdata p_array[PNODE_MAX_SLAVE_LEVEL];
+
+	context.start = pnode;
+	context.pnode = NULL;
+
+	while(pnode_next(&context)) {
+		traversal = context.traversal;
+		level = context.level;
+		pnode = context.pnode;
+		slave_pnode = context.slave_pnode;
+		master_pnode = context.master_pnode;
+
+		if (traversal == PNODE_DOWN ) {
+			if (master_pnode) {
+				child_pnode = (struct vfspnode *)p_array[level-1].in_data;
+			} else {
+				child_pnode = NULL;
+			}
+			while (!(child_pnode = pnode_alloc()))
+				schedule();
+			p_array[level].my_data = (void *)child_pnode;
+			p_array[level].in_data = NULL;
+
+		} else if(traversal == PNODE_MID) {
+
+			child_pnode = (struct vfspnode *)p_array[level].my_data;
+			slave_child_pnode = p_array[level+1].out_data;
+			pnode_add_slave_pnode(child_pnode, slave_child_pnode);
+
+		} else if(traversal == PNODE_UP) {
+			child_pnode = p_array[level].my_data;
+			spin_lock(&vfspnode_lock);
+			list_for_each_entry_safe(member_mnt,
+				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if (!(child_mnt = do_make_mounted(mnt, dentry))) {
+					ret = -ENOMEM;
+					goto out;
+				}
+				spin_lock(&vfspnode_lock);
+				pnode_add_member_mnt(child_pnode, child_mnt);
+			}
+			list_for_each_entry_safe(slave_mnt,
+				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if (!(child_mnt = do_make_mounted(mnt, dentry))) {
+					ret = -ENOMEM;
+					goto out;
+				}
+				spin_lock(&vfspnode_lock);
+				pnode_add_slave_mnt(child_pnode, child_mnt);
+			}
+			spin_unlock(&vfspnode_lock);
+			p_array[level].out_data = child_pnode;
+		}
+	}
+
+out:
+	if (ret)
+		return NULL;
+
+	child_mnt = lookup_mnt(mnt, dentry, dentry);
+	mntput(child_mnt);
+	return child_mnt;
+}
Index: 2.6.12/fs/dcache.c
===================================================================
--- 2.6.12.orig/fs/dcache.c
+++ 2.6.12/fs/dcache.c
@@ -27,6 +27,7 @@
 #include <linux/module.h>
 #include <linux/mount.h>
 #include <linux/file.h>
+#include <linux/pnode.h>
 #include <asm/uaccess.h>
 #include <linux/security.h>
 #include <linux/seqlock.h>
@@ -1737,6 +1738,7 @@ void __init vfs_caches_init(unsigned lon
 	inode_init(mempages);
 	files_init(mempages);
 	mnt_init(mempages);
+	pnode_init(mempages);
 	bdev_cache_init();
 	chrdev_init();
 }
Index: 2.6.12/fs/namei.c
===================================================================
--- 2.6.12.orig/fs/namei.c
+++ 2.6.12/fs/namei.c
@@ -583,7 +583,7 @@ static int __follow_mount(struct path *p
 {
 	int res = 0;
 	while (d_mountpoint(path->dentry)) {
-		struct vfsmount *mounted = lookup_mnt(path->mnt, path->dentry);
+		struct vfsmount *mounted = lookup_mnt(path->mnt, path->dentry, NULL);
 		if (!mounted)
 			break;
 		dput(path->dentry);
@@ -599,7 +599,7 @@ static int __follow_mount(struct path *p
 static void follow_mount(struct vfsmount **mnt, struct dentry **dentry)
 {
 	while (d_mountpoint(*dentry)) {
-		struct vfsmount *mounted = lookup_mnt(*mnt, *dentry);
+		struct vfsmount *mounted = lookup_mnt(*mnt, *dentry, NULL);
 		if (!mounted)
 			break;
 		dput(*dentry);
@@ -616,7 +616,7 @@ int follow_down(struct vfsmount **mnt, s
 {
 	struct vfsmount *mounted;
 
-	mounted = lookup_mnt(*mnt, *dentry);
+	mounted = lookup_mnt(*mnt, *dentry, NULL);
 	if (mounted) {
 		dput(*dentry);
 		mntput(*mnt);
Index: 2.6.12/include/linux/fs.h
===================================================================
--- 2.6.12.orig/include/linux/fs.h
+++ 2.6.12/include/linux/fs.h
@@ -102,6 +102,9 @@ extern int dir_notify_enable;
 #define MS_MOVE		8192
 #define MS_REC		16384
 #define MS_VERBOSE	32768
+#define MS_PRIVATE	262144
+#define MS_SLAVE	524288
+#define MS_SHARED	1048576
 #define MS_POSIXACL	(1<<16)	/* VFS does not apply the umask */
 #define MS_ACTIVE	(1<<30)
 #define MS_NOUSER	(1<<31)
@@ -232,6 +235,7 @@ extern void update_atime (struct inode *
 extern void __init inode_init(unsigned long);
 extern void __init inode_init_early(void);
 extern void __init mnt_init(unsigned long);
+extern void __init pnode_init(unsigned long);
 extern void __init files_init(unsigned long);
 
 struct buffer_head;
@@ -1211,6 +1215,7 @@ extern struct vfsmount *kern_mount(struc
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
+extern struct vfsmount *do_make_mounted(struct vfsmount *, struct dentry *);
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
Index: 2.6.12/include/linux/dcache.h
===================================================================
--- 2.6.12.orig/include/linux/dcache.h
+++ 2.6.12/include/linux/dcache.h
@@ -328,7 +328,7 @@ static inline int d_mountpoint(struct de
 	return dentry->d_mounted;
 }
 
-extern struct vfsmount *lookup_mnt(struct vfsmount *, struct dentry *);
+extern struct vfsmount *lookup_mnt(struct vfsmount *, struct dentry *, struct dentry *);
 extern struct dentry *lookup_create(struct nameidata *nd, int is_dir);
 
 extern int sysctl_vfs_cache_pressure;
Index: 2.6.12/include/linux/pnode.h
===================================================================
--- /dev/null
+++ 2.6.12/include/linux/pnode.h
@@ -0,0 +1,78 @@
+/*
+ *  linux/fs/pnode.c
+ *
+ * (C) Copyright IBM Corporation 2005.
+ *	Released under GPL v2.
+ *
+ */
+#ifndef _LINUX_PNODE_H
+#define _LINUX_PNODE_H
+#ifdef __KERNEL__
+
+#include <linux/list.h>
+#include <linux/mount.h>
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+
+struct vfspnode {
+	struct list_head pnode_vfs; 	 /* list of vfsmounts anchored here */
+	struct list_head pnode_slavevfs; /* list of slave vfsmounts */
+	struct list_head pnode_slavepnode;/* list of slave pnode */
+	struct list_head pnode_peer_slave;/* going through master's slave pnode
+					    list*/
+	struct vfspnode	 *pnode_master;	  /* master pnode */
+	int 		 pnode_flags;
+	atomic_t 	 pnode_count;
+};
+#define PNODE_MAX_SLAVE_LEVEL 10
+#define PNODE_DELETE  0x01
+#define PNODE_SLAVE   0x02
+
+#define IS_PNODE_DELETE(pn)  ((pn->pnode_flags&PNODE_DELETE)==PNODE_DELETE)
+#define IS_PNODE_SLAVE(pn)  ((pn->pnode_flags&PNODE_SLAVE)==PNODE_SLAVE)
+#define SET_PNODE_DELETE(pn)  pn->pnode_flags |= PNODE_DELETE
+#define SET_PNODE_SLAVE(pn)  pn->pnode_flags |= PNODE_SLAVE
+
+extern spinlock_t vfspnode_lock;
+extern void __put_pnode(struct vfspnode *);
+
+static inline struct vfspnode *
+get_pnode(struct vfspnode *pnode)
+{
+	if (!pnode)
+		return NULL;
+	atomic_inc(&pnode->pnode_count);
+	return pnode;
+}
+
+static inline struct vfspnode *
+get_pnode_n(struct vfspnode *pnode, size_t n)
+{
+	if (!pnode)
+		return NULL;
+	atomic_add(n, &pnode->pnode_count);
+	return pnode;
+}
+
+static inline void
+put_pnode(struct vfspnode *pnode)
+{
+	if (!pnode)
+		return;
+	if (atomic_dec_and_lock(&pnode->pnode_count, &vfspnode_lock)) {
+		__put_pnode(pnode);
+		spin_unlock(&vfspnode_lock);
+	}
+}
+
+void __init pnode_init(unsigned long );
+struct vfspnode * pnode_alloc(void);
+void pnode_add_slave_mnt(struct vfspnode *, struct vfsmount *);
+void pnode_add_member_mnt(struct vfspnode *, struct vfsmount *);
+void pnode_del_slave_mnt(struct vfsmount *);
+void pnode_del_member_mnt(struct vfsmount *);
+void pnode_disassociate_mnt(struct vfsmount *);
+void pnode_add_slave_pnode(struct vfspnode *, struct vfspnode *);
+struct vfsmount * pnode_make_mounted(struct vfspnode *, struct vfsmount *, struct dentry *);
+#endif /* KERNEL */
+#endif /* _LINUX_PNODE_H */
Index: 2.6.12/include/linux/mount.h
===================================================================
--- 2.6.12.orig/include/linux/mount.h
+++ 2.6.12/include/linux/mount.h
@@ -16,9 +16,33 @@
 #include <linux/spinlock.h>
 #include <asm/atomic.h>
 
-#define MNT_NOSUID	1
-#define MNT_NODEV	2
-#define MNT_NOEXEC	4
+#define MNT_NOSUID	0x01
+#define MNT_NODEV	0x02
+#define MNT_NOEXEC	0x04
+#define MNT_PRIVATE	0x10  /* if the vfsmount is private, by default it is private*/
+#define MNT_SLAVE	0x20  /* if the vfsmount is a slave mount of its pnode */
+#define MNT_SHARED	0x40  /* if the vfsmount is a slave mount of its pnode */
+#define MNT_PNODE_MASK	0xf0  /* propogation flag mask */
+
+#define IS_MNT_SHARED(mnt) (mnt->mnt_flags&MNT_SHARED)
+#define IS_MNT_SLAVE(mnt) (mnt->mnt_flags&MNT_SLAVE)
+#define IS_MNT_PRIVATE(mnt) (mnt->mnt_flags&MNT_PRIVATE)
+
+#define CLEAR_MNT_SHARED(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SHARED))
+#define CLEAR_MNT_PRIVATE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_PRIVATE))
+#define CLEAR_MNT_SLAVE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SLAVE))
+
+//TOBEDONE WRITE BETTER MACROS. ..
+#define SET_MNT_SHARED(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_SHARED),\
+				CLEAR_MNT_PRIVATE(mnt), \
+				CLEAR_MNT_SLAVE(mnt))
+#define SET_MNT_PRIVATE(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_PRIVATE), \
+				CLEAR_MNT_SLAVE(mnt), \
+				CLEAR_MNT_SHARED(mnt), \
+				mnt->mnt_pnode = NULL)
+#define SET_MNT_SLAVE(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_SLAVE), \
+				CLEAR_MNT_PRIVATE(mnt), \
+				CLEAR_MNT_SHARED(mnt))
 
 struct vfsmount
 {
@@ -29,6 +53,10 @@ struct vfsmount
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	struct list_head mnt_mounts;	/* list of children, anchored here */
 	struct list_head mnt_child;	/* and going through their mnt_child */
+	struct list_head mnt_pnode_mntlist;/* and going through their
+					   pnode's vfsmount */
+	struct vfspnode *mnt_pnode;	/* and going through their
+					   pnode's vfsmount */
 	atomic_t mnt_count;
 	int mnt_flags;
 	int mnt_expiry_mark;		/* true if marked for expiry */
@@ -38,6 +66,7 @@ struct vfsmount
 	struct namespace *mnt_namespace; /* containing namespace */
 };
 
+
 static inline struct vfsmount *mntget(struct vfsmount *mnt)
 {
 	if (mnt)
Index: 2.6.12/fs/Makefile
===================================================================
--- 2.6.12.orig/fs/Makefile
+++ 2.6.12/fs/Makefile
@@ -8,7 +8,7 @@
 obj-y :=	open.o read_write.o file_table.o buffer.o  bio.o super.o \
 		block_dev.o char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \
 		ioctl.o readdir.o select.o fifo.o locks.o dcache.o inode.o \
-		attr.o bad_inode.o file.o filesystems.o namespace.o aio.o \
+		attr.o bad_inode.o file.o filesystems.o namespace.o pnode.o aio.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o mpage.o direct-io.o \
 
 obj-$(CONFIG_EPOLL)		+= eventpoll.o


* [RFC PATCH 2/8] unclone a subtree
From: Ram @ 2005-07-08 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 147 bytes --]


Adds the ability to unclone a VFS tree. An uncloned VFS tree is not
clonable, and hence cannot be bound/rbound to any other mountpoint.
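
In userspace terms the intended behaviour looks like the untested sketch
below. /mnt and /tmp/copy are hypothetical directories, MS_UNCLONE is the
value this patch adds to include/linux/fs.h, and the failure of the bind
attempt is enforced once the bind/rbind patch (3/8) is applied on top.

#include <stdio.h>
#include <sys/mount.h>

#ifndef MS_BIND
#define MS_BIND		4096
#endif
#define MS_UNCLONE	131072		/* value added by this patch */

int main(void)
{
	/* mark the subtree mounted at /mnt unclonable */
	if (mount("", "/mnt", "ext2", MS_UNCLONE, "") == -1)
		perror("unclone");

	/* an unclonable tree cannot be bound elsewhere */
	if (mount("/mnt", "/tmp/copy", NULL, MS_BIND, NULL) == -1)
		perror("bind of unclonable tree (expected to fail)");
	return 0;
}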

 RP


[-- Attachment #2: unclone.patch --]
[-- Type: text/x-patch, Size: 3861 bytes --]

 Adds the ability to unclone a VFS tree. An uncloned VFS tree is not
 clonable, and hence cannot be bound/rbound to any other mountpoint.

 RP


 fs/namespace.c        |   18 +++++++++++++++++-
 include/linux/fs.h    |    1 +
 include/linux/mount.h |   15 +++++++++++++--
 3 files changed, 31 insertions(+), 3 deletions(-)

Index: 2.6.12/fs/namespace.c
===================================================================
--- 2.6.12.orig/fs/namespace.c
+++ 2.6.12/fs/namespace.c
@@ -698,6 +698,7 @@ static int do_change_type(struct nameida
 	int err;
 
 	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
+			&& !(flag & MS_UNCLONE)
 			&& !(flag & MS_SLAVE))
 		return -EINVAL;
 
@@ -757,6 +758,11 @@ static int do_change_type(struct nameida
 			SET_MNT_PRIVATE(m);
 			break;
 
+		case MS_UNCLONE:
+			if(m->mnt_pnode)
+				pnode_disassociate_mnt(m);
+			SET_MNT_UNCLONABLE(m);
+			break;
 		}
 	}
 	spin_unlock(&vfsmount_lock);
@@ -1197,7 +1203,8 @@ long do_mount(char * dev_name, char * di
 				    data_page);
 	else if (flags & MS_BIND)
 		retval = do_loopback(&nd, dev_name, flags & MS_REC);
-	else if (flags & MS_SHARED || flags & MS_PRIVATE || flags & MS_SLAVE)
+	else if (flags & MS_SHARED || flags & MS_UNCLONE ||
+			flags & MS_PRIVATE || flags & MS_SLAVE)
 		retval = do_change_type(&nd, flags);
 	else if (flags & MS_MOVE)
 		retval = do_move_mount(&nd, dev_name);
Index: 2.6.12/include/linux/fs.h
===================================================================
--- 2.6.12.orig/include/linux/fs.h
+++ 2.6.12/include/linux/fs.h
@@ -102,6 +102,7 @@ extern int dir_notify_enable;
 #define MS_MOVE		8192
 #define MS_REC		16384
 #define MS_VERBOSE	32768
+#define MS_UNCLONE	131072
 #define MS_PRIVATE	262144
 #define MS_SLAVE	524288
 #define MS_SHARED	1048576
Index: 2.6.12/include/linux/mount.h
===================================================================
--- 2.6.12.orig/include/linux/mount.h
+++ 2.6.12/include/linux/mount.h
@@ -21,28 +21,39 @@
 #define MNT_NOEXEC	0x04
 #define MNT_PRIVATE	0x10  /* if the vfsmount is private, by default it is private*/
 #define MNT_SLAVE	0x20  /* if the vfsmount is a slave mount of its pnode */
-#define MNT_SHARED	0x40  /* if the vfsmount is a slave mount of its pnode */
+#define MNT_SHARED	0x40  /* if the vfsmount is a shared mount */
+#define MNT_UNCLONABLE	0x80  /* if the vfsmount is unclonable */
 #define MNT_PNODE_MASK	0xf0  /* propogation flag mask */
 
+#define IS_MNT_UNCLONABLE(mnt) (mnt->mnt_flags&MNT_UNCLONABLE)
 #define IS_MNT_SHARED(mnt) (mnt->mnt_flags&MNT_SHARED)
 #define IS_MNT_SLAVE(mnt) (mnt->mnt_flags&MNT_SLAVE)
 #define IS_MNT_PRIVATE(mnt) (mnt->mnt_flags&MNT_PRIVATE)
 
 #define CLEAR_MNT_SHARED(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SHARED))
 #define CLEAR_MNT_PRIVATE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_PRIVATE))
+#define CLEAR_MNT_UNCLONABLE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_UNCLONABLE))
 #define CLEAR_MNT_SLAVE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SLAVE))
 
 //TOBEDONE WRITE BETTER MACROS. ..
 #define SET_MNT_SHARED(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_SHARED),\
 				CLEAR_MNT_PRIVATE(mnt), \
-				CLEAR_MNT_SLAVE(mnt))
+				CLEAR_MNT_SLAVE(mnt), \
+				CLEAR_MNT_UNCLONABLE(mnt))
 #define SET_MNT_PRIVATE(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_PRIVATE), \
 				CLEAR_MNT_SLAVE(mnt), \
 				CLEAR_MNT_SHARED(mnt), \
+				CLEAR_MNT_UNCLONABLE(mnt),\
+				mnt->mnt_pnode = NULL)
+#define SET_MNT_UNCLONABLE(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_UNCLONABLE), \
+				CLEAR_MNT_SLAVE(mnt),\
+				CLEAR_MNT_SHARED(mnt),\
+				CLEAR_MNT_PRIVATE(mnt),\
 				mnt->mnt_pnode = NULL)
 #define SET_MNT_SLAVE(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_SLAVE), \
 				CLEAR_MNT_PRIVATE(mnt), \
-				CLEAR_MNT_SHARED(mnt))
+				CLEAR_MNT_SHARED(mnt), \
+				CLEAR_MNT_UNCLONABLE(mnt))
 
 struct vfsmount
 {


* [RFC PATCH 3/8] bind/rbind a shared/private/slave/unclone tree
From: Ram @ 2005-07-08 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 107 bytes --]

Adds the ability to bind/rbind a shared/private/slave subtree and set up
propagation wherever needed.
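
The table inside the patch spells out the full source/destination matrix;
the common case looks roughly like the untested sketch below (the paths are
hypothetical placeholders and the flag values are those from the earlier
patches).

#include <stdio.h>
#include <sys/mount.h>

#ifndef MS_BIND
#define MS_BIND		4096
#endif
#ifndef MS_REC
#define MS_REC		16384
#endif
#ifndef MS_SHARED
#define MS_SHARED	1048576
#endif

int main(void)
{
	/* /share is marked shared and rbound to /view, so the two
	 * mounts become peers in the same pnode */
	mount("", "/share", "ext2", MS_SHARED, "");
	mount("/share", "/view", NULL, MS_BIND | MS_REC, NULL);

	/* a bind whose destination lies in a shared mount is propagated
	 * to the peers of that mount's pnode: /etc shows up at both
	 * /share/cfg and /view/cfg */
	if (mount("/etc", "/share/cfg", NULL, MS_BIND, NULL) == -1)
		perror("bind");
	return 0;
}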

RP


[-- Attachment #2: rbind.patch --]
[-- Type: text/x-patch, Size: 32037 bytes --]

Adds the ability to bind/rbind a shared/private/slave subtree and set up
propagation wherever needed.

RP

Signed by Ram Pai (linuxram@us.ibm.com)

 fs/namespace.c        |  510 ++++++++++++++++++++++++++++++++++++++++----------
 fs/pnode.c            |  244 +++++++++++++++++++++++
 include/linux/fs.h    |    4 
 include/linux/pnode.h |    5 
 4 files changed, 664 insertions(+), 99 deletions(-)

Index: 2.6.12/fs/namespace.c
===================================================================
--- 2.6.12.orig/fs/namespace.c
+++ 2.6.12/fs/namespace.c
@@ -42,7 +42,8 @@ static inline int sysfs_init(void)
 
 static struct list_head *mount_hashtable;
 static int hash_mask, hash_bits;
-static kmem_cache_t *mnt_cache; 
+static kmem_cache_t *mnt_cache;
+static struct rw_semaphore namespace_sem;
 
 static inline unsigned long hash(struct vfsmount *mnt, struct dentry *dentry)
 {
@@ -54,7 +55,7 @@ static inline unsigned long hash(struct 
 
 struct vfsmount *alloc_vfsmnt(const char *name)
 {
-	struct vfsmount *mnt = kmem_cache_alloc(mnt_cache, GFP_KERNEL); 
+	struct vfsmount *mnt = kmem_cache_alloc(mnt_cache, GFP_KERNEL);
 	if (mnt) {
 		memset(mnt, 0, sizeof(struct vfsmount));
 		atomic_set(&mnt->mnt_count,1);
@@ -109,6 +110,32 @@ struct vfsmount *lookup_mnt(struct vfsmo
 	return found;
 }
 
+static struct vfsmount *
+clone_mnt(struct vfsmount *old, struct dentry *root)
+{
+	struct super_block *sb = old->mnt_sb;
+	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
+
+	if (mnt) {
+		mnt->mnt_flags = old->mnt_flags;
+		atomic_inc(&sb->s_active);
+		mnt->mnt_sb = sb;
+		mnt->mnt_root = dget(root);
+		mnt->mnt_mountpoint = mnt->mnt_root;
+		mnt->mnt_parent = mnt;
+		mnt->mnt_namespace = old->mnt_namespace;
+		mnt->mnt_pnode = get_pnode(old->mnt_pnode);
+
+		/* stick the duplicate mount on the same expiry list
+		 * as the original if that was on one */
+		spin_lock(&vfsmount_lock);
+		if (!list_empty(&old->mnt_fslink))
+			list_add(&mnt->mnt_fslink, &old->mnt_fslink);
+		spin_unlock(&vfsmount_lock);
+	}
+	return mnt;
+}
+
 static inline int check_mnt(struct vfsmount *mnt)
 {
 	return mnt->mnt_namespace == current->namespace;
@@ -122,7 +149,6 @@ static void detach_mnt(struct vfsmount *
 	mnt->mnt_mountpoint = mnt->mnt_root;
 	list_del_init(&mnt->mnt_child);
 	list_del_init(&mnt->mnt_hash);
-	list_del_init(&mnt->mnt_pnode_mntlist);
 	old_nd->dentry->d_mounted--;
 }
 
@@ -130,11 +156,68 @@ static void attach_mnt(struct vfsmount *
 {
 	mnt->mnt_parent = mntget(nd->mnt);
 	mnt->mnt_mountpoint = dget(nd->dentry);
+	mnt->mnt_namespace = nd->mnt->mnt_namespace;
 	list_add(&mnt->mnt_hash, mount_hashtable+hash(nd->mnt, nd->dentry));
 	list_add_tail(&mnt->mnt_child, &nd->mnt->mnt_mounts);
 	nd->dentry->d_mounted++;
 }
 
+static struct vfsmount *do_attach_mnt(struct vfsmount *mnt,
+		struct dentry *dentry,
+		struct vfsmount *child_mnt)
+{
+	struct nameidata nd;
+	LIST_HEAD(head);
+
+	nd.mnt = mnt;
+	nd.dentry = dentry;
+	attach_mnt(child_mnt, &nd);
+	list_add_tail(&head, &child_mnt->mnt_list);
+	list_splice(&head, child_mnt->mnt_namespace->list.prev);
+	return child_mnt;
+}
+
+static void attach_prepare_mnt(struct vfsmount *mnt, struct nameidata *nd)
+{
+	mnt->mnt_parent = mntget(nd->mnt);
+	mnt->mnt_mountpoint = dget(nd->dentry);
+	nd->dentry->d_mounted++;
+}
+
+void do_attach_real_mnt(struct vfsmount *mnt)
+{
+	struct vfsmount *parent = mnt->mnt_parent;
+	BUG_ON(parent==mnt);
+	if(list_empty(&mnt->mnt_hash))
+		list_add(&mnt->mnt_hash,
+			mount_hashtable+hash(parent, mnt->mnt_mountpoint));
+	if(list_empty(&mnt->mnt_child))
+		list_add_tail(&mnt->mnt_child, &parent->mnt_mounts);
+	mnt->mnt_namespace = parent->mnt_namespace;
+	list_add_tail(&mnt->mnt_list, &mnt->mnt_namespace->list);
+}
+
+struct vfsmount *do_attach_prepare_mnt(struct vfsmount *mnt,
+		struct dentry *dentry,
+		struct vfsmount *template_mnt,
+		int clone_flag)
+{
+	struct vfsmount *child_mnt;
+	struct nameidata nd;
+
+	if (clone_flag)
+		child_mnt = clone_mnt(template_mnt, template_mnt->mnt_root);
+	else
+		child_mnt = template_mnt;
+
+	nd.mnt = mnt;
+	nd.dentry = dentry;
+
+	attach_prepare_mnt(child_mnt, &nd);
+
+	return child_mnt;
+}
+
 static struct vfsmount *next_mnt(struct vfsmount *p, struct vfsmount *root)
 {
 	struct list_head *next = p->mnt_mounts.next;
@@ -151,37 +234,20 @@ static struct vfsmount *next_mnt(struct 
 	return list_entry(next, struct vfsmount, mnt_child);
 }
 
-static struct vfsmount *
-clone_mnt(struct vfsmount *old, struct dentry *root)
+static struct vfsmount *skip_mnt_tree(struct vfsmount *p)
 {
-	struct super_block *sb = old->mnt_sb;
-	struct vfsmount *mnt = alloc_vfsmnt(old->mnt_devname);
-
-	if (mnt) {
-		mnt->mnt_flags = old->mnt_flags;
-		atomic_inc(&sb->s_active);
-		mnt->mnt_sb = sb;
-		mnt->mnt_root = dget(root);
-		mnt->mnt_mountpoint = mnt->mnt_root;
-		mnt->mnt_parent = mnt;
-		mnt->mnt_namespace = old->mnt_namespace;
-		mnt->mnt_pnode = get_pnode(old->mnt_pnode);
-
-		/* stick the duplicate mount on the same expiry list
-		 * as the original if that was on one */
-		spin_lock(&vfsmount_lock);
-		if (!list_empty(&old->mnt_fslink))
-			list_add(&mnt->mnt_fslink, &old->mnt_fslink);
-		spin_unlock(&vfsmount_lock);
+	struct list_head *prev = p->mnt_mounts.prev;
+	while (prev != &p->mnt_mounts) {
+		p = list_entry(prev, struct vfsmount, mnt_child);
+		prev = p->mnt_mounts.prev;
 	}
-	return mnt;
+	return p;
 }
 
 void __mntput(struct vfsmount *mnt)
 {
 	struct super_block *sb = mnt->mnt_sb;
 	dput(mnt->mnt_root);
-	put_pnode(mnt->mnt_pnode);
 	free_vfsmnt(mnt);
 	deactivate_super(sb);
 }
@@ -195,7 +261,7 @@ static void *m_start(struct seq_file *m,
 	struct list_head *p;
 	loff_t l = *pos;
 
-	down_read(&n->sem);
+	down_read(&namespace_sem);
 	list_for_each(p, &n->list)
 		if (!l--)
 			return list_entry(p, struct vfsmount, mnt_list);
@@ -212,8 +278,7 @@ static void *m_next(struct seq_file *m, 
 
 static void m_stop(struct seq_file *m, void *v)
 {
-	struct namespace *n = m->private;
-	up_read(&n->sem);
+	up_read(&namespace_sem);
 }
 
 static inline void mangle(struct seq_file *m, const char *s)
@@ -437,7 +502,7 @@ static int do_umount(struct vfsmount *mn
 		return retval;
 	}
 
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	spin_lock(&vfsmount_lock);
 
 	if (atomic_read(&sb->s_active) == 1) {
@@ -459,7 +524,7 @@ static int do_umount(struct vfsmount *mn
 	spin_unlock(&vfsmount_lock);
 	if (retval)
 		security_sb_umount_busy(mnt);
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	return retval;
 }
 
@@ -499,9 +564,9 @@ out:
 #ifdef __ARCH_WANT_SYS_OLDUMOUNT
 
 /*
- *	The 2.0 compatible umount. No flags. 
+ *	The 2.0 compatible umount. No flags.
  */
- 
+
 asmlinkage long sys_oldumount(char __user * name)
 {
 	return sys_umount(name,0);
@@ -545,6 +610,9 @@ static struct vfsmount *copy_tree(struct
 	struct list_head *h;
 	struct nameidata nd;
 
+	if (IS_MNT_UNCLONABLE(mnt))
+		return NULL;
+
 	res = q = clone_mnt(mnt, dentry);
 	if (!q)
 		goto Enomem;
@@ -553,10 +621,15 @@ static struct vfsmount *copy_tree(struct
 	p = mnt;
 	for (h = mnt->mnt_mounts.next; h != &mnt->mnt_mounts; h = h->next) {
 		r = list_entry(h, struct vfsmount, mnt_child);
+
 		if (!lives_below_in_same_fs(r->mnt_mountpoint, dentry))
 			continue;
 
 		for (s = r; s; s = next_mnt(s, r)) {
+			if (IS_MNT_UNCLONABLE(s)) {
+				s = skip_mnt_tree(s);
+				continue;
+			}
 			while (p != s->mnt_parent) {
 				p = p->mnt_parent;
 				q = q->mnt_parent;
@@ -583,9 +656,163 @@ static struct vfsmount *copy_tree(struct
 	return NULL;
 }
 
+ /*
+ *  @source_mnt : mount tree to be attached
+ *  @nd		: place the mount tree @source_mnt is attached
+ *
+ *  NOTE: in the table below explains the semantics when a source vfsmount
+ *  of a given type is attached to a destination vfsmount of a give type.
+ *  -----------------------------------------------------------------------------
+ *  |				BIND MOUNT OPERATION				|
+ *  |***************************************************************************|
+ *  |  dest --> | shared	 |	private	     |   slave	    |unclonable	|
+ *  | source	|		 |       	     |   	    |    	|
+ *  |   |   	|		 |       	     |   	    |    	|
+ *  |   v 	|		 |       	     |   	    |    	|
+ *  |***************************************************************************|
+ *  |	     	|		 |       	     |   	    |    	|
+ *  |  shared	| shared (++) 	 |      shared (+)   | shared (+)   | shared (+)|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  | private	| shared (+)	 |      private	     | private 	    | private  	|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  | slave	| shared (+)	 |      private      | private 	    | private  	|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  | unclonable|    -		 |       -	     |   -	    |   - 	|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *   ***************************************************************************
+ *
+ * (++)  the mount will be propogated to all the vfsmounts in the pnode tree
+ *    	  of the destination vfsmount, and all the non-slave new mounts in
+ *    	  destination vfsmount will be added the source vfsmount's pnode.
+ * (+)  the mount will be propogated to the destination vfsmount
+ *    	  and the new mount will be added to the source vfsmount's pnode.
+ *
+ * if the source mount is a tree, the operations explained above is applied to each
+ * vfsmount in the tree.
+ *
+ * Should be called without spinlocks held, because this function can sleep
+ * in allocations.
+ *
+  */
+static void
+attach_recursive_mnt(struct vfsmount *source_mnt, struct nameidata *nd)
+{
+	struct vfsmount *mntpt_mnt, *m, *p;
+	struct vfspnode *src_pnode, *t_p, *dest_pnode, *tmp_pnode;
+	struct dentry *mntpt_dentry;
+	LIST_HEAD(pnodehead);
+
+	mntpt_mnt = nd->mnt;
+	dest_pnode = IS_MNT_SHARED(mntpt_mnt) ? mntpt_mnt->mnt_pnode : NULL;
+	src_pnode = IS_MNT_SHARED(source_mnt) ? source_mnt->mnt_pnode : NULL;
+
+	if (!dest_pnode && !src_pnode) {
+		LIST_HEAD(head);
+		spin_lock(&vfsmount_lock);
+		do_attach_mnt(nd->mnt, nd->dentry, source_mnt);
+		spin_unlock(&vfsmount_lock);
+		goto out;
+	}
+
+	/*
+	 * Ok, the source or the destination pnode exists.
+	 * Get ready for pnode operations.
+	 * Create a temporary pnode which shall hold all the
+	 * new mounts. Merge or delete or slave that pnode
+	 * later in a separate operation, depending on
+	 * the type of source and destination mounts.
+	 */
+	p = NULL;
+	for (m = source_mnt; m; m = next_mnt(m, source_mnt)) {
+		int unclone = IS_MNT_UNCLONABLE(m);
+
+		list_del_init(&m->mnt_list);
+
+		while (p && p != m->mnt_parent) {
+			p = p->mnt_parent;
+		}
+
+		if (!p) {
+			mntpt_dentry = nd->dentry;
+			mntpt_mnt = nd->mnt;
+		} else {
+			mntpt_dentry = m->mnt_mountpoint;
+			mntpt_mnt    = p;
+		}
+
+		p=m;
+		dest_pnode = IS_MNT_SHARED(mntpt_mnt) ?
+			mntpt_mnt->mnt_pnode : NULL;
+		src_pnode = (IS_MNT_SHARED(m))?
+				m->mnt_pnode : NULL;
+
+		m->mnt_pnode = NULL;
+		/*
+		 * get a temporary pnode into which add the new vfs, and keep
+		 * track of these pnodes and their real pnode.
+		 */
+		while (!(tmp_pnode = pnode_alloc()))
+			schedule();
+
+
+		if (dest_pnode && !unclone)
+			pnode_prepare_mount(dest_pnode, tmp_pnode,
+					mntpt_dentry, m, mntpt_mnt);
+		else {
+			if (m == m->mnt_parent)
+				do_attach_prepare_mnt(mntpt_mnt,
+					mntpt_dentry, m, 0);
+			pnode_add_member_mnt(tmp_pnode, m);
+			if (unclone)
+				SET_MNT_UNCLONABLE(m);
+		}
+
+		if ((!src_pnode && !dest_pnode) || unclone)
+			SET_PNODE_DELETE(tmp_pnode);
+		tmp_pnode->pnode_master = src_pnode;
+		/*
+		 * temporarily track the pnode with which the tmp_pnode
+		 * has to merge with; in the pnode_master field.
+		 */
+		list_add_tail(&tmp_pnode->pnode_peer_slave, &pnodehead);
+	}
+
+	/*
+	 * new mounts. Merge or delete or slave the temporary pnode
+	 */
+	spin_lock(&vfsmount_lock);
+	list_for_each_entry_safe(tmp_pnode, t_p, &pnodehead, pnode_peer_slave) {
+		int del_flag = IS_PNODE_DELETE(tmp_pnode);
+		struct vfspnode *master_pnode = tmp_pnode->pnode_master;
+		list_del_init(&tmp_pnode->pnode_peer_slave);
+		pnode_real_mount(tmp_pnode, del_flag);
+		if (!del_flag && master_pnode) {
+			tmp_pnode->pnode_master = NULL;
+			pnode_merge_pnode(tmp_pnode, master_pnode);
+			/*
+			 * we don't need the extra reference to
+			 * the master_pnode, that was created either
+			 * (a) pnode_add_slave_pnode: when the mnt was made as
+			 * 		a slave mnt.
+			 * (b) pnode_merge_pnode: during clone_mnt().
+			 */
+			put_pnode(master_pnode);
+		}
+	}
+	spin_unlock(&vfsmount_lock);
+out:
+	mntget(source_mnt);
+	return;
+}
+
 static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
 {
-	int err;
+	int err, ret;
+
 	if (mnt->mnt_sb->s_flags & MS_NOUSER)
 		return -EINVAL;
 
@@ -603,17 +830,14 @@ static int graft_tree(struct vfsmount *m
 		goto out_unlock;
 
 	err = -ENOENT;
-	spin_lock(&vfsmount_lock);
-	if (IS_ROOT(nd->dentry) || !d_unhashed(nd->dentry)) {
-		struct list_head head;
 
-		attach_mnt(mnt, nd);
-		list_add_tail(&head, &mnt->mnt_list);
-		list_splice(&head, current->namespace->list.prev);
-		mntget(mnt);
+	spin_lock(&vfsmount_lock);
+	ret = (IS_ROOT(nd->dentry) || !d_unhashed(nd->dentry));
+	spin_unlock(&vfsmount_lock);
+	if (ret) {
+		attach_recursive_mnt(mnt, nd);
 		err = 0;
 	}
-	spin_unlock(&vfsmount_lock);
 out_unlock:
 	up(&nd->dentry->d_inode->i_sem);
 	if (!err)
@@ -621,6 +845,11 @@ out_unlock:
 	return err;
 }
 
+ /*
+ * create a new mount at the dentry, and move all child mounts
+ * underneath it, if those mounts were below the dentry on the
+ * parent mount.
+  */
 struct vfsmount *do_make_mounted(struct vfsmount *mnt, struct dentry *dentry)
 {
 	struct vfsmount *child_mnt, *next;
@@ -628,6 +857,14 @@ struct vfsmount *do_make_mounted(struct 
 	struct vfsmount *newmnt = clone_mnt(mnt, dentry);
 	LIST_HEAD(move);
 
+	/*
+	 * note clone_mnt() gets a reference to the pnode.
+	 * we won't use that pnode anyway. So just let it
+	 * go
+	 */
+	put_pnode(newmnt->mnt_pnode);
+	newmnt->mnt_pnode = NULL;
+
 	if (newmnt) {
 		/* put in code for mount expiry */
 		/* TOBEDONE */
@@ -653,14 +890,41 @@ struct vfsmount *do_make_mounted(struct 
 
 		nd.mnt = mnt;
 		nd.dentry = dentry;
-		attach_mnt(newmnt, &nd);
+		do_attach_mnt(nd.mnt, nd.dentry, newmnt);
  		spin_unlock(&vfsmount_lock);
  	}
 	return newmnt;
 }
 
-int
-_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)
+int do_make_unmounted(struct vfsmount *mnt)
+{
+	struct vfsmount *parent_mnt, *child_mnt, *next;
+	struct nameidata nd;
+
+	/* validate if mount has a different parent */
+	parent_mnt = mnt->mnt_parent;
+	if (mnt == parent_mnt)
+		return 0;
+	/*
+	 * cannot unmount a mount that is not created
+	 * as a overlay mount.
+	 */
+	if (mnt->mnt_mountpoint != mnt->mnt_root)
+		return -EINVAL;
+
+	/* for each submounts in the parent, put the mounts back */
+	spin_lock(&vfsmount_lock);
+	list_for_each_entry_safe(child_mnt, next, &mnt->mnt_mounts, mnt_child) {
+		detach_mnt(child_mnt, &nd);
+		nd.mnt = parent_mnt;
+		attach_mnt(child_mnt, &nd);
+ 	}
+	detach_mnt(mnt, &nd);
+ 	spin_unlock(&vfsmount_lock);
+	return 0;
+}
+
+int make_mounted(struct nameidata *nd, struct vfsmount **mnt)
 {
 	struct vfsmount *parent_mnt;
 	struct dentry *parent_dentry;
@@ -671,9 +935,10 @@ _do_make_mounted(struct nameidata *nd, s
 	parent_mnt = nd->mnt;
  	/*
 	 * check if dentry already has a vfsmount
-	 * if it does not, create a vfsmount there.
-	 * which means you need to propogate it
-	 * across all vfsmounts.
+	 * if it does not, create and attach
+	 * a new vfsmount at that dentry.
+	 * Also propogate the mount if parent_mnt
+	 * is shared.
  	 */
 	if(parent_dentry == parent_mnt->mnt_root) {
 		*mnt = parent_mnt;
@@ -688,6 +953,25 @@ _do_make_mounted(struct nameidata *nd, s
 	return err;
 }
 
+int make_unmounted(struct vfsmount *mnt)
+{
+	if (mnt == mnt->mnt_parent)
+		return 0;
+	/*
+	 * cannot unmount a mount that is not created
+	 * as a overlay mount.
+	 */
+	if (mnt->mnt_mountpoint != mnt->mnt_root)
+		return -EINVAL;
+
+	if (IS_MNT_SHARED(mnt))
+		pnode_make_unmounted(mnt->mnt_pnode);
+ 	else
+		do_make_unmounted(mnt);
+
+	return 0;
+}
+
 /*
  * recursively change the type of the mountpoint.
  */
@@ -702,7 +986,7 @@ static int do_change_type(struct nameida
 			&& !(flag & MS_SLAVE))
 		return -EINVAL;
 
-	if ((err = _do_make_mounted(nd, &mnt)))
+	if ((err = make_mounted(nd, &mnt)))
 		return err;
 
 	spin_lock(&vfsmount_lock);
@@ -735,9 +1019,8 @@ static int do_change_type(struct nameida
 			break;
 
 		case MS_SLAVE:
-			if (IS_MNT_SLAVE(m)) {
+			if (IS_MNT_SLAVE(m))
 				break;
-			}
 			/*
 			 * only shared mounts can
 			 * be made slave
@@ -775,7 +1058,7 @@ static int do_change_type(struct nameida
 static int do_loopback(struct nameidata *nd, char *old_name, int recurse)
 {
 	struct nameidata old_nd;
-	struct vfsmount *mnt = NULL;
+	struct vfsmount *mnt = NULL, *overlay_mnt=NULL;
 	int err = mount_is_safe(nd);
 	if (err)
 		return err;
@@ -785,14 +1068,31 @@ static int do_loopback(struct nameidata 
 	if (err)
 		return err;
 
-	down_write(&current->namespace->sem);
+	if (IS_MNT_UNCLONABLE(old_nd.mnt)) {
+		err = -EINVAL;
+		goto path_release;
+	}
+
+	down_write(&namespace_sem);
 	err = -EINVAL;
 	if (check_mnt(nd->mnt) && (!recurse || check_mnt(old_nd.mnt))) {
+
+		/*
+		 * If the dentry is not the root dentry, and if a bind
+		 * from a shared subtree is attempted, create a mount
+		 * at the dentry, and use the new mount as the starting
+		 * point for the bind/rbind operation.
+		 */
+		overlay_mnt = old_nd.mnt;
+		if(IS_MNT_SHARED(old_nd.mnt) &&
+			(err = make_mounted(&old_nd, &overlay_mnt)))
+			goto out;
+
 		err = -ENOMEM;
 		if (recurse)
-			mnt = copy_tree(old_nd.mnt, old_nd.dentry);
+			mnt = copy_tree(overlay_mnt, old_nd.dentry);
 		else
-			mnt = clone_mnt(old_nd.mnt, old_nd.dentry);
+			mnt = clone_mnt(overlay_mnt, old_nd.dentry);
 	}
 
 	if (mnt) {
@@ -803,15 +1103,25 @@ static int do_loopback(struct nameidata 
 
 		err = graft_tree(mnt, nd);
 		if (err) {
-			spin_lock(&vfsmount_lock);
-			umount_tree(mnt);
-			spin_unlock(&vfsmount_lock);
-		} else
-			mntput(mnt);
-	}
+ 			spin_lock(&vfsmount_lock);
+ 			umount_tree(mnt);
+ 			spin_unlock(&vfsmount_lock);
+			/*
+			 * ok we failed! so undo any overlay
+			 * mount that we did earlier.
+			 */
+			if (old_nd.mnt !=  overlay_mnt)
+				make_unmounted(overlay_mnt);
+ 		} else
+ 			mntput(mnt);
+ 	}
+
+ out:
+	up_write(&namespace_sem);
+
+ path_release:
+ 	path_release(&old_nd);
 
-	up_write(&current->namespace->sem);
-	path_release(&old_nd);
 	return err;
 }
 
@@ -859,7 +1169,7 @@ static int do_move_mount(struct nameidat
 	if (err)
 		return err;
 
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
 		;
 	err = -EINVAL;
@@ -903,7 +1213,7 @@ out2:
 out1:
 	up(&nd->dentry->d_inode->i_sem);
 out:
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	if (!err)
 		path_release(&parent_nd);
 	path_release(&old_nd);
@@ -942,7 +1252,7 @@ int do_add_mount(struct vfsmount *newmnt
 {
 	int err;
 
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	/* Something was mounted here while we slept */
 	while(d_mountpoint(nd->dentry) && follow_down(&nd->mnt, &nd->dentry))
 		;
@@ -971,7 +1281,7 @@ int do_add_mount(struct vfsmount *newmnt
 	}
 
 unlock:
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	mntput(newmnt);
 	return err;
 }
@@ -1027,7 +1337,7 @@ void mark_mounts_for_expiry(struct list_
 		get_namespace(namespace);
 
 		spin_unlock(&vfsmount_lock);
-		down_write(&namespace->sem);
+		down_write(&namespace_sem);
 		spin_lock(&vfsmount_lock);
 
 		/* check that it is still dead: the count should now be 2 - as
@@ -1071,7 +1381,7 @@ void mark_mounts_for_expiry(struct list_
 			spin_unlock(&vfsmount_lock);
 		}
 
-		up_write(&namespace->sem);
+		up_write(&namespace_sem);
 
 		mntput(mnt);
 		put_namespace(namespace);
@@ -1117,7 +1427,7 @@ int copy_mount_options(const void __user
 	int i;
 	unsigned long page;
 	unsigned long size;
-	
+
 	*where = 0;
 	if (!data)
 		return 0;
@@ -1136,7 +1446,7 @@ int copy_mount_options(const void __user
 
 	i = size - exact_copy_from_user((void *)page, data, size);
 	if (!i) {
-		free_page(page); 
+		free_page(page);
 		return -EFAULT;
 	}
 	if (i != PAGE_SIZE)
@@ -1242,14 +1552,13 @@ int copy_namespace(int flags, struct tas
 		goto out;
 
 	atomic_set(&new_ns->count, 1);
-	init_rwsem(&new_ns->sem);
 	INIT_LIST_HEAD(&new_ns->list);
 
-	down_write(&tsk->namespace->sem);
+	down_write(&namespace_sem);
 	/* First pass: copy the tree topology */
 	new_ns->root = copy_tree(namespace->root, namespace->root->mnt_root);
 	if (!new_ns->root) {
-		up_write(&tsk->namespace->sem);
+		up_write(&namespace_sem);
 		kfree(new_ns);
 		goto out;
 	}
@@ -1283,7 +1592,7 @@ int copy_namespace(int flags, struct tas
 		p = next_mnt(p, namespace->root);
 		q = next_mnt(q, new_ns->root);
 	}
-	up_write(&tsk->namespace->sem);
+	up_write(&namespace_sem);
 
 	tsk->namespace = new_ns;
 
@@ -1465,7 +1774,7 @@ asmlinkage long sys_pivot_root(const cha
 	user_nd.mnt = mntget(current->fs->rootmnt);
 	user_nd.dentry = dget(current->fs->root);
 	read_unlock(&current->fs->lock);
-	down_write(&current->namespace->sem);
+	down_write(&namespace_sem);
 	down(&old_nd.dentry->d_inode->i_sem);
 	error = -EINVAL;
 	if (!check_mnt(user_nd.mnt))
@@ -1511,7 +1820,7 @@ asmlinkage long sys_pivot_root(const cha
 	path_release(&parent_nd);
 out2:
 	up(&old_nd.dentry->d_inode->i_sem);
-	up_write(&current->namespace->sem);
+	up_write(&namespace_sem);
 	path_release(&user_nd);
 	path_release(&old_nd);
 out1:
@@ -1538,7 +1847,6 @@ static void __init init_mount_tree(void)
 		panic("Can't allocate initial namespace");
 	atomic_set(&namespace->count, 1);
 	INIT_LIST_HEAD(&namespace->list);
-	init_rwsem(&namespace->sem);
 	list_add(&mnt->mnt_list, &namespace->list);
 	namespace->root = mnt;
 	mnt->mnt_namespace = namespace;
@@ -1561,6 +1869,8 @@ void __init mnt_init(unsigned long mempa
 	unsigned int nr_hash;
 	int i;
 
+	init_rwsem(&namespace_sem);
+
 	mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct vfsmount),
 			0, SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
 
@@ -1608,7 +1918,7 @@ void __put_namespace(struct namespace *n
 {
 	struct vfsmount *mnt;
 
-	down_write(&namespace->sem);
+	down_write(&namespace_sem);
 	spin_lock(&vfsmount_lock);
 
 	list_for_each_entry(mnt, &namespace->list, mnt_list) {
@@ -1617,6 +1927,6 @@ void __put_namespace(struct namespace *n
 
 	umount_tree(namespace->root);
 	spin_unlock(&vfsmount_lock);
-	up_write(&namespace->sem);
+	up_write(&namespace_sem);
 	kfree(namespace);
 }
Index: 2.6.12/fs/pnode.c
===================================================================
--- 2.6.12.orig/fs/pnode.c
+++ 2.6.12/fs/pnode.c
@@ -287,6 +287,52 @@ pnode_disassociate_mnt(struct vfsmount *
 	mnt->mnt_flags &= ~MNT_SHARED;
 }
 
+// merge pnode into peer_pnode and get rid of pnode
+int
+pnode_merge_pnode(struct vfspnode *pnode, struct vfspnode *peer_pnode)
+{
+	struct vfspnode *slave_pnode, *pnext;
+	struct vfsmount *mnt, *slave_mnt, *next;
+	int i,count;
+
+	spin_lock(&vfspnode_lock);
+	list_for_each_entry_safe(slave_pnode,  pnext,
+			&pnode->pnode_slavepnode, pnode_peer_slave) {
+		slave_pnode->pnode_master = peer_pnode;
+		list_move(&slave_pnode->pnode_peer_slave,
+				&peer_pnode->pnode_slavepnode);
+	}
+
+	list_for_each_entry_safe(slave_mnt,  next,
+			&pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+		slave_mnt->mnt_pnode = peer_pnode;
+		list_move(&slave_mnt->mnt_pnode_mntlist,
+				&peer_pnode->pnode_slavevfs);
+	}
+
+	list_for_each_entry_safe(mnt, next,
+			&pnode->pnode_vfs, mnt_pnode_mntlist) {
+		mnt->mnt_pnode = peer_pnode;
+		list_move(&mnt->mnt_pnode_mntlist,
+				&peer_pnode->pnode_vfs);
+	}
+
+	count = atomic_read(&pnode->pnode_count);
+	atomic_add(count, &peer_pnode->pnode_count);
+	spin_unlock(&vfspnode_lock);
+
+	/*
+	 * delete all references to 'pnode'.
+	 * A better implementation can simply
+	 * call free_pnode(pnode). But this is
+	 * a cleaner way of doing it, though
+	 * at some cost.
+	 */
+	for (i=0 ; i <count; i++)
+		put_pnode(pnode);
+	return 0;
+}
+
 struct vfsmount *
 pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
 {
@@ -360,3 +406,223 @@ out:
 	mntput(child_mnt);
 	return child_mnt;
 }
+
+int
+vfs_make_unmounted(struct vfsmount *mnt)
+{
+	struct vfspnode *pnode;
+	int ret=0;
+
+	if (do_make_unmounted(mnt)) {
+		ret = 1;
+		goto out;
+	}
+	pnode = mnt->mnt_pnode;
+	list_del_init(&mnt->mnt_pnode_mntlist);
+	put_pnode(pnode);
+out:
+	return ret;
+}
+
+int
+pnode_make_unmounted(struct vfspnode *pnode)
+{
+	int ret=0,traversal;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+
+	context.start = pnode;
+	context.pnode = NULL;
+	while(pnode_next(&context)) {
+		traversal = context.traversal;
+		pnode = context.pnode;
+		if(traversal == PNODE_UP) {
+			// traverse member vfsmounts
+			spin_lock(&vfspnode_lock);
+			list_for_each_entry_safe(member_mnt,
+				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_make_unmounted(member_mnt)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			list_for_each_entry_safe(slave_mnt,
+				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_make_unmounted(slave_mnt)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			spin_unlock(&vfspnode_lock);
+		}
+	}
+
+out:
+	return ret;
+}
+
+
+int
+vfs_prepare_mount_func(struct vfsmount *mnt, int flag, struct vfspnode *pnode,
+		struct vfsmount *source_mnt,
+		struct dentry *mountpoint_dentry,
+		struct vfsmount *p_mnt)
+
+{
+	struct vfsmount *child_mnt;
+
+	if ((p_mnt != mnt) || (source_mnt == source_mnt->mnt_parent)) {
+		child_mnt = do_attach_prepare_mnt(mnt, mountpoint_dentry,
+				source_mnt, (p_mnt != mnt));
+		if (child_mnt != source_mnt)
+			put_pnode(source_mnt->mnt_pnode);
+	} else
+		child_mnt = source_mnt;
+
+	switch (flag) {
+	case PNODE_SLAVE_VFS :
+		pnode_add_slave_mnt(pnode, child_mnt);
+		break;
+	case PNODE_MEMBER_VFS :
+		pnode_add_member_mnt(pnode, child_mnt);
+		break;
+	}
+
+	return 0;
+}
+
+
+int
+pnode_prepare_mount(struct vfspnode *pnode,
+		struct vfspnode *master_child_pnode,
+		struct dentry *mountpoint_dentry,
+		struct vfsmount *source_mnt,
+		struct vfsmount *mnt)
+{
+	int ret=0,traversal,level;
+     	struct vfspnode *slave_pnode, *master_pnode, *slave_child_pnode, *child_pnode;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+	static struct inoutdata p_array[PNODE_MAX_SLAVE_LEVEL];
+
+	context.start = pnode;
+	context.pnode = NULL;
+	while(pnode_next(&context)) {
+		traversal = context.traversal;
+		level = context.level;
+		pnode = context.pnode;
+		slave_pnode = context.slave_pnode;
+		master_pnode = context.master_pnode;
+
+		if (traversal == PNODE_DOWN ) {
+
+			child_pnode = NULL;
+			if (!master_pnode)
+				child_pnode = master_child_pnode;
+
+			while (!(child_pnode = pnode_alloc()))
+				schedule();
+
+			p_array[level].my_data = child_pnode;
+
+		} else if(traversal == PNODE_MID) {
+
+			child_pnode = (struct vfspnode *)p_array[level].my_data;
+			slave_child_pnode = p_array[level+1].out_data;
+			pnode_add_slave_pnode(child_pnode, slave_child_pnode);
+
+		} else if(traversal == PNODE_UP) {
+
+			child_pnode = p_array[level].my_data;
+			spin_lock(&vfspnode_lock);
+			list_for_each_entry_safe(member_mnt,
+				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if((ret=vfs_prepare_mount_func(member_mnt,
+					PNODE_MEMBER_VFS, child_pnode, source_mnt,
+					mountpoint_dentry, mnt)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			list_for_each_entry_safe(slave_mnt,
+				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if((ret = vfs_prepare_mount_func(slave_mnt,
+					PNODE_SLAVE_VFS, child_pnode, source_mnt,
+					mountpoint_dentry, mnt)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			spin_unlock(&vfspnode_lock);
+			p_array[level].out_data = child_pnode;
+
+		}
+	}
+
+out:
+	return ret;
+}
+
+
+int
+vfs_real_mount_func(struct vfsmount *mnt, int delflag)
+{
+	BUG_ON(mnt == mnt->mnt_parent);
+	do_attach_real_mnt(mnt);
+	if (delflag) {
+		list_del_init(&mnt->mnt_pnode_mntlist);
+		mnt->mnt_pnode = NULL;
+		SET_MNT_PRIVATE(mnt);
+	}
+	return 0;
+}
+
+/*
+ * @pnode: walk the propagation tree and complete the
+ * 	attachment of each child mount to its
+ * 	corresponding parent.
+ * @flag: if set, destroy the propagation tree
+ */
+int
+pnode_real_mount(struct vfspnode *pnode, int flag)
+{
+	int ret=0,traversal;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+
+	context.start = pnode;
+	context.pnode = NULL;
+	while(pnode_next(&context)) {
+		traversal = context.traversal;
+		pnode = context.pnode;
+		if(traversal == PNODE_MID) {
+			if (flag) {
+				BUG_ON(!list_empty(&pnode->pnode_vfs));
+				BUG_ON(!list_empty(&pnode->pnode_slavevfs));
+				BUG_ON(!list_empty(&pnode->pnode_slavepnode));
+				list_del_init(&pnode->pnode_peer_slave);
+				put_pnode(pnode);
+			}
+		} else if(traversal == PNODE_UP) {
+			// traverse member vfsmounts
+			spin_lock(&vfspnode_lock);
+			list_for_each_entry_safe(member_mnt,
+				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_real_mount_func(member_mnt, flag)))
+					goto out;
+			}
+			list_for_each_entry_safe(slave_mnt,
+				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_real_mount_func(slave_mnt, flag)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			spin_unlock(&vfspnode_lock);
+		}
+	}
+
+out:
+	return ret;
+}
Index: 2.6.12/include/linux/fs.h
===================================================================
--- 2.6.12.orig/include/linux/fs.h
+++ 2.6.12/include/linux/fs.h
@@ -1216,7 +1216,11 @@ extern struct vfsmount *kern_mount(struc
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
+extern struct vfsmount *do_attach_prepare_mnt(struct vfsmount *,
+		struct dentry *, struct vfsmount *, int );
+extern void do_attach_real_mnt(struct vfsmount *);
 extern struct vfsmount *do_make_mounted(struct vfsmount *, struct dentry *);
+extern int do_make_unmounted(struct vfsmount *);
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
Index: 2.6.12/include/linux/pnode.h
===================================================================
--- 2.6.12.orig/include/linux/pnode.h
+++ 2.6.12/include/linux/pnode.h
@@ -73,6 +73,11 @@ void pnode_del_slave_mnt(struct vfsmount
 void pnode_del_member_mnt(struct vfsmount *);
 void pnode_disassociate_mnt(struct vfsmount *);
 void pnode_add_slave_pnode(struct vfspnode *, struct vfspnode *);
+int pnode_merge_pnode(struct vfspnode *, struct vfspnode *);
 struct vfsmount * pnode_make_mounted(struct vfspnode *, struct vfsmount *, struct dentry *);
+int  pnode_make_unmounted(struct vfspnode *);
+int pnode_prepare_mount(struct vfspnode *, struct vfspnode *, struct dentry *,
+		struct vfsmount *, struct vfsmount *);
+int pnode_real_mount(struct vfspnode *, int);
 #endif /* KERNEL */
 #endif /* _LINUX_PNODE_H */
Index: 2.6.12/include/linux/namespace.h
===================================================================
--- 2.6.12.orig/include/linux/namespace.h
+++ 2.6.12/include/linux/namespace.h
@@ -9,7 +9,6 @@ struct namespace {
 	atomic_t		count;
 	struct vfsmount *	root;
 	struct list_head	list;
-	struct rw_semaphore	sem;
 };
 
 extern void umount_tree(struct vfsmount *);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC PATCH 4/8] move a shared/private/slave/unclone tree
       [not found]       ` <1120816521.30164.22.camel@localhost>
@ 2005-07-08 10:25         ` Ram
       [not found]         ` <1120816600.30164.25.camel@localhost>
  1 sibling, 0 replies; 48+ messages in thread
From: Ram @ 2005-07-08 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 164 bytes --]


Adds the ability to move a shared/private/slave/unclone tree to any other
shared/private/slave/unclone tree. The same semantics are also applied to
pivot_root().
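
For illustration only (not part of the patch), a minimal userspace sketch of
the operation being extended here: moving an existing mount with mount(2) and
MS_MOVE. With move.patch applied, a move whose destination parent is shared is
expected to propagate as described in the MOVE MOUNT SEMANTICS table added
below; the mountpoint paths are placeholders supplied by the caller.

/*
 * illustrative sketch only -- not part of the patch; the paths are
 * whatever the caller passes in.
 */
#include <stdio.h>
#include <sys/mount.h>

int main(int argc, char *argv[])
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <old-mountpoint> <new-mountpoint>\n",
				argv[0]);
		return 1;
	}
	/* detach the tree mounted on argv[1] and re-attach it on argv[2] */
	if (mount(argv[1], argv[2], NULL, MS_MOVE, NULL) == -1) {
		perror("mount(MS_MOVE)");
		return 1;
	}
	return 0;
}

Run against two existing mountpoints and then compare /proc/mounts in the
peer mounts to see whether the move propagated.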

RP


[-- Attachment #2: move.patch --]
[-- Type: text/x-patch, Size: 8297 bytes --]

Adds the ability to move a shared/private/slave/unclone tree to any other
shared/private/slave/unclone tree. The same semantics are also applied to
pivot_root().

RP


Signed by Ram Pai (linuxram@us.ibm.com)

 namespace.c |  106 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 94 insertions(+), 12 deletions(-)

Index: 2.6.12/fs/namespace.c
===================================================================
--- 2.6.12.orig/fs/namespace.c
+++ 2.6.12/fs/namespace.c
@@ -656,9 +656,12 @@ static struct vfsmount *copy_tree(struct
 	return NULL;
 }
 
+
  /*
  *  @source_mnt : mount tree to be attached
  *  @nd		: place the mount tree @source_mnt is attached
+ *  @move	: use the move semantics if set, else use normal attach semantics
+ *                as explained below
  *
  *  NOTE: in the table below explains the semantics when a source vfsmount
  *  of a given type is attached to a destination vfsmount of a give type.
@@ -691,6 +694,34 @@ static struct vfsmount *copy_tree(struct
  * (+)  the mount will be propogated to the destination vfsmount
  *    	  and the new mount will be added to the source vfsmount's pnode.
  *
+ *  -----------------------------------------------------------------------------
+ *  |				MOVE MOUNT SEMANTICS				|
+ *  |***************************************************************************|
+ *  |  dest --> | shared	 |	private	     |   slave	    |unclonable	|
+ *  | source   	|		 |       	     |   	    |    	|
+ *  |   |   	|		 |       	     |   	    |    	|
+ *  |   v 	|		 |       	     |   	    |    	|
+ *  |***************************************************************************|
+ *  |	     	|		 |       	     |   	    |    	|
+ *  | shared 	| shared (++) 	 |      shared (+)   | shared (+)   | shared (+)|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  | private	| shared (+)	 |      private	     | private 	    | private  	|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  | slave	| shared (+++)	 |      slave        | slave   	    | slave    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  | unclonable| unclonable	 |     unclonable    | unclonable   | unclonable|
+ *  |		|		 |       	     |   	    |    	|
+ *  |		|		 |       	     |   	    |    	|
+ *  |***************************************************************************|
+ *
+ * (+++)  the mount will be propagated to all the vfsmounts in the pnode tree
+ *    	  of the destination vfsmount, and all the new mounts will be
+ *    	  added to a new pnode, which will be a slave pnode of the
+ *    	  source vfsmount's pnode.
+ *
  * if the source mount is a tree, the operations explained above is applied to each
  * vfsmount in the tree.
  *
@@ -699,7 +730,7 @@ static struct vfsmount *copy_tree(struct
  *
   */
 static void
-attach_recursive_mnt(struct vfsmount *source_mnt, struct nameidata *nd)
+attach_recursive_mnt(struct vfsmount *source_mnt, struct nameidata *nd, int move)
 {
 	struct vfsmount *mntpt_mnt, *m, *p;
 	struct vfspnode *src_pnode, *t_p, *dest_pnode, *tmp_pnode;
@@ -729,6 +760,7 @@ attach_recursive_mnt(struct vfsmount *so
 	p = NULL;
 	for (m = source_mnt; m; m = next_mnt(m, source_mnt)) {
 		int unclone = IS_MNT_UNCLONABLE(m);
+		int slave = IS_MNT_SLAVE(m);
 
 		list_del_init(&m->mnt_list);
 
@@ -747,7 +779,7 @@ attach_recursive_mnt(struct vfsmount *so
 		p=m;
 		dest_pnode = IS_MNT_SHARED(mntpt_mnt) ?
 			mntpt_mnt->mnt_pnode : NULL;
-		src_pnode = (IS_MNT_SHARED(m))?
+		src_pnode = (IS_MNT_SHARED(m) || (move && dest_pnode && slave))?
 				m->mnt_pnode : NULL;
 
 		m->mnt_pnode = NULL;
@@ -766,14 +798,20 @@ attach_recursive_mnt(struct vfsmount *so
 			if (m == m->mnt_parent)
 				do_attach_prepare_mnt(mntpt_mnt,
 					mntpt_dentry, m, 0);
-			pnode_add_member_mnt(tmp_pnode, m);
-			if (unclone)
-				SET_MNT_UNCLONABLE(m);
+			if (move && slave)
+				pnode_add_slave_mnt(tmp_pnode, m);
+			else {
+				pnode_add_member_mnt(tmp_pnode, m);
+				if (unclone)
+					SET_MNT_UNCLONABLE(m);
+			}
 		}
 
 		if ((!src_pnode && !dest_pnode) || unclone)
 			SET_PNODE_DELETE(tmp_pnode);
 		tmp_pnode->pnode_master = src_pnode;
+		if (move && dest_pnode && slave)
+			SET_PNODE_SLAVE(tmp_pnode);
 		/*
 		 * temporarily track the pnode with which the tmp_pnode
 		 * has to merge with; in the pnode_master field.
@@ -787,12 +825,16 @@ attach_recursive_mnt(struct vfsmount *so
 	spin_lock(&vfsmount_lock);
 	list_for_each_entry_safe(tmp_pnode, t_p, &pnodehead, pnode_peer_slave) {
 		int del_flag = IS_PNODE_DELETE(tmp_pnode);
+		int slave_flag = IS_PNODE_SLAVE(tmp_pnode);
 		struct vfspnode *master_pnode = tmp_pnode->pnode_master;
 		list_del_init(&tmp_pnode->pnode_peer_slave);
 		pnode_real_mount(tmp_pnode, del_flag);
 		if (!del_flag && master_pnode) {
 			tmp_pnode->pnode_master = NULL;
-			pnode_merge_pnode(tmp_pnode, master_pnode);
+			if (slave_flag)
+				pnode_add_slave_pnode(master_pnode, tmp_pnode);
+			else
+				pnode_merge_pnode(tmp_pnode, master_pnode);
 			/*
 			 * we don't need the extra reference to
 			 * the master_pnode, that was created either
@@ -809,6 +851,18 @@ out:
 	return;
 }
 
+static void
+detach_recursive_mnt(struct vfsmount *source_mnt, struct nameidata *nd)
+{
+	struct vfsmount *m;
+
+	detach_mnt(source_mnt, nd);
+	for (m = source_mnt; m; m = next_mnt(m, source_mnt)) {
+		list_del_init(&m->mnt_pnode_mntlist);
+	}
+	return;
+}
+
 static int graft_tree(struct vfsmount *mnt, struct nameidata *nd)
 {
 	int err, ret;
@@ -835,7 +889,7 @@ static int graft_tree(struct vfsmount *m
 	ret = (IS_ROOT(nd->dentry) || !d_unhashed(nd->dentry));
 	spin_unlock(&vfsmount_lock);
 	if (ret) {
-		attach_recursive_mnt(mnt, nd);
+		attach_recursive_mnt(mnt, nd, 0);
 		err = 0;
 	}
 out_unlock:
@@ -1195,6 +1249,12 @@ static int do_move_mount(struct nameidat
 	if (S_ISDIR(nd->dentry->d_inode->i_mode) !=
 	      S_ISDIR(old_nd.dentry->d_inode->i_mode))
 		goto out2;
+	/*
+	 * Don't move a mount in a shared parent.
+	 */
+	if(old_nd.mnt->mnt_parent &&
+		IS_MNT_SHARED(old_nd.mnt->mnt_parent))
+		goto out2;
 
 	err = -ELOOP;
 	for (p = nd->mnt; p->mnt_parent!=p; p = p->mnt_parent)
@@ -1202,8 +1262,14 @@ static int do_move_mount(struct nameidat
 			goto out2;
 	err = 0;
 
-	detach_mnt(old_nd.mnt, &parent_nd);
-	attach_mnt(old_nd.mnt, nd);
+	detach_recursive_mnt(old_nd.mnt, &parent_nd);
+	if(IS_MNT_SHARED(nd->mnt)) {
+		spin_unlock(&vfsmount_lock);
+		attach_recursive_mnt(old_nd.mnt, nd, 1);
+		spin_lock(&vfsmount_lock);
+		mntput(old_nd.mnt);
+	} else
+		attach_mnt(old_nd.mnt, nd);
 
 	/* if the mount is moved, it should no longer be expire
 	 * automatically */
@@ -1794,6 +1860,16 @@ asmlinkage long sys_pivot_root(const cha
 		goto out2; /* not a mountpoint */
 	if (new_nd.mnt->mnt_root != new_nd.dentry)
 		goto out2; /* not a mountpoint */
+	/*
+	 * Don't move a mount in a shared parent.
+	 */
+	if(user_nd.mnt->mnt_parent &&
+		IS_MNT_SHARED(user_nd.mnt->mnt_parent))
+		goto out2;
+	if(new_nd.mnt->mnt_parent &&
+		IS_MNT_SHARED(new_nd.mnt->mnt_parent))
+		goto out2;
+
 	tmp = old_nd.mnt; /* make sure we can reach put_old from new_root */
 	spin_lock(&vfsmount_lock);
 	if (tmp != new_nd.mnt) {
@@ -1808,10 +1884,22 @@ asmlinkage long sys_pivot_root(const cha
 			goto out3;
 	} else if (!is_subdir(old_nd.dentry, new_nd.dentry))
 		goto out3;
-	detach_mnt(new_nd.mnt, &parent_nd);
-	detach_mnt(user_nd.mnt, &root_parent);
-	attach_mnt(user_nd.mnt, &old_nd);     /* mount old root on put_old */
-	attach_mnt(new_nd.mnt, &root_parent); /* mount new_root on / */
+	detach_recursive_mnt(new_nd.mnt, &parent_nd);
+	detach_recursive_mnt(user_nd.mnt, &root_parent);
+ 	if(IS_MNT_SHARED(old_nd.mnt)) {
+		spin_unlock(&vfsmount_lock);
+ 		attach_recursive_mnt(user_nd.mnt, &old_nd, 1);
+		spin_lock(&vfsmount_lock);
+ 		mntput(user_nd.mnt);
+	} else
+ 		attach_mnt(user_nd.mnt, &old_nd);
+ 	if(IS_MNT_SHARED(root_parent.mnt)) {
+		spin_unlock(&vfsmount_lock);
+ 		attach_recursive_mnt(new_nd.mnt, &root_parent, 1);
+		spin_lock(&vfsmount_lock);
+ 		mntput(new_nd.mnt);
+	} else
+ 		attach_mnt(new_nd.mnt, &root_parent);
 	spin_unlock(&vfsmount_lock);
 	chroot_fs_refs(&user_nd, &new_nd);
 	security_sb_post_pivotroot(&user_nd, &new_nd);

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC PATCH 5/8] umount a shared/private/slave/unclone tree
       [not found]         ` <1120816600.30164.25.camel@localhost>
@ 2005-07-08 10:25           ` Ram
       [not found]           ` <1120816720.30164.28.camel@localhost>
  1 sibling, 0 replies; 48+ messages in thread
From: Ram @ 2005-07-08 10:25 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 66 bytes --]


Adds the ability to unmount a shared/slave/unclone/private tree.
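
For illustration only (not part of the patch), a minimal userspace sketch of
driving the unmount that this patch propagates: a plain umount2() is subject
to the mount_busy() check added below, while MNT_DETACH bypasses it (the
lazy-detach path in do_umount()). MNT_DETACH may be missing from older libc
headers, hence the fallback define; the mountpoint path is a placeholder.

/*
 * illustrative sketch only -- not part of the patch.
 */
#include <stdio.h>
#include <sys/mount.h>

#ifndef MNT_DETACH
#define MNT_DETACH 0x00000002	/* may be absent from older libc headers */
#endif

int main(int argc, char *argv[])
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
		return 1;
	}
	if (umount2(argv[1], 0) == 0)
		return 0;
	perror("umount2");
	/* fall back to a lazy detach, as 'umount -l' would do */
	if (umount2(argv[1], MNT_DETACH) == -1) {
		perror("umount2(MNT_DETACH)");
		return 1;
	}
	return 0;
}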

RP


[-- Attachment #2: umount.patch --]
[-- Type: text/x-patch, Size: 8880 bytes --]

Adds the ability to unmount a shared/slave/unclone/private tree.

RP

Signed by Ram Pai (linuxram@us.ibm.com)

 fs/namespace.c        |   62 +++++++++++++++++++++++------
 fs/pnode.c            |  104 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h    |    2 
 include/linux/pnode.h |    2 
 4 files changed, 157 insertions(+), 13 deletions(-)

Index: 2.6.12/fs/namespace.c
===================================================================
--- 2.6.12.orig/fs/namespace.c
+++ 2.6.12/fs/namespace.c
@@ -344,6 +344,7 @@ struct seq_operations mounts_op = {
  * open files, pwds, chroots or sub mounts that are
  * busy.
  */
+//TOBEFIXED
 int may_umount_tree(struct vfsmount *mnt)
 {
 	struct list_head *next;
@@ -386,6 +387,20 @@ resume:
 
 EXPORT_SYMBOL(may_umount_tree);
 
+int mount_busy(struct vfsmount *mnt)
+{
+	struct vfspnode *parent_pnode;
+
+	if (mnt == mnt->mnt_parent || !IS_MNT_SHARED(mnt->mnt_parent))
+		return do_refcount_check(mnt, 2);
+
+	parent_pnode = mnt->mnt_parent->mnt_pnode;
+	BUG_ON(!parent_pnode);
+	return pnode_mount_busy(parent_pnode,
+			mnt->mnt_mountpoint,
+			mnt->mnt_root, mnt);
+}
+
 /**
  * may_umount - check if a mount point is busy
  * @mnt: root of mount
@@ -401,13 +416,25 @@ EXPORT_SYMBOL(may_umount_tree);
  */
 int may_umount(struct vfsmount *mnt)
 {
-	if (atomic_read(&mnt->mnt_count) > 2)
-		return -EBUSY;
-	return 0;
+	return (!mount_busy(mnt));
 }
 
 EXPORT_SYMBOL(may_umount);
 
+void do_detach_mount(struct vfsmount *mnt)
+{
+	struct nameidata old_nd;
+	if (mnt != mnt->mnt_parent) {
+		detach_mnt(mnt, &old_nd);
+		path_release(&old_nd);
+	}
+	list_del_init(&mnt->mnt_list);
+	list_del_init(&mnt->mnt_fslink);
+	spin_unlock(&vfsmount_lock);
+	mntput(mnt);
+	spin_lock(&vfsmount_lock);
+}
+
 void umount_tree(struct vfsmount *mnt)
 {
 	struct vfsmount *p;
@@ -422,20 +449,35 @@ void umount_tree(struct vfsmount *mnt)
 		mnt = list_entry(kill.next, struct vfsmount, mnt_list);
 		list_del_init(&mnt->mnt_list);
 		list_del_init(&mnt->mnt_fslink);
-		if (mnt->mnt_parent == mnt) {
-			spin_unlock(&vfsmount_lock);
+		if (mnt->mnt_parent != mnt &&
+			IS_MNT_SHARED(mnt->mnt_parent)) {
+			struct vfspnode *parent_pnode
+				= mnt->mnt_parent->mnt_pnode;
+			BUG_ON(!parent_pnode);
+			pnode_umount(parent_pnode,
+				mnt->mnt_mountpoint,
+				mnt->mnt_root);
 		} else {
-			struct nameidata old_nd;
-			detach_mnt(mnt, &old_nd);
-			spin_unlock(&vfsmount_lock);
-			path_release(&old_nd);
+			if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)) {
+				BUG_ON(!mnt->mnt_pnode);
+				pnode_disassociate_mnt(mnt);
+			}
+			do_detach_mount(mnt);
 		}
-		mntput(mnt);
-		spin_lock(&vfsmount_lock);
 	}
 }
 
-static int do_umount(struct vfsmount *mnt, int flags)
+/*
+ * return true if the refcount is greater than count
+ */
+int do_refcount_check(struct vfsmount *mnt, int count)
+{
+
+	int mycount = atomic_read(&mnt->mnt_count);
+	return (mycount > count);
+}
+
+int do_umount(struct vfsmount *mnt, int flags)
 {
 	struct super_block * sb = mnt->mnt_sb;
 	int retval;
@@ -516,7 +558,7 @@ static int do_umount(struct vfsmount *mn
 		spin_lock(&vfsmount_lock);
 	}
 	retval = -EBUSY;
-	if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
+	if (flags & MNT_DETACH || !mount_busy(mnt)) {
 		if (!list_empty(&mnt->mnt_list))
 			umount_tree(mnt);
 		retval = 0;
@@ -858,7 +900,9 @@ detach_recursive_mnt(struct vfsmount *so
 
 	detach_mnt(source_mnt, nd);
 	for (m = source_mnt; m; m = next_mnt(m, source_mnt)) {
+		spin_lock(&vfspnode_lock);
 		list_del_init(&m->mnt_pnode_mntlist);
+		spin_unlock(&vfspnode_lock);
 	}
 	return;
 }
Index: 2.6.12/fs/pnode.c
===================================================================
--- 2.6.12.orig/fs/pnode.c
+++ 2.6.12/fs/pnode.c
@@ -228,6 +228,124 @@ pnode_add_member_mnt(struct vfspnode *pn
 	pnode_add_mnt(pnode, mnt, 0);
 }
 
+static int
+vfs_busy(struct vfsmount *mnt, struct dentry *dentry, struct dentry *rootdentry,
+		struct vfsmount *origmnt)
+{
+	struct vfsmount *child_mnt;
+	int ret=0;
+
+	spin_unlock(&vfsmount_lock);
+	child_mnt = lookup_mnt(mnt, dentry, rootdentry);
+	spin_lock(&vfsmount_lock);
+
+	if (!child_mnt)
+		return 0;
+
+	if (list_empty(&child_mnt->mnt_mounts)) {
+		if (origmnt == child_mnt)
+			ret = do_refcount_check(child_mnt, 3);
+		else
+			ret = do_refcount_check(child_mnt, 2);
+	}
+	mntput(child_mnt);
+	return ret;
+}
+
+int
+pnode_mount_busy(struct vfspnode *pnode, struct dentry *mntpt, struct dentry *root,
+			struct vfsmount *mnt)
+{
+	int ret=0,traversal;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+
+	context.start = pnode;
+	context.pnode = NULL;
+	while(pnode_next(&context)) {
+		traversal = context.traversal;
+		pnode = context.pnode;
+		if(traversal == PNODE_UP) {
+			// traverse member vfsmounts
+			spin_lock(&vfspnode_lock);
+			list_for_each_entry_safe(member_mnt,
+				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_busy(member_mnt, mntpt, root, mnt)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			list_for_each_entry_safe(slave_mnt,
+				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_busy(slave_mnt, mntpt, root, mnt)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			spin_unlock(&vfspnode_lock);
+		}
+	}
+out:
+	return ret;
+}
+
+int
+vfs_umount(struct vfsmount *mnt, struct dentry *dentry, struct dentry *rootdentry)
+{
+	struct vfsmount *child_mnt;
+
+	spin_unlock(&vfsmount_lock);
+	child_mnt = lookup_mnt(mnt, dentry, rootdentry);
+	spin_lock(&vfsmount_lock);
+	mntput(child_mnt);
+	if (child_mnt && list_empty(&child_mnt->mnt_mounts)) {
+		do_detach_mount(child_mnt);
+		if (child_mnt->mnt_pnode)
+			pnode_disassociate_mnt(child_mnt);
+	}
+	return 0;
+}
+
+int
+pnode_umount(struct vfspnode *pnode, struct dentry *dentry,
+			struct dentry *rootdentry)
+{
+	int ret=0,traversal;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+
+	context.start = pnode;
+	context.pnode = NULL;
+	while(pnode_next(&context)) {
+		traversal = context.traversal;
+		pnode = context.pnode;
+		if(traversal == PNODE_UP) {
+			// traverse member vfsmounts
+			spin_lock(&vfspnode_lock);
+			list_for_each_entry_safe(member_mnt,
+				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_umount(member_mnt,
+						dentry, rootdentry)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			list_for_each_entry_safe(slave_mnt,
+				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				if ((ret = vfs_umount(slave_mnt,
+						dentry, rootdentry)))
+					goto out;
+				spin_lock(&vfspnode_lock);
+			}
+			spin_unlock(&vfspnode_lock);
+		}
+	}
+out:
+	return ret;
+}
+
+
 void
 pnode_add_slave_mnt(struct vfspnode *pnode, struct vfsmount *mnt)
 {
Index: 2.6.12/include/linux/fs.h
===================================================================
--- 2.6.12.orig/include/linux/fs.h
+++ 2.6.12/include/linux/fs.h
@@ -1216,11 +1216,14 @@ extern struct vfsmount *kern_mount(struc
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
+extern int do_umount(struct vfsmount *, int);
 extern struct vfsmount *do_attach_prepare_mnt(struct vfsmount *,
 		struct dentry *, struct vfsmount *, int );
 extern void do_attach_real_mnt(struct vfsmount *);
 extern struct vfsmount *do_make_mounted(struct vfsmount *, struct dentry *);
 extern int do_make_unmounted(struct vfsmount *);
+extern void do_detach_mount(struct vfsmount *);
+extern int do_refcount_check(struct vfsmount *, int );
 
 extern int vfs_statfs(struct super_block *, struct kstatfs *);
 
Index: 2.6.12/include/linux/pnode.h
===================================================================
--- 2.6.12.orig/include/linux/pnode.h
+++ 2.6.12/include/linux/pnode.h
@@ -67,6 +67,9 @@ put_pnode(struct vfspnode *pnode)
 
 void __init pnode_init(unsigned long );
 struct vfspnode * pnode_alloc(void);
+void pnode_free(struct vfspnode *);
+int pnode_is_busy(struct vfspnode *);
+int pnode_umount_vfs(struct vfspnode *, struct dentry *, struct dentry *, int);
 void pnode_add_slave_mnt(struct vfspnode *, struct vfsmount *);
 void pnode_add_member_mnt(struct vfspnode *, struct vfsmount *);
 void pnode_del_slave_mnt(struct vfsmount *);
@@ -79,5 +82,7 @@ int  pnode_make_unmounted(struct vfspnod
 int pnode_prepare_mount(struct vfspnode *, struct vfspnode *, struct dentry *,
 		struct vfsmount *, struct vfsmount *);
 int pnode_real_mount(struct vfspnode *, int);
+int pnode_umount(struct vfspnode *, struct dentry *, struct dentry *);
+int pnode_mount_busy(struct vfspnode *, struct dentry *, struct dentry *, struct vfsmount *);
 #endif /* KERNEL */
 #endif /* _LINUX_PNODE_H */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC PATCH 6/8] clone a namespace containing shared/private/slave/unclone tree
       [not found]           ` <1120816720.30164.28.camel@localhost>
@ 2005-07-08 10:26             ` Ram
       [not found]             ` <1120816835.30164.31.camel@localhost>
  1 sibling, 0 replies; 48+ messages in thread
From: Ram @ 2005-07-08 10:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 93 bytes --]

Adds the ability to clone a namespace that has shared/private/slave/unclone
subtrees in it.
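
For illustration only (not part of the patch), a minimal sketch of the
userspace path into copy_namespace(): creating a child task with clone(2) and
CLONE_NEWNS, which requires CAP_SYS_ADMIN. With namespace.patch applied, the
cloned namespace is expected to re-register shared mounts with their pnodes
and demote slave mounts to private, as the hunk below does.

/*
 * illustrative sketch only -- not part of the patch; must run with
 * CAP_SYS_ADMIN to create a new mount namespace.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int child(void *arg)
{
	/* this task now owns a copy of the parent's mount namespace */
	printf("child %d running in a cloned namespace\n", (int)getpid());
	return 0;
}

int main(void)
{
	const int stack_size = 64 * 1024;
	char *stack = malloc(stack_size);
	int pid, status;

	if (!stack) {
		perror("malloc");
		return 1;
	}
	/* the stack grows down on most architectures, so pass its top */
	pid = clone(child, stack + stack_size, CLONE_NEWNS | SIGCHLD, NULL);
	if (pid == -1) {
		perror("clone");
		return 1;
	}
	waitpid(pid, &status, 0);
	free(stack);
	return 0;
}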

RP


[-- Attachment #2: namespace.patch --]
[-- Type: text/x-patch, Size: 909 bytes --]

Adds the ability to clone a namespace that has shared/private/slave/unclone
subtrees in it.

RP


Signed by Ram Pai (linuxram@us.ibm.com)

 namespace.c |    9 +++++++++
 1 files changed, 9 insertions(+)

Index: 2.6.12/fs/namespace.c
===================================================================
--- 2.6.12.orig/fs/namespace.c
+++ 2.6.12/fs/namespace.c
@@ -1685,6 +1685,12 @@ int copy_namespace(int flags, struct tas
 	q = new_ns->root;
 	while (p) {
 		q->mnt_namespace = new_ns;
+
+		if (IS_MNT_SHARED(q))
+			pnode_add_member_mnt(q->mnt_pnode, q);
+		else if (IS_MNT_SLAVE(q))
+			SET_MNT_PRIVATE(q);
+
 		if (fs) {
 			if (p == fs->rootmnt) {
 				rootmnt = p;
@@ -2054,6 +2060,8 @@ void __put_namespace(struct namespace *n
 	spin_lock(&vfsmount_lock);
 
 	list_for_each_entry(mnt, &namespace->list, mnt_list) {
+		if (mnt->mnt_pnode)
+			pnode_disassociate_mnt(mnt);
 		mnt->mnt_namespace = NULL;
 	}
 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC PATCH 7/8] automounter support for shared/slave/private/unclone
       [not found]             ` <1120816835.30164.31.camel@localhost>
@ 2005-07-08 10:26               ` Ram
  2005-07-08 10:26                 ` [RFC PATCH 8/8] pnode.c optimization Ram
  0 siblings, 1 reply; 48+ messages in thread
From: Ram @ 2005-07-08 10:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 81 bytes --]

Adds support for mount/umount propagation for autofs-initiated
operations.
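
For illustration only, a toy userspace model of the refcount threshold this
patch parameterizes: do_umount() now calls mount_busy(mnt, 2) (the parent's
reference plus the umount caller's are expected), while the autofs expiry scan
calls mount_busy(mnt, 1) (only the parent's reference is expected). The struct
and numbers below are made up purely to show the comparison; they are not
kernel code.

/*
 * toy model only -- not kernel code.
 */
#include <stdio.h>

struct toy_mount {
	int count;			/* stand-in for mnt_count */
};

/* mirrors the idea of do_refcount_check(): "busy" when more
 * references exist than the caller can account for */
static int refcount_check(const struct toy_mount *m, int expected)
{
	return m->count > expected;
}

int main(void)
{
	struct toy_mount m = { 2 };	/* e.g. parent ref + caller ref */

	/* an explicit umount expects 2 references -> not busy here */
	printf("umount path sees busy: %d\n", refcount_check(&m, 2));
	/* the expiry scan expects only the parent's reference -> busy */
	printf("expiry path sees busy: %d\n", refcount_check(&m, 1));
	return 0;
}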

RP


[-- Attachment #2: automount.patch --]
[-- Type: text/x-patch, Size: 9709 bytes --]

Adds support for mount/umount propagation for autofs-initiated operations.
RP


Index: 2.6.12/fs/namespace.c
===================================================================
--- 2.6.12.orig/fs/namespace.c
+++ 2.6.12/fs/namespace.c
@@ -205,9 +205,12 @@ struct vfsmount *do_attach_prepare_mnt(s
 	struct vfsmount *child_mnt;
 	struct nameidata nd;
 
-	if (clone_flag)
+	if (clone_flag) {
 		child_mnt = clone_mnt(template_mnt, template_mnt->mnt_root);
-	else
+		spin_lock(&vfsmount_lock);
+		list_del_init(&child_mnt->mnt_fslink);
+		spin_unlock(&vfsmount_lock);
+	} else
 		child_mnt = template_mnt;
 
 	nd.mnt = mnt;
@@ -344,38 +347,16 @@ struct seq_operations mounts_op = {
  * open files, pwds, chroots or sub mounts that are
  * busy.
  */
-//TOBEFIXED
 int may_umount_tree(struct vfsmount *mnt)
 {
-	struct list_head *next;
-	struct vfsmount *this_parent = mnt;
-	int actual_refs;
-	int minimum_refs;
+	int actual_refs=0;
+	int minimum_refs=0;
+	struct vfsmount *p;
 
 	spin_lock(&vfsmount_lock);
-	actual_refs = atomic_read(&mnt->mnt_count);
-	minimum_refs = 2;
-repeat:
-	next = this_parent->mnt_mounts.next;
-resume:
-	while (next != &this_parent->mnt_mounts) {
-		struct vfsmount *p = list_entry(next, struct vfsmount, mnt_child);
-
-		next = next->next;
-
+	for (p = mnt; p; p = next_mnt(p, mnt)) {
 		actual_refs += atomic_read(&p->mnt_count);
 		minimum_refs += 2;
-
-		if (!list_empty(&p->mnt_mounts)) {
-			this_parent = p;
-			goto repeat;
-		}
-	}
-
-	if (this_parent != mnt) {
-		next = this_parent->mnt_child.next;
-		this_parent = this_parent->mnt_parent;
-		goto resume;
 	}
 	spin_unlock(&vfsmount_lock);
 
@@ -387,18 +368,18 @@ resume:
 
 EXPORT_SYMBOL(may_umount_tree);
 
-int mount_busy(struct vfsmount *mnt)
+int mount_busy(struct vfsmount *mnt, int refcnt)
 {
 	struct vfspnode *parent_pnode;
 
 	if (mnt == mnt->mnt_parent || !IS_MNT_SHARED(mnt->mnt_parent))
-		return do_refcount_check(mnt, 2);
+		return do_refcount_check(mnt, refcnt);
 
 	parent_pnode = mnt->mnt_parent->mnt_pnode;
 	BUG_ON(!parent_pnode);
 	return pnode_mount_busy(parent_pnode,
 			mnt->mnt_mountpoint,
-			mnt->mnt_root, mnt);
+			mnt->mnt_root, mnt, refcnt);
 }
 
 /**
@@ -416,7 +397,9 @@ int mount_busy(struct vfsmount *mnt)
  */
 int may_umount(struct vfsmount *mnt)
 {
-	return (!mount_busy(mnt));
+	if (mount_busy(mnt,2))
+		return -EBUSY;
+	return 0;
 }
 
 EXPORT_SYMBOL(may_umount);
@@ -435,6 +418,25 @@ void do_detach_mount(struct vfsmount *mn
 	spin_lock(&vfsmount_lock);
 }
 
+void umount_mnt(struct vfsmount *mnt)
+{
+	if (mnt->mnt_parent != mnt &&
+		IS_MNT_SHARED(mnt->mnt_parent)) {
+		struct vfspnode *parent_pnode
+			= mnt->mnt_parent->mnt_pnode;
+		BUG_ON(!parent_pnode);
+		pnode_umount(parent_pnode,
+			mnt->mnt_mountpoint,
+			mnt->mnt_root);
+	} else {
+		if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)) {
+			BUG_ON(!mnt->mnt_pnode);
+			pnode_disassociate_mnt(mnt);
+		}
+		do_detach_mount(mnt);
+	}
+}
+
 void umount_tree(struct vfsmount *mnt)
 {
 	struct vfsmount *p;
@@ -449,21 +451,7 @@ void umount_tree(struct vfsmount *mnt)
 		mnt = list_entry(kill.next, struct vfsmount, mnt_list);
 		list_del_init(&mnt->mnt_list);
 		list_del_init(&mnt->mnt_fslink);
-		if (mnt->mnt_parent != mnt &&
-			IS_MNT_SHARED(mnt->mnt_parent)) {
-			struct vfspnode *parent_pnode
-				= mnt->mnt_parent->mnt_pnode;
-			BUG_ON(!parent_pnode);
-			pnode_umount(parent_pnode,
-				mnt->mnt_mountpoint,
-				mnt->mnt_root);
-		} else {
-			if (IS_MNT_SHARED(mnt) || IS_MNT_SLAVE(mnt)) {
-				BUG_ON(!mnt->mnt_pnode);
-				pnode_disassociate_mnt(mnt);
-			}
-			do_detach_mount(mnt);
-		}
+		umount_mnt(mnt);
 	}
 }
 
@@ -558,7 +546,7 @@ int do_umount(struct vfsmount *mnt, int 
 		spin_lock(&vfsmount_lock);
 	}
 	retval = -EBUSY;
-	if (flags & MNT_DETACH || !mount_busy(mnt)) {
+	if (flags & MNT_DETACH || !mount_busy(mnt, 2)) {
 		if (!list_empty(&mnt->mnt_list))
 			umount_tree(mnt);
 		retval = 0;
@@ -964,14 +952,13 @@ struct vfsmount *do_make_mounted(struct 
 	newmnt->mnt_pnode = NULL;
 
 	if (newmnt) {
-		/* put in code for mount expiry */
-		/* TOBEDONE */
-
 		/*
 		 * walk through the mount list of mnt and move
 		 * them under the new mount
 		 */
 		spin_lock(&vfsmount_lock);
+		list_del_init(&newmnt->mnt_fslink);
+
 		list_for_each_entry_safe(child_mnt, next,
 				&mnt->mnt_mounts, mnt_child) {
 
@@ -1412,6 +1399,8 @@ void mark_mounts_for_expiry(struct list_
 	if (list_empty(mounts))
 		return;
 
+	down_write(&namespace_sem);
+
 	spin_lock(&vfsmount_lock);
 
 	/* extract from the expiration list every vfsmount that matches the
@@ -1421,8 +1410,7 @@ void mark_mounts_for_expiry(struct list_
 	 *   cleared by mntput())
 	 */
 	list_for_each_entry_safe(mnt, next, mounts, mnt_fslink) {
-		if (!xchg(&mnt->mnt_expiry_mark, 1) ||
-		    atomic_read(&mnt->mnt_count) != 1)
+		if (!xchg(&mnt->mnt_expiry_mark, 1) || mount_busy(mnt, 1))
 			continue;
 
 		mntget(mnt);
@@ -1430,12 +1418,13 @@ void mark_mounts_for_expiry(struct list_
 	}
 
 	/*
-	 * go through the vfsmounts we've just consigned to the graveyard to
-	 * - check that they're still dead
+	 * go through the vfsmounts we've just consigned to the graveyard
 	 * - delete the vfsmount from the appropriate namespace under lock
 	 * - dispose of the corpse
 	 */
 	while (!list_empty(&graveyard)) {
+		struct super_block *sb;
+
 		mnt = list_entry(graveyard.next, struct vfsmount, mnt_fslink);
 		list_del_init(&mnt->mnt_fslink);
 
@@ -1446,60 +1435,25 @@ void mark_mounts_for_expiry(struct list_
 			continue;
 		get_namespace(namespace);
 
-		spin_unlock(&vfsmount_lock);
-		down_write(&namespace_sem);
-		spin_lock(&vfsmount_lock);
-
-		/* check that it is still dead: the count should now be 2 - as
-		 * contributed by the vfsmount parent and the mntget above */
-		if (atomic_read(&mnt->mnt_count) == 2) {
-			struct vfsmount *xdmnt;
-			struct dentry *xdentry;
-
-			/* delete from the namespace */
-			list_del_init(&mnt->mnt_list);
-			list_del_init(&mnt->mnt_child);
-			list_del_init(&mnt->mnt_hash);
-			mnt->mnt_mountpoint->d_mounted--;
-
-			xdentry = mnt->mnt_mountpoint;
-			mnt->mnt_mountpoint = mnt->mnt_root;
-			xdmnt = mnt->mnt_parent;
-			mnt->mnt_parent = mnt;
-
-			spin_unlock(&vfsmount_lock);
-
-			mntput(xdmnt);
-			dput(xdentry);
-
-			/* now lay it to rest if this was the last ref on the
-			 * superblock */
-			if (atomic_read(&mnt->mnt_sb->s_active) == 1) {
-				/* last instance - try to be smart */
-				lock_kernel();
-				DQUOT_OFF(mnt->mnt_sb);
-				acct_auto_close(mnt->mnt_sb);
-				unlock_kernel();
-			}
-
-			mntput(mnt);
-		} else {
-			/* someone brought it back to life whilst we didn't
-			 * have any locks held so return it to the expiration
-			 * list */
-			list_add_tail(&mnt->mnt_fslink, mounts);
-			spin_unlock(&vfsmount_lock);
+		sb = mnt->mnt_sb;
+		umount_mnt(mnt);
+		/*
+		 * now lay it to rest if this was the last ref on the
+		 * superblock
+		 */
+		if (atomic_read(&sb->s_active) == 1) {
+			/* last instance - try to be smart */
+			lock_kernel();
+			DQUOT_OFF(sb);
+			acct_auto_close(sb);
+			unlock_kernel();
 		}
-
-		up_write(&namespace_sem);
-
 		mntput(mnt);
-		put_namespace(namespace);
 
-		spin_lock(&vfsmount_lock);
+		put_namespace(namespace);
 	}
-
 	spin_unlock(&vfsmount_lock);
+	up_write(&namespace_sem);
 }
 
 EXPORT_SYMBOL_GPL(mark_mounts_for_expiry);
Index: 2.6.12/fs/pnode.c
===================================================================
--- 2.6.12.orig/fs/pnode.c
+++ 2.6.12/fs/pnode.c
@@ -230,7 +230,7 @@ pnode_add_member_mnt(struct vfspnode *pn
 
 static int
 vfs_busy(struct vfsmount *mnt, struct dentry *dentry, struct dentry *rootdentry,
-		struct vfsmount *origmnt)
+		struct vfsmount *origmnt, int refcnt)
 {
 	struct vfsmount *child_mnt;
 	int ret=0;
@@ -244,9 +244,9 @@ vfs_busy(struct vfsmount *mnt, struct de
 
 	if (list_empty(&child_mnt->mnt_mounts)) {
 		if (origmnt == child_mnt)
-			ret = do_refcount_check(child_mnt, 3);
+			ret = do_refcount_check(child_mnt, refcnt+1);
 		else
-			ret = do_refcount_check(child_mnt, 2);
+			ret = do_refcount_check(child_mnt, refcnt);
 	}
 	mntput(child_mnt);
 	return ret;
@@ -254,7 +254,7 @@ vfs_busy(struct vfsmount *mnt, struct de
 
 int
 pnode_mount_busy(struct vfspnode *pnode, struct dentry *mntpt, struct dentry *root,
-			struct vfsmount *mnt)
+			struct vfsmount *mnt, int refcnt)
 {
 	int ret=0,traversal;
      	struct vfsmount *slave_mnt, *member_mnt, *t_m;
@@ -271,14 +271,14 @@ pnode_mount_busy(struct vfspnode *pnode,
 			list_for_each_entry_safe(member_mnt,
 				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
 				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_busy(member_mnt, mntpt, root, mnt)))
+				if ((ret = vfs_busy(member_mnt, mntpt, root, mnt, refcnt)))
 					goto out;
 				spin_lock(&vfspnode_lock);
 			}
 			list_for_each_entry_safe(slave_mnt,
 				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
 				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_busy(slave_mnt, mntpt, root, mnt)))
+				if ((ret = vfs_busy(slave_mnt, mntpt, root, mnt, refcnt)))
 					goto out;
 				spin_lock(&vfspnode_lock);
 			}
Index: 2.6.12/include/linux/pnode.h
===================================================================
--- 2.6.12.orig/include/linux/pnode.h
+++ 2.6.12/include/linux/pnode.h
@@ -83,6 +83,6 @@ int pnode_prepare_mount(struct vfspnode 
 		struct vfsmount *, struct vfsmount *);
 int pnode_real_mount(struct vfspnode *, int);
 int pnode_umount(struct vfspnode *, struct dentry *, struct dentry *);
-int pnode_mount_busy(struct vfspnode *, struct dentry *, struct dentry *, struct vfsmount *);
+int pnode_mount_busy(struct vfspnode *, struct dentry *, struct dentry *, struct vfsmount *, int);
 #endif /* KERNEL */
 #endif /* _LINUX_PNODE_H */

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC PATCH 8/8] pnode.c optimization
  2005-07-08 10:26               ` [RFC PATCH 7/8] automounter support for shared/slave/private/unclone Ram
@ 2005-07-08 10:26                 ` Ram
  0 siblings, 0 replies; 48+ messages in thread
From: Ram @ 2005-07-08 10:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, viro, Andrew Morton, mike, bfields, Miklos Szeredi

[-- Attachment #1: Type: text/plain, Size: 56 bytes --]

Optimizes lots of redundant pnode propagation code.
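
For illustration only, a self-contained userspace sketch of the refactoring
pattern this patch introduces with pnode_traverse(): one generic walk over a
pnode's member and slave mounts, with the per-mount operation passed in as a
callback and its extra arguments re-forwarded through a fresh va_list on each
call. The structures below are toy stand-ins, not the kernel's, and only model
a single pnode.

/*
 * self-contained sketch only -- not part of the patch.
 */
#include <stdarg.h>
#include <stdio.h>

enum { MEMBER_VFS, SLAVE_VFS };

struct toy_mnt {
	const char *name;
	struct toy_mnt *next;
};

struct toy_pnode {
	struct toy_mnt *members;
	struct toy_mnt *slaves;
};

typedef int (*mnt_fn)(struct toy_mnt *, int which, va_list);

/*
 * one generic walk: the per-mount operation is a callback, and its
 * extra arguments are re-forwarded through a fresh va_list on every
 * call, the way pnode_traverse() does below.
 */
static int traverse(struct toy_pnode *p, mnt_fn fn, ...)
{
	va_list args;
	struct toy_mnt *m;
	int ret;

	for (m = p->members; m; m = m->next) {
		va_start(args, fn);
		ret = fn(m, MEMBER_VFS, args);
		va_end(args);
		if (ret)
			return ret;
	}
	for (m = p->slaves; m; m = m->next) {
		va_start(args, fn);
		ret = fn(m, SLAVE_VFS, args);
		va_end(args);
		if (ret)
			return ret;
	}
	return 0;
}

/* one plugged-in operation: print each mount with a caller-supplied prefix */
static int print_mnt(struct toy_mnt *m, int which, va_list args)
{
	const char *prefix = va_arg(args, const char *);

	printf("%s %s (%s)\n", prefix, m->name,
	       which == MEMBER_VFS ? "member" : "slave");
	return 0;
}

int main(void)
{
	struct toy_mnt c = { "mntC", NULL };
	struct toy_mnt b = { "mntB", NULL };
	struct toy_mnt a = { "mntA", &b };
	struct toy_pnode p = { &a, &c };

	return traverse(&p, print_mnt, "visiting");
}
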
RP


[-- Attachment #2: pnode_opt.patch --]
[-- Type: text/x-patch, Size: 19171 bytes --]

Optimizes lots of redundant pnode propagation code.
RP


Index: 2.6.12/fs/pnode.c
===================================================================
--- 2.6.12.orig/fs/pnode.c
+++ 2.6.12/fs/pnode.c
@@ -204,34 +204,144 @@ out:
 }
 
 
-static void inline
-pnode_add_mnt(struct vfspnode *pnode, struct vfsmount *mnt, int slave)
+/*
+ * traverse the pnode tree and at each pnode encountered, execute the
+ * pnode pre/post callbacks. For each vfsmount encountered call vfs_func().
+ *
+ * @pnode: pnode tree to be traversed
+ * @in_data: input data
+ * @out_data: output data
+ * @pnode_pre_func: function to be called when a new pnode is encountered
+ * 		the first time. (PNODE_DOWN)
+ * @pnode_post_func: function to be called when a new pnode is encountered
+ * 		each time one of its child pnodes has been processed. (PNODE_MID)
+ * @vfs_func: function to be called on each slave and member vfs belonging
+ * 		to the pnode, called once the pnode itself has been processed. (PNODE_UP)
+ */
+static int
+pnode_traverse(struct vfspnode *pnode,
+		void *in_data,
+		void **out_data,
+		int (*pnode_pre_func)(struct vfspnode *, void *, void **,
+			void **, va_list ),
+		int (*pnode_post_func)(struct vfspnode *, void *, void *, va_list),
+		int (*vfs_func)(struct vfsmount *, int, void *,  va_list),
+		...)
 {
-	if (!pnode || !mnt)
-		return;
-	spin_lock(&vfspnode_lock);
-	mnt->mnt_pnode = pnode;
-	if (slave) {
-		SET_MNT_SLAVE(mnt);
-		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_slavevfs);
-	} else {
-		SET_MNT_SHARED(mnt);
-		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_vfs);
+	va_list args;
+	int ret=0,traversal,level;
+	void *my_data, *data_from_master, *data_for_slave, *data_from_slave;
+     	struct vfspnode *slave_pnode, *master_pnode;
+     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
+	struct pcontext context;
+	static struct inoutdata p_array[PNODE_MAX_SLAVE_LEVEL];
+
+
+	context.start = pnode;
+	context.pnode = NULL;
+
+	/*
+	 * determine whether to process vfs first or the
+	 * slave pnode first
+	 */
+	while(pnode_next(&context)) {
+		traversal = context.traversal;
+		level = context.level;
+		pnode = context.pnode;
+		slave_pnode = context.slave_pnode;
+		master_pnode = context.master_pnode;
+
+		if (traversal == PNODE_DOWN ) {
+			if (master_pnode) {
+				data_from_master = p_array[level-1].in_data;
+			} else {
+				data_from_master = in_data;
+			}
+
+			if (pnode_pre_func) {
+				va_start(args, vfs_func);
+				if((ret = pnode_pre_func(pnode, data_from_master,
+						&my_data, &data_for_slave, args))) {
+					goto error;
+				}
+				va_end(args);
+			} else {
+				my_data = data_from_master;
+				data_for_slave = NULL;
+			}
+			p_array[level].my_data = my_data;
+			p_array[level].in_data = data_for_slave;
+
+		} else if(traversal == PNODE_MID) {
+
+			my_data = p_array[level].my_data;
+			if (slave_pnode) {
+				data_from_slave = p_array[level+1].out_data;
+			} else {
+				data_from_slave = NULL;
+			}
+			if (pnode_post_func) {
+				va_start(args, vfs_func);
+				if((ret = pnode_post_func(pnode, my_data,
+						data_from_slave, args))) {
+					goto error;
+				}
+				va_end(args);
+			}
+
+		} else if(traversal == PNODE_UP) {
+
+			my_data = p_array[level].my_data;
+
+			// traverse member vfsmounts
+
+			spin_lock(&vfspnode_lock);
+			list_for_each_entry_safe(member_mnt,
+				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				va_start(args, vfs_func);
+				if ((ret = vfs_func(member_mnt, PNODE_MEMBER_VFS,
+					my_data, args))) {
+					goto error;
+				}
+				va_end(args);
+				spin_lock(&vfspnode_lock);
+			}
+			list_for_each_entry_safe(slave_mnt,
+				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
+				spin_unlock(&vfspnode_lock);
+				va_start(args, vfs_func);
+				if ((ret = vfs_func(slave_mnt, PNODE_SLAVE_VFS,
+					my_data, args))) {
+					goto error;
+				}
+				va_end(args);
+				spin_lock(&vfspnode_lock);
+			}
+			spin_unlock(&vfspnode_lock);
+			p_array[level].out_data = my_data;
+		}
 	}
-	get_pnode(pnode);
-	spin_unlock(&vfspnode_lock);
-}
 
-void
-pnode_add_member_mnt(struct vfspnode *pnode, struct vfsmount *mnt)
-{
-	pnode_add_mnt(pnode, mnt, 0);
+out:
+	if (out_data)
+		*out_data = p_array[0].my_data;
+	return ret;
+error:
+	va_end(args);
+	if (out_data)
+		*out_data = NULL;
+	goto out;
 }
 
+
 static int
-vfs_busy(struct vfsmount *mnt, struct dentry *dentry, struct dentry *rootdentry,
-		struct vfsmount *origmnt, int refcnt)
+vfs_busy(struct vfsmount *mnt, int flag, void *indata,  va_list args)
 {
+	struct dentry *dentry = va_arg(args, struct dentry *);
+	struct dentry *rootdentry = va_arg(args, struct dentry *);
+	struct vfsmount *origmnt = va_arg(args, struct vfsmount *);
+	int    refcnt = va_arg(args, int);
 	struct vfsmount *child_mnt;
 	int ret=0;
 
@@ -256,52 +366,32 @@ int
 pnode_mount_busy(struct vfspnode *pnode, struct dentry *mntpt, struct dentry *root,
 			struct vfsmount *mnt, int refcnt)
 {
-	int ret=0,traversal;
-     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
-	struct pcontext context;
-
-	context.start = pnode;
-	context.pnode = NULL;
-	while(pnode_next(&context)) {
-		traversal = context.traversal;
-		pnode = context.pnode;
-		if(traversal == PNODE_UP) {
-			// traverse member vfsmounts
-			spin_lock(&vfspnode_lock);
-			list_for_each_entry_safe(member_mnt,
-				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_busy(member_mnt, mntpt, root, mnt, refcnt)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			list_for_each_entry_safe(slave_mnt,
-				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_busy(slave_mnt, mntpt, root, mnt, refcnt)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			spin_unlock(&vfspnode_lock);
-		}
-	}
-out:
-	return ret;
+	return pnode_traverse(pnode, NULL, NULL,
+			NULL, NULL, vfs_busy, mntpt, root, mnt, refcnt);
 }
 
+
 int
-vfs_umount(struct vfsmount *mnt, struct dentry *dentry, struct dentry *rootdentry)
+vfs_umount(struct vfsmount *mnt, int flag, void *indata, va_list args)
 {
 	struct vfsmount *child_mnt;
+	struct dentry *dentry, *rootdentry;
+
+
+	dentry = va_arg(args, struct dentry *);
+	rootdentry = va_arg(args, struct dentry *);
 
 	spin_unlock(&vfsmount_lock);
 	child_mnt = lookup_mnt(mnt, dentry, rootdentry);
 	spin_lock(&vfsmount_lock);
 	mntput(child_mnt);
 	if (child_mnt && list_empty(&child_mnt->mnt_mounts)) {
-		do_detach_mount(child_mnt);
-		if (child_mnt->mnt_pnode)
+		if (IS_MNT_SHARED(child_mnt) ||
+				IS_MNT_SLAVE(child_mnt)) {
+			BUG_ON(!child_mnt->mnt_pnode);
 			pnode_disassociate_mnt(child_mnt);
+		}
+		do_detach_mount(child_mnt);
 	}
 	return 0;
 }
@@ -310,41 +400,33 @@ int
 pnode_umount(struct vfspnode *pnode, struct dentry *dentry,
 			struct dentry *rootdentry)
 {
-	int ret=0,traversal;
-     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
-	struct pcontext context;
+	return pnode_traverse(pnode, NULL, (void *)NULL,
+			NULL, NULL, vfs_umount, dentry, rootdentry);
+}
 
-	context.start = pnode;
-	context.pnode = NULL;
-	while(pnode_next(&context)) {
-		traversal = context.traversal;
-		pnode = context.pnode;
-		if(traversal == PNODE_UP) {
-			// traverse member vfsmounts
-			spin_lock(&vfspnode_lock);
-			list_for_each_entry_safe(member_mnt,
-				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_umount(member_mnt,
-						dentry, rootdentry)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			list_for_each_entry_safe(slave_mnt,
-				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_umount(slave_mnt,
-						dentry, rootdentry)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			spin_unlock(&vfspnode_lock);
-		}
+static void inline
+pnode_add_mnt(struct vfspnode *pnode, struct vfsmount *mnt, int slave)
+{
+	if (!pnode || !mnt)
+		return;
+	spin_lock(&vfspnode_lock);
+	mnt->mnt_pnode = pnode;
+	if (slave) {
+		SET_MNT_SLAVE(mnt);
+		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_slavevfs);
+	} else {
+		SET_MNT_SHARED(mnt);
+		list_add(&mnt->mnt_pnode_mntlist, &pnode->pnode_vfs);
 	}
-out:
-	return ret;
+	get_pnode(pnode);
+	spin_unlock(&vfspnode_lock);
 }
 
+void
+pnode_add_member_mnt(struct vfspnode *pnode, struct vfsmount *mnt)
+{
+	pnode_add_mnt(pnode, mnt, 0);
+}
 
 void
 pnode_add_slave_mnt(struct vfspnode *pnode, struct vfsmount *mnt)
@@ -405,6 +487,7 @@ pnode_disassociate_mnt(struct vfsmount *
 	mnt->mnt_flags &= ~MNT_SHARED;
 }
 
+
 // merge pnode into peer_pnode and get rid of pnode
 int
 pnode_merge_pnode(struct vfspnode *pnode, struct vfspnode *peer_pnode)
@@ -451,82 +534,84 @@ pnode_merge_pnode(struct vfspnode *pnode
 	return 0;
 }
 
-struct vfsmount *
-pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
+int
+pnode_mount_pre_func(struct vfspnode *pnode, void *indata,
+		void **outdata, void **out_slavedata, va_list args)
 {
-	struct vfsmount *child_mnt;
-	int ret=0,traversal,level;
-     	struct vfspnode *slave_pnode, *master_pnode, *child_pnode, *slave_child_pnode;
-     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
-	struct pcontext context;
-	static struct inoutdata p_array[PNODE_MAX_SLAVE_LEVEL];
+	struct vfspnode *pnode_child=NULL;
+	int ret=0;
 
-	context.start = pnode;
-	context.pnode = NULL;
+	*out_slavedata = *outdata = NULL;
 
-	while(pnode_next(&context)) {
-		traversal = context.traversal;
-		level = context.level;
-		pnode = context.pnode;
-		slave_pnode = context.slave_pnode;
-		master_pnode = context.master_pnode;
+	if(indata) {
+		*outdata = indata;
+		goto out;
+	}
 
-		if (traversal == PNODE_DOWN ) {
-			if (master_pnode) {
-				child_pnode = (struct vfspnode *)p_array[level-1].in_data;
-			} else {
-				child_pnode = NULL;
-			}
-			while (!(child_pnode = pnode_alloc()))
-				schedule();
-			p_array[level].my_data = (void *)child_pnode;
-			p_array[level].in_data = NULL;
+	while (!(pnode_child = pnode_alloc()))
+		schedule();
 
-		} else if(traversal == PNODE_MID) {
+	*outdata = (void *)pnode_child;
+out:
+	return ret;
+}
 
-			child_pnode = (struct vfspnode *)p_array[level].my_data;
-			slave_child_pnode = p_array[level+1].out_data;
-			pnode_add_slave_pnode(child_pnode, slave_child_pnode);
+int
+pnode_mount_post_func(struct vfspnode *pnode, void *indata, void *outdata, va_list args)
+{
+	struct vfspnode *pnode_child, *slave_child_pnode;
 
-		} else if(traversal == PNODE_UP) {
-			child_pnode = p_array[level].my_data;
-			spin_lock(&vfspnode_lock);
-			list_for_each_entry_safe(member_mnt,
-				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if (!(child_mnt = do_make_mounted(mnt, dentry))) {
-					ret = -ENOMEM;
-					goto out;
-				}
-				spin_lock(&vfspnode_lock);
-				pnode_add_member_mnt(child_pnode, child_mnt);
-			}
-			list_for_each_entry_safe(slave_mnt,
-				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if (!(child_mnt = do_make_mounted(mnt, dentry))) {
-					ret = -ENOMEM;
-					goto out;
-				}
-				spin_lock(&vfspnode_lock);
-				pnode_add_slave_mnt(child_pnode, child_mnt);
-			}
-			spin_unlock(&vfspnode_lock);
-			p_array[level].out_data = child_pnode;
-		}
+	pnode_child = (struct vfspnode *)indata;
+	slave_child_pnode = (struct vfspnode *)outdata;
+
+	if (pnode_child && slave_child_pnode)
+		pnode_add_slave_pnode(pnode_child, slave_child_pnode);
+
+	return 0;
+}
+
+int
+vfs_make_mounted_func(struct vfsmount *mnt, int flag, void *indata, va_list args)
+{
+	struct dentry *target_dentry;
+	int ret=0;
+	struct vfsmount *child_mount;
+	struct vfspnode *pnode;
+
+	target_dentry = va_arg(args, struct dentry *);
+	if (!(child_mount = do_make_mounted(mnt, target_dentry))) {
+		ret = -ENOMEM;
+		goto out;
+	}
+	pnode = (struct vfspnode *)indata;
+	switch (flag) {
+	case PNODE_SLAVE_VFS :
+		pnode_add_slave_mnt(pnode, child_mount);
+		break;
+	case PNODE_MEMBER_VFS :
+		pnode_add_member_mnt(pnode, child_mount);
+		break;
 	}
 
 out:
-	if (ret)
-		return NULL;
+	return ret;
+}
 
+struct vfsmount *
+pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
+{
+	struct vfsmount *child_mnt;
+	if(pnode_traverse(pnode, NULL, (void *)NULL,
+			pnode_mount_pre_func, pnode_mount_post_func,
+			vfs_make_mounted_func, (void *)dentry))
+		return NULL;
 	child_mnt = lookup_mnt(mnt, dentry, dentry);
 	mntput(child_mnt);
 	return child_mnt;
 }
 
 int
-vfs_make_unmounted(struct vfsmount *mnt)
+vfs_make_unmounted_func(struct vfsmount *mnt, int flag, void *indata,  va_list args)
 {
 	struct vfspnode *pnode;
 	int ret=0;
@@ -535,8 +620,11 @@ vfs_make_unmounted(struct vfsmount *mnt)
 		ret = 1;
 		goto out;
 	}
+
 	pnode = mnt->mnt_pnode;
+	spin_lock(&vfspnode_lock);
 	list_del_init(&mnt->mnt_pnode_mntlist);
+	spin_unlock(&vfspnode_lock);
 	put_pnode(pnode);
 out:
 	return ret;
@@ -545,49 +633,20 @@ out:
 int
 pnode_make_unmounted(struct vfspnode *pnode)
 {
-	int ret=0,traversal;
-     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
-	struct pcontext context;
-
-	context.start = pnode;
-	context.pnode = NULL;
-	while(pnode_next(&context)) {
-		traversal = context.traversal;
-		pnode = context.pnode;
-		if(traversal == PNODE_UP) {
-			// traverse member vfsmounts
-			spin_lock(&vfspnode_lock);
-			list_for_each_entry_safe(member_mnt,
-				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_make_unmounted(member_mnt)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			list_for_each_entry_safe(slave_mnt,
-				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_make_unmounted(slave_mnt)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			spin_unlock(&vfspnode_lock);
-		}
-	}
-
-out:
-	return ret;
+	return pnode_traverse(pnode, NULL, (void *)NULL,
+			NULL, NULL, vfs_make_unmounted_func);
 }
 
-
 int
-vfs_prepare_mount_func(struct vfsmount *mnt, int flag, struct vfspnode *pnode,
-		struct vfsmount *source_mnt,
-		struct dentry *mountpoint_dentry,
-		struct vfsmount *p_mnt)
-
+vfs_prepare_mount_func(struct vfsmount *mnt, int flag, void *indata, va_list args)
 {
-	struct vfsmount *child_mnt;
+	struct vfsmount *source_mnt, *child_mnt, *p_mnt;
+	struct dentry *mountpoint_dentry;
+	struct vfspnode *pnode = (struct vfspnode *)indata;
+
+	source_mnt = va_arg(args, struct vfsmount * );
+	mountpoint_dentry =  va_arg(args, struct dentry *);
+	p_mnt =  va_arg(args, struct vfsmount *);
 
 	if ((p_mnt != mnt) || (source_mnt == source_mnt->mnt_parent)) {
 		child_mnt = do_attach_prepare_mnt(mnt, mountpoint_dentry,
@@ -610,6 +669,7 @@ vfs_prepare_mount_func(struct vfsmount *
 }
 
 
+
 int
 pnode_prepare_mount(struct vfspnode *pnode,
 		struct vfspnode *master_child_pnode,
@@ -617,78 +677,42 @@ pnode_prepare_mount(struct vfspnode *pno
 		struct vfsmount *source_mnt,
 		struct vfsmount *mnt)
 {
-	int ret=0,traversal,level;
-     	struct vfspnode *slave_pnode, *master_pnode, *slave_child_pnode, *child_pnode;
-     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
-	struct pcontext context;
-	static struct inoutdata p_array[PNODE_MAX_SLAVE_LEVEL];
-
-	context.start = pnode;
-	context.pnode = NULL;
-	while(pnode_next(&context)) {
-		traversal = context.traversal;
-		level = context.level;
-		pnode = context.pnode;
-		slave_pnode = context.slave_pnode;
-		master_pnode = context.master_pnode;
-
-		if (traversal == PNODE_DOWN ) {
-
-			child_pnode = NULL;
-			if (!master_pnode)
-				child_pnode = master_child_pnode;
-
-			while (!(child_pnode = pnode_alloc()))
-				schedule();
-
-			p_array[level].my_data = child_pnode;
-
-		} else if(traversal == PNODE_MID) {
-
-			child_pnode = (struct vfspnode *)p_array[level].my_data;
-			slave_child_pnode = p_array[level+1].out_data;
-			pnode_add_slave_pnode(child_pnode, slave_child_pnode);
-
-		} else if(traversal == PNODE_UP) {
-
-			child_pnode = p_array[level].my_data;
-			spin_lock(&vfspnode_lock);
-			list_for_each_entry_safe(member_mnt,
-				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if((ret=vfs_prepare_mount_func(member_mnt,
-					PNODE_MEMBER_VFS, child_pnode, source_mnt,
-					mountpoint_dentry, mnt)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			list_for_each_entry_safe(slave_mnt,
-				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if((ret = vfs_prepare_mount_func(slave_mnt,
-					PNODE_SLAVE_VFS, child_pnode, source_mnt,
-					mountpoint_dentry, mnt)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			spin_unlock(&vfspnode_lock);
-			p_array[level].out_data = child_pnode;
+	return  pnode_traverse(pnode,
+			master_child_pnode,
+			(void *)NULL,
+			pnode_mount_pre_func,
+			pnode_mount_post_func,
+			vfs_prepare_mount_func,
+			source_mnt,
+			mountpoint_dentry,
+			mnt);
+}
 
-		}
+int
+pnode_real_mount_post_func(struct vfspnode *pnode, void *indata,
+		void *outdata, va_list args)
+{
+	if (va_arg(args, int)) {
+		spin_lock(&vfspnode_lock);
+		BUG_ON(!list_empty(&pnode->pnode_vfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavevfs));
+		BUG_ON(!list_empty(&pnode->pnode_slavepnode));
+		list_del_init(&pnode->pnode_peer_slave);
+		spin_unlock(&vfspnode_lock);
+		put_pnode(pnode);
 	}
-
-out:
-	return ret;
+	return 0;
 }
 
-
 int
-vfs_real_mount_func(struct vfsmount *mnt, int delflag)
+vfs_real_mount_func(struct vfsmount *mnt, int flag, void *indata, va_list args)
 {
 	BUG_ON(mnt == mnt->mnt_parent);
 	do_attach_real_mnt(mnt);
-	if (delflag) {
+	if (va_arg(args, int)) {
+		spin_lock(&vfspnode_lock);
 		list_del_init(&mnt->mnt_pnode_mntlist);
+		spin_unlock(&vfspnode_lock);
 		mnt->mnt_pnode = NULL;
 		SET_MNT_PRIVATE(mnt);
 	}
@@ -704,43 +728,7 @@ vfs_real_mount_func(struct vfsmount *mnt
 int
 pnode_real_mount(struct vfspnode *pnode, int flag)
 {
-	int ret=0,traversal;
-     	struct vfsmount *slave_mnt, *member_mnt, *t_m;
-	struct pcontext context;
-
-	context.start = pnode;
-	context.pnode = NULL;
-	while(pnode_next(&context)) {
-		traversal = context.traversal;
-		pnode = context.pnode;
-		if(traversal == PNODE_MID) {
-			if (flag) {
-				BUG_ON(!list_empty(&pnode->pnode_vfs));
-				BUG_ON(!list_empty(&pnode->pnode_slavevfs));
-				BUG_ON(!list_empty(&pnode->pnode_slavepnode));
-				list_del_init(&pnode->pnode_peer_slave);
-				put_pnode(pnode);
-			}
-		} else if(traversal == PNODE_UP) {
-			// traverse member vfsmounts
-			spin_lock(&vfspnode_lock);
-			list_for_each_entry_safe(member_mnt,
-				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_real_mount_func(member_mnt, flag)))
-					goto out;
-			}
-			list_for_each_entry_safe(slave_mnt,
-				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
-				spin_unlock(&vfspnode_lock);
-				if ((ret = vfs_real_mount_func(slave_mnt, flag)))
-					goto out;
-				spin_lock(&vfspnode_lock);
-			}
-			spin_unlock(&vfspnode_lock);
-		}
-	}
-
-out:
-	return ret;
+	return  pnode_traverse(pnode,
+			NULL, (void *)NULL, NULL, pnode_real_mount_post_func,
+			vfs_real_mount_func, flag);
 }

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 10:25   ` [RFC PATCH 1/8] share/private/slave a subtree Ram
@ 2005-07-08 11:17     ` Pekka Enberg
  2005-07-08 12:19       ` Roman Zippel
  2005-07-08 16:29       ` [RFC PATCH 1/8] share/private/slave a subtree Ram
  2005-07-08 14:32     ` Miklos Szeredi
  1 sibling, 2 replies; 48+ messages in thread
From: Pekka Enberg @ 2005-07-08 11:17 UTC (permalink / raw)
  To: Ram
  Cc: linux-kernel, linux-fsdevel, viro, Andrew Morton, mike, bfields,
	Miklos Szeredi, Pekka Enberg

On 7/8/05, Ram <linuxram@us.ibm.com> wrote:
> This patch adds the shared/private/slave support for VFS trees.

Inlining the patches to email would be greatly appreciated. Here are
some comments.

> +int
> +_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)

Use two underscores to follow naming conventions.

> Index: 2.6.12/fs/pnode.c
> ===================================================================
> --- /dev/null
> +++ 2.6.12/fs/pnode.c
> @@ -0,0 +1,362 @@
> +
> +#define PNODE_MEMBER_VFS  0x01
> +#define PNODE_SLAVE_VFS   0x02

Enums, please.

> +
> +static kmem_cache_t * pnode_cachep;
> +
> +/* spinlock for pnode related operations */
> + __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfspnode_lock);
> +
> +
> +static void
> +pnode_init_fn(void *data, kmem_cache_t *cachep, unsigned long flags)
> +{
> +	struct vfspnode *pnode = (struct vfspnode *)data;

Redundant cast.

> +	INIT_LIST_HEAD(&pnode->pnode_vfs);
> +	INIT_LIST_HEAD(&pnode->pnode_slavevfs);
> +	INIT_LIST_HEAD(&pnode->pnode_slavepnode);
> +	INIT_LIST_HEAD(&pnode->pnode_peer_slave);
> +	pnode->pnode_master = NULL;
> +	pnode->pnode_flags = 0;
> +	atomic_set(&pnode->pnode_count,0);
> +}
> +
> +void __init
> +pnode_init(unsigned long mempages)
> +{
> +	pnode_cachep = kmem_cache_create("pnode_cache",
> +                       sizeof(struct vfspnode), 0,
> +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, pnode_init_fn, NULL);
> +}
> +
> +
> +struct vfspnode *
> +pnode_alloc(void)
> +{
> +	struct vfspnode *pnode =  (struct vfspnode *)kmem_cache_alloc(
> +			pnode_cachep, GFP_KERNEL);

Redundant cast.

> +struct inoutdata {

Wants a better name.

> +	void *my_data; /* produced and consumed by me */
> +	void *in_data; /* produced by master, consumed by slave */
> +	void *out_data; /* produced by slave, comsume by master */
> +};
> +
> +struct pcontext {
> +	struct vfspnode *start;
> +	int 	flag;
> +	int 	traversal;
> +	int	level;
> +	struct vfspnode *master_pnode;
> +	struct vfspnode *pnode;
> +	struct vfspnode *slave_pnode;
> +};
> +
> +
> +#define PNODE_UP 1
> +#define PNODE_DOWN 2
> +#define PNODE_MID 3

Enums, please.
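
Something like this would do (just a sketch, reusing the values from the
patch):

enum pnode_traversal {
	PNODE_UP   = 1,
	PNODE_DOWN = 2,
	PNODE_MID  = 3,
};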

> +
> +/*
> + * Walk the pnode tree for each pnode encountered.  A given pnode in the tree
> + * can be returned a minimum of 2 times.  First time the pnode is encountered,
> + * it is returned with the flag PNODE_DOWN. Everytime the pnode is encountered
> + * after having traversed through each of its children, it is returned with the
> + * flag PNODE_MID.  And finally when the pnode is encountered after having
> + * walked all of its children, it is returned with the flag PNODE_UP.
> + *
> + * @context: provides context on the state of the last walk in the pnode
> + * 		tree.
> + */
> +static int inline
> +pnode_next(struct pcontext *context)

Rather large function to be an inline.

> +{
> +	int traversal = context->traversal;
> +	int ret=0;
> +	struct vfspnode *pnode = context->pnode,
> +			*slave_pnode=context->slave_pnode,
> +			*master_pnode=context->master_pnode;

Add a separate declaration for each variable. The above is hard to read.

> +	struct list_head *next;
> +
> +	spin_lock(&vfspnode_lock);
> +	/*
> +	 * the first traversal will be on the root pnode
> +	 * with flag PNODE_DOWN
> +	 */
> +	if (!pnode) {
> +		context->pnode = get_pnode(context->start);
> +		context->master_pnode = NULL;
> +		context->traversal = PNODE_DOWN;
> +		context->slave_pnode = NULL;
> +		context->level = 0;
> +		ret = 1;
> +		goto out;
> +	}
> +
> +	/*
> +	 * if the last traversal was PNODE_UP, than the current
> +	 * traversal is PNODE_MID on the master pnode.
> +	 */
> +	if (traversal == PNODE_UP) {
> +		if (!master_pnode) {
> +			/* DONE. return */
> +			put_pnode(pnode);
> +			ret = 0;

Using goto out and dropping the else branch would make this more readable.

> +		} else {
> +			context->traversal = PNODE_MID;
> +			context->level--;
> +			context->pnode = master_pnode;
> +			put_pnode(slave_pnode);
> +			context->slave_pnode = pnode;
> +			context->master_pnode = (master_pnode ?
> +				master_pnode->pnode_master : NULL);
> +			ret = 1;
> +		}
> +	} else {
> +		if(traversal == PNODE_MID) {

Missing space before parenthesis.

> +			next = slave_pnode->pnode_peer_slave.next;
> +		} else {
> +			next = pnode->pnode_slavepnode.next;
> +		}

Please drop the extra braces.

> +		put_pnode(slave_pnode);
> +		context->slave_pnode = NULL;
> +		/*
> +		 * if the last traversal was PNODE_MID or PNODE_DOWN, and the
> +		 * master pnode has some slaves to traverse, the current
> +		 * traversal will be PNODE_DOWN on the slave pnode.
> +		 */
> +		if ((next != &pnode->pnode_slavepnode) &&
> +			(traversal == PNODE_DOWN || traversal == PNODE_MID)) {
> +			context->traversal = PNODE_DOWN;
> +			context->level++;
> +			context->pnode = get_pnode(list_entry(next,
> +				struct vfspnode, pnode_peer_slave));
> +			context->master_pnode = pnode;
> +			ret = 1;
> +		} else {
> +			/*
> +			 * since there are no more children, the current traversal
> +			 * is PNODE_UP on the same pnode
> +			 */
> +			context->traversal = PNODE_UP;
> +			ret = 1;

Would probably make more sense to check if
(next == &pnode->pnode_slavepnode && traversal == PNODE_UP) and use goto out to
get rid of the else branch.

> +		}
> +	}
> +out:
> +	spin_unlock(&vfspnode_lock);
> +	return ret;
> +}
> +
> +

> +static void
> +_pnode_disassociate_mnt(struct vfsmount *mnt)

Two underscores, please.

> +struct vfsmount *
> +pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
> +{
> +	struct vfsmount *child_mnt;
> +	int ret=0,traversal,level;

Spaces, please.

> +     	struct vfspnode *slave_pnode, *master_pnode, *child_pnode, *slave_child_pnode;
> +     	struct vfsmount *slave_mnt, *member_mnt, *t_m;

Formatting damage.

> +	struct pcontext context;
> +	static struct inoutdata p_array[PNODE_MAX_SLAVE_LEVEL];
> +
> +	context.start = pnode;
> +	context.pnode = NULL;
> +
> +	while(pnode_next(&context)) {

Missing space before parenthesis.

> +		traversal = context.traversal;
> +		level = context.level;
> +		pnode = context.pnode;
> +		slave_pnode = context.slave_pnode;
> +		master_pnode = context.master_pnode;
> +
> +		if (traversal == PNODE_DOWN ) {

Use switch statement here.

> +			if (master_pnode) {
> +				child_pnode = (struct vfspnode *)p_array[level-1].in_data;

Redundant cast.

> +			} else {
> +				child_pnode = NULL;
> +			}

Extra braces.

> +			while (!(child_pnode = pnode_alloc()))
> +				schedule();

This looks dangerous. Why must this never fail here, when in other places
you return -ENOMEM?

> +			p_array[level].my_data = (void *)child_pnode;

Redundant cast.

> +			p_array[level].in_data = NULL;
> +
> +		} else if(traversal == PNODE_MID) {
> +
> +			child_pnode = (struct vfspnode *)p_array[level].my_data;

Redundant cast.

> +			slave_child_pnode = p_array[level+1].out_data;
> +			pnode_add_slave_pnode(child_pnode, slave_child_pnode);
> +
> +		} else if(traversal == PNODE_UP) {
> +			child_pnode = p_array[level].my_data;
> +			spin_lock(&vfspnode_lock);
> +			list_for_each_entry_safe(member_mnt,
> +				t_m, &pnode->pnode_vfs, mnt_pnode_mntlist) {
> +				spin_unlock(&vfspnode_lock);
> +				if (!(child_mnt = do_make_mounted(mnt, dentry))) {
> +					ret = -ENOMEM;
> +					goto out;
> +				}
> +				spin_lock(&vfspnode_lock);
> +				pnode_add_member_mnt(child_pnode, child_mnt);
> +			}
> +			list_for_each_entry_safe(slave_mnt,
> +				t_m, &pnode->pnode_slavevfs, mnt_pnode_mntlist) {
> +				spin_unlock(&vfspnode_lock);
> +				if (!(child_mnt = do_make_mounted(mnt, dentry))) {
> +					ret = -ENOMEM;
> +					goto out;
> +				}
> +				spin_lock(&vfspnode_lock);
> +				pnode_add_slave_mnt(child_pnode, child_mnt);
> +			}
> +			spin_unlock(&vfspnode_lock);
> +			p_array[level].out_data = child_pnode;
> +		}
> +	}
> +
> +out:
> +	if (ret)
> +		return NULL;
> +
> +	child_mnt = lookup_mnt(mnt, dentry, dentry);
> +	mntput(child_mnt);
> +	return child_mnt;
> +}
> Index: 2.6.12/include/linux/fs.h
> ===================================================================
> --- 2.6.12.orig/include/linux/fs.h
> +++ 2.6.12/include/linux/fs.h
> @@ -102,6 +102,9 @@ extern int dir_notify_enable;
>  #define MS_MOVE		8192
>  #define MS_REC		16384
>  #define MS_VERBOSE	32768
> +#define MS_PRIVATE	262144
> +#define MS_SLAVE	524288
> +#define MS_SHARED	1048576

The expression (1<<bit) would make more sense here.
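
For example, the same values written as shifts (sketch only, not the
posted patch):

#define MS_PRIVATE	(1<<18)	/* 262144 */
#define MS_SLAVE	(1<<19)	/* 524288 */
#define MS_SHARED	(1<<20)	/* 1048576 */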

> Index: 2.6.12/include/linux/pnode.h
> ===================================================================
> --- /dev/null
> +++ 2.6.12/include/linux/pnode.h
> @@ -0,0 +1,78 @@
> +/*
> + *  linux/fs/pnode.c
> + *
> + * (C) Copyright IBM Corporation 2005.
> + *	Released under GPL v2.
> + *
> + */
> +#ifndef _LINUX_PNODE_H
> +#define _LINUX_PNODE_H
> +#ifdef __KERNEL__

No need for the above. Kernel headers are not supposed to be included by
userspace.

> +
> +#include <linux/list.h>
> +#include <linux/mount.h>
> +#include <linux/spinlock.h>
> +#include <asm/atomic.h>
> +
> +struct vfspnode {
> +	struct list_head pnode_vfs; 	 /* list of vfsmounts anchored here */
> +	struct list_head pnode_slavevfs; /* list of slave vfsmounts */
> +	struct list_head pnode_slavepnode;/* list of slave pnode */
> +	struct list_head pnode_peer_slave;/* going through master's slave pnode
> +					    list*/
> +	struct vfspnode	 *pnode_master;	  /* master pnode */
> +	int 		 pnode_flags;
> +	atomic_t 	 pnode_count;
> +};
> +#define PNODE_MAX_SLAVE_LEVEL 10
> +#define PNODE_DELETE  0x01
> +#define PNODE_SLAVE   0x02

Enums, please.

> +
> +#define IS_PNODE_DELETE(pn)  ((pn->pnode_flags&PNODE_DELETE)==PNODE_DELETE)
> +#define IS_PNODE_SLAVE(pn)  ((pn->pnode_flags&PNODE_SLAVE)==PNODE_SLAVE)
> +#define SET_PNODE_DELETE(pn)  pn->pnode_flags |= PNODE_DELETE
> +#define SET_PNODE_SLAVE(pn)  pn->pnode_flags |= PNODE_SLAVE

Static inline functions are preferred over #define.
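
A rough sketch of the inline form, mirroring the macros quoted above
(DELETE shown; the SLAVE variants would be analogous):

static inline int pnode_is_delete(struct vfspnode *pn)
{
	return (pn->pnode_flags & PNODE_DELETE) == PNODE_DELETE;
}

static inline void pnode_set_delete(struct vfspnode *pn)
{
	pn->pnode_flags |= PNODE_DELETE;
}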

> +
> +extern spinlock_t vfspnode_lock;
> +extern void __put_pnode(struct vfspnode *);
> +
> +static inline struct vfspnode *
> +get_pnode(struct vfspnode *pnode)
> +{
> +	if (!pnode)
> +		return NULL;

Can pnode really be NULL here? Looking at the callers in this patch, it can't.
Please remember that you should do NULL checks like this only when it makes
sense from an API point of view to call the function with NULL.

> Index: 2.6.12/include/linux/mount.h
> ===================================================================
> --- 2.6.12.orig/include/linux/mount.h
> +++ 2.6.12/include/linux/mount.h
> @@ -16,9 +16,33 @@
>  #include <linux/spinlock.h>
>  #include <asm/atomic.h>
>  
> -#define MNT_NOSUID	1
> -#define MNT_NODEV	2
> -#define MNT_NOEXEC	4
> +#define MNT_NOSUID	0x01
> +#define MNT_NODEV	0x02
> +#define MNT_NOEXEC	0x04
> +#define MNT_PRIVATE	0x10  /* if the vfsmount is private, by default it is private*/
> +#define MNT_SLAVE	0x20  /* if the vfsmount is a slave mount of its pnode */
> +#define MNT_SHARED	0x40  /* if the vfsmount is a slave mount of its pnode */
> +#define MNT_PNODE_MASK	0xf0  /* propogation flag mask */
> +
> +#define IS_MNT_SHARED(mnt) (mnt->mnt_flags&MNT_SHARED)
> +#define IS_MNT_SLAVE(mnt) (mnt->mnt_flags&MNT_SLAVE)
> +#define IS_MNT_PRIVATE(mnt) (mnt->mnt_flags&MNT_PRIVATE)
> +
> +#define CLEAR_MNT_SHARED(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SHARED))
> +#define CLEAR_MNT_PRIVATE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_PRIVATE))
> +#define CLEAR_MNT_SLAVE(mnt) (mnt->mnt_flags &= ~(MNT_PNODE_MASK & MNT_SLAVE))
> +
> +//TOBEDONE WRITE BETTER MACROS. ..

Please use static inline functions instead.

> +#define SET_MNT_SHARED(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_SHARED),\
> +				CLEAR_MNT_PRIVATE(mnt), \
> +				CLEAR_MNT_SLAVE(mnt))
> +#define SET_MNT_PRIVATE(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_PRIVATE), \
> +				CLEAR_MNT_SLAVE(mnt), \
> +				CLEAR_MNT_SHARED(mnt), \
> +				mnt->mnt_pnode = NULL)
> +#define SET_MNT_SLAVE(mnt) (mnt->mnt_flags |= (MNT_PNODE_MASK & MNT_SLAVE), \
> +				CLEAR_MNT_PRIVATE(mnt), \
> +				CLEAR_MNT_SHARED(mnt))
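
For the SET_MNT_* macros just above, the inline form could look roughly
like this (sketch only; it clears all propagation bits first, which is
roughly what the CLEAR_* calls amount to):

static inline void set_mnt_shared(struct vfsmount *mnt)
{
	mnt->mnt_flags &= ~MNT_PNODE_MASK;
	mnt->mnt_flags |= MNT_SHARED;
}

static inline void set_mnt_private(struct vfsmount *mnt)
{
	mnt->mnt_flags &= ~MNT_PNODE_MASK;
	mnt->mnt_flags |= MNT_PRIVATE;
	mnt->mnt_pnode = NULL;
}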

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 11:17     ` Pekka Enberg
@ 2005-07-08 12:19       ` Roman Zippel
  2005-07-08 12:26         ` Pekka J Enberg
  2005-07-08 16:29       ` [RFC PATCH 1/8] share/private/slave a subtree Ram
  1 sibling, 1 reply; 48+ messages in thread
From: Roman Zippel @ 2005-07-08 12:19 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ram, linux-kernel, linux-fsdevel, viro, Andrew Morton, mike,
	bfields, Miklos Szeredi, Pekka Enberg

Hi,

On Fri, 8 Jul 2005, Pekka Enberg wrote:

> > +#define PNODE_MEMBER_VFS  0x01
> > +#define PNODE_SLAVE_VFS   0x02
> 
> Enums, please.

Is this becoming a requirement now? I personally would rather leave that 
to personal preference...

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree
  2005-07-08 12:19       ` Roman Zippel
@ 2005-07-08 12:26         ` Pekka J Enberg
  2005-07-08 12:46           ` Roman Zippel
  0 siblings, 1 reply; 48+ messages in thread
From: Pekka J Enberg @ 2005-07-08 12:26 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Pekka Enberg, Ram, linux-kernel, linux-fsdevel, viro,
	Andrew Morton, mike, bfields, Miklos Szeredi

On Fri, 8 Jul 2005, Pekka Enberg wrote:
> > > +#define PNODE_MEMBER_VFS  0x01
> > > +#define PNODE_SLAVE_VFS   0x02
> > 
> > Enums, please.

Roman Zippel writes:
> Is this becoming a requirement now? I personally would rather leave that 
> to personal preference...

Hey, I just review patches. I don't get to set requirements. There's a 
reason why enums are preferred though. They define a proper name for the 
constant. It's far too easy to mess up with #defines. They also document the
code intent much better as you can group related constants together. 

                 Pekka

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree
  2005-07-08 12:26         ` Pekka J Enberg
@ 2005-07-08 12:46           ` Roman Zippel
  2005-07-08 12:58             ` Pekka J Enberg
  0 siblings, 1 reply; 48+ messages in thread
From: Roman Zippel @ 2005-07-08 12:46 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Ram, linux-kernel, linux-fsdevel, Alexander Viro, Andrew Morton,
	mike, bfields, Miklos Szeredi

Hi,

On Fri, 8 Jul 2005, Pekka J Enberg wrote:

> On Fri, 8 Jul 2005, Pekka Enberg wrote:
> > > > +#define PNODE_MEMBER_VFS  0x01
> > > > +#define PNODE_SLAVE_VFS   0x02
> > > > Enums, please.
> 
> Roman Zippel writes:
> > Is this becoming a requirement now? I personally would rather leave that to
> > personal preference...
> 
> Hey, I just review patches. I don't get to set requirements. There's a reason
> why enums are preferred though. They define a proper name for the constant.

Who prefers that?

> It's far too easy to mess up with #defines.

Rather unlikely with such simple masks.

> They also document the code intent
> much better as you can group related constants together. 

You can't do that with defines?

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree
  2005-07-08 12:46           ` Roman Zippel
@ 2005-07-08 12:58             ` Pekka J Enberg
  2005-07-08 13:34               ` Roman Zippel
  0 siblings, 1 reply; 48+ messages in thread
From: Pekka J Enberg @ 2005-07-08 12:58 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ram, linux-kernel, linux-fsdevel, Alexander Viro, Andrew Morton,
	mike, bfields, Miklos Szeredi

On Fri, 8 Jul 2005, Pekka J Enberg wrote:
> > Hey, I just review patches. I don't get to set requirements. There's a reason
> > why enums are preferred though. They define a proper name for the constant.

Roman Zippel writes:
> Who prefers that?

Well, me, at least. I can't speak for others. 

On Fri, 8 Jul 2005, Pekka J Enberg wrote:
> > It's far too easy to mess up with #defines.

Roman Zippel writes:
> Rather unlikely with such simple masks.

Redefining a constant with #define by accident is easy. Introducing
duplicate constants is equally easy (see radeon headers for an example). 
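
A contrived sketch of the failure mode (the second definition is made up):

#define MNT_SHARED	0x40
/* ... much later, possibly in another header ... */
#define MNT_SHARED	0x80	/* gcc only warns: "MNT_SHARED" redefined */

With enumerators, the second definition would be a hard compile error
instead of a warning.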

On Fri, 8 Jul 2005, Pekka J Enberg wrote:
> > They also document the code intent
> > much better as you can group related constants together.

Roman Zippel writes:
> You can't do that with defines?

Sure you can but have you ever tried to figure out where a group of #define 
enumerations ends? Enums are a natural language construct for grouping 
related constants so why not use it? 

Bottom line, there are a few advantages to using enums rather than #defines,
which is why they are IMHO preferred for new code. 

                    Pekka 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree
  2005-07-08 12:58             ` Pekka J Enberg
@ 2005-07-08 13:34               ` Roman Zippel
  2005-07-08 16:17                 ` Pekka Enberg
  2005-07-08 16:33                 ` share/private/slave a subtree - define vs enum Bryan Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Roman Zippel @ 2005-07-08 13:34 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Ram, linux-kernel, linux-fsdevel, Alexander Viro, Andrew Morton,
	mike, bfields, Miklos Szeredi

Hi,

On Fri, 8 Jul 2005, Pekka J Enberg wrote:

> > You can't do that with defines?
> 
> Sure you can but have you ever tried to figure out where a group of #define
> enumerations ends?

Comments? Newlines?

> Enums are a natural language construct for grouping related
> constants so why not use it? 

So are defines.

> Bottom line, there are few advantages to using enums rather than #defines
> which is why they are IMHO preferred for new code. 

Are the advantages big enough to actively discourage defines? It's nice 
that you do reviews, but please leave some room for personal preferences. 
If the code is correct and perfectly readable, it doesn't matter whether 
to use defines or enums. Unless you also intend to debug and work 
with that code, why not leave the decision to the author?

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 10:25   ` [RFC PATCH 1/8] share/private/slave a subtree Ram
  2005-07-08 11:17     ` Pekka Enberg
@ 2005-07-08 14:32     ` Miklos Szeredi
  2005-07-08 16:19       ` Ram
  1 sibling, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2005-07-08 14:32 UTC (permalink / raw)
  To: linuxram; +Cc: linux-kernel, linux-fsdevel, viro, akpm, mike, bfields

> This patch adds the shared/private/slave support for VFS trees.

[...]

> -struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
> +struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry, struct dentry *root)
>  {

How about changing it to inline and calling it __lookup_mnt_root(),
and calling it from lookup_mnt() (which could keep the old signature)
and lookup_mnt_root().  That way the compiler can optimize away the
root check for the plain lookup_mnt() case, and there's no need to
modify callers of lookup_mnt().
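
Roughly something like this (sketch only -- in particular, passing dentry
as the root in lookup_mnt() is just an assumption about what restores the
old behaviour):

static inline struct vfsmount *__lookup_mnt_root(struct vfsmount *mnt,
			struct dentry *dentry, struct dentry *root)
{
	/* the existing hash lookup, plus the root check; placeholder here */
	return NULL;
}

struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
{
	return __lookup_mnt_root(mnt, dentry, dentry);
}

struct vfsmount *lookup_mnt_root(struct vfsmount *mnt, struct dentry *dentry,
				 struct dentry *root)
{
	return __lookup_mnt_root(mnt, dentry, root);
}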

[...]

>  
> +struct vfsmount *do_make_mounted(struct vfsmount *mnt, struct dentry *dentry)
> +{

What does this function do?  Can we have a header comment?

> +int
> +_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)
> +{

Ditto.

> +/*
> + * recursively change the type of the mountpoint.
> + */
> +static int do_change_type(struct nameidata *nd, int flag)
> +{
> +	struct vfsmount *m, *mnt;
> +	struct vfspnode *old_pnode = NULL;
> +	int err;
> +
> +	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
> +			&& !(flag & MS_SLAVE))
> +		return -EINVAL;
> +
> +	if ((err = _do_make_mounted(nd, &mnt)))
> +		return err;


Why does this operation do any mounting?  If it's a type change, it
should just change the type of something already mounted, no?

> +		case MS_SHARED:
> +			/*
> +			 * if the mount is already a slave mount,
> +			 * allocated a new pnode and make it
> +			 * a slave pnode of the original pnode.
> +			 */
> +			if (IS_MNT_SLAVE(m)) {
> +				old_pnode = m->mnt_pnode;
> +				pnode_del_slave_mnt(m);
> +			}
> +			if(!IS_MNT_SHARED(m)) {
> +				m->mnt_pnode = pnode_alloc();
> +				if(!m->mnt_pnode) {
> +					pnode_add_slave_mnt(old_pnode, m);
> +					err = -ENOMEM;
> +					break;
> +				}
> +				pnode_add_member_mnt(m->mnt_pnode, m);
> +			}
> +			if(old_pnode) {
> +				pnode_add_slave_pnode(old_pnode,
> +						m->mnt_pnode);
> +			}
> +			SET_MNT_SHARED(m);
> +			break;
> +
> +		case MS_SLAVE:
> +			if (IS_MNT_SLAVE(m)) {
> +				break;
> +			}
> +			/*
> +			 * only shared mounts can
> +			 * be made slave
> +			 */
> +			if (!IS_MNT_SHARED(m)) {
> +				err = -EINVAL;
> +				break;
> +			}
> +			old_pnode = m->mnt_pnode;
> +			pnode_del_member_mnt(m);
> +			pnode_add_slave_mnt(old_pnode, m);
> +			SET_MNT_SLAVE(m);
> +			break;
> +
> +		case MS_PRIVATE:
> +			if(m->mnt_pnode)
> +				pnode_disassociate_mnt(m);
> +			SET_MNT_PRIVATE(m);
> +			break;
> +

Can this be split into three functions?

[...]

> +/*
> + * Walk the pnode tree for each pnode encountered.  A given pnode in the tree
> + * can be returned a minimum of 2 times.  First time the pnode is encountered,
> + * it is returned with the flag PNODE_DOWN. Everytime the pnode is encountered
> + * after having traversed through each of its children, it is returned with the
> + * flag PNODE_MID.  And finally when the pnode is encountered after having
> + * walked all of its children, it is returned with the flag PNODE_UP.
> + *
> + * @context: provides context on the state of the last walk in the pnode
> + * 		tree.
> + */
> +static int inline
> +pnode_next(struct pcontext *context)
> +{

Is such a generic traversal function really needed?  Why?

[...]

> +struct vfsmount *
> +pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
> +{

Again a header comment would be nice, on what exactly this function
does.  Also the implementation is really cryptic, but I can't even
start to decipher without knowing what it's supposed to do.

[...]

> +static inline struct vfspnode *
> +get_pnode_n(struct vfspnode *pnode, size_t n)
> +{

Seems to be unused throughout the patch series

> +	struct list_head mnt_pnode_mntlist;/* and going through their
> +					   pnode's vfsmount */
> +	struct vfspnode *mnt_pnode;	/* and going through their
> +					   pnode's vfsmount */
>  	atomic_t mnt_count;
>  	int mnt_flags;
>  	int mnt_expiry_mark;		/* true if marked for expiry */
> @@ -38,6 +66,7 @@ struct vfsmount
>  	struct namespace *mnt_namespace; /* containing namespace */
>  };
>  
> +
>  static inline struct vfsmount *mntget(struct vfsmount *mnt)

Please don't add empty lines.

Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree
  2005-07-08 13:34               ` Roman Zippel
@ 2005-07-08 16:17                 ` Pekka Enberg
  2005-07-08 16:33                 ` share/private/slave a subtree - define vs enum Bryan Henderson
  1 sibling, 0 replies; 48+ messages in thread
From: Pekka Enberg @ 2005-07-08 16:17 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ram, linux-kernel, linux-fsdevel, Alexander Viro, Andrew Morton,
	mike, bfields, Miklos Szeredi

On Fri, 2005-07-08 at 15:34 +0200, Roman Zippel wrote:
> Are the advantages big enough to actively discourage defines? It's nice 
> that you do reviews, but please leave some room for personal preferences. 
> If the code is correct and perfectly readable, it doesn't matter whether 
> to use defines or enums. Unless you also intent to also debug and work 
> with that code, why don't leave the decision to the author?

I think the advantages are big enough. Also, in my experience, it is
usually not a conscious decision by the author. But if you and other
developers think my enum pushing is too much, I can tone it down :).

			Pekka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 14:32     ` Miklos Szeredi
@ 2005-07-08 16:19       ` Ram
  2005-07-08 16:51         ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Ram @ 2005-07-08 16:19 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-kernel, linux-fsdevel, viro, Andrew Morton, mike, bfields

On Fri, 2005-07-08 at 07:32, Miklos Szeredi wrote:
> > This patch adds the shared/private/slave support for VFS trees.
> 
> [...]
> 
> > -struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry)
> > +struct vfsmount *lookup_mnt(struct vfsmount *mnt, struct dentry *dentry, struct dentry *root)
> >  {
> 
> How about changing it to inline and calling it __lookup_mnt_root(),
> and calling it from lookup_mnt() (which could keep the old signature)
> and lookup_mnt_root().  That way the compiler can optimize away the
> root check for the plain lookup_mnt() case, and there's no need to
> modify callers of lookup_mnt().
> 
> [...]


ok. 

> 
> >  
> > +struct vfsmount *do_make_mounted(struct vfsmount *mnt, struct dentry *dentry)
> > +{
> 
> What does this function do?  Can we have a header comment?

This function does the equivalent of 'mount --bind dir dir'.
It creates a new child vfsmount at that dentry, and moves
the child vfsmounts below that dentry under the newly created child
vfsmount.  There is a header comment for that function, but it ended
up in the 2nd patch.

> 
> > +int
> > +_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)
> > +{
> 
> Ditto.

This function returns immediately if a vfsmount already exists at the
dentry. Otherwise it creates a vfsmount at the specified dentry, and if
the dentry belongs to a shared vfsmount it ensures the same
operation is done on all peer vfsmounts to which propagation
is set.


> 
> > +/*
> > + * recursively change the type of the mountpoint.
> > + */
> > +static int do_change_type(struct nameidata *nd, int flag)
> > +{
> > +	struct vfsmount *m, *mnt;
> > +	struct vfspnode *old_pnode = NULL;
> > +	int err;
> > +
> > +	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
> > +			&& !(flag & MS_SLAVE))
> > +		return -EINVAL;
> > +
> > +	if ((err = _do_make_mounted(nd, &mnt)))
> > +		return err;
> 
> 
> Why does this opertation do any mounting?  If it's a type change, it
> should just change the type of something already mounted, no?

apart from changing types, it has to create a new vfsmount if one
does not exist at that point. 

> 
> > +		case MS_SHARED:
> > +			/*
> > +			 * if the mount is already a slave mount,
> > +			 * allocated a new pnode and make it
> > +			 * a slave pnode of the original pnode.
> > +			 */
> > +			if (IS_MNT_SLAVE(m)) {
> > +				old_pnode = m->mnt_pnode;
> > +				pnode_del_slave_mnt(m);
> > +			}
> > +			if(!IS_MNT_SHARED(m)) {
> > +				m->mnt_pnode = pnode_alloc();
> > +				if(!m->mnt_pnode) {
> > +					pnode_add_slave_mnt(old_pnode, m);
> > +					err = -ENOMEM;
> > +					break;
> > +				}
> > +				pnode_add_member_mnt(m->mnt_pnode, m);
> > +			}
> > +			if(old_pnode) {
> > +				pnode_add_slave_pnode(old_pnode,
> > +						m->mnt_pnode);
> > +			}
> > +			SET_MNT_SHARED(m);
> > +			break;
> > +
> > +		case MS_SLAVE:
> > +			if (IS_MNT_SLAVE(m)) {
> > +				break;
> > +			}
> > +			/*
> > +			 * only shared mounts can
> > +			 * be made slave
> > +			 */
> > +			if (!IS_MNT_SHARED(m)) {
> > +				err = -EINVAL;
> > +				break;
> > +			}
> > +			old_pnode = m->mnt_pnode;
> > +			pnode_del_member_mnt(m);
> > +			pnode_add_slave_mnt(old_pnode, m);
> > +			SET_MNT_SLAVE(m);
> > +			break;
> > +
> > +		case MS_PRIVATE:
> > +			if(m->mnt_pnode)
> > +				pnode_disassociate_mnt(m);
> > +			SET_MNT_PRIVATE(m);
> > +			break;
> > +
> 
> Can this be split into three functions?

yes. will do.

> 
> [...]
> 
> > +/*
> > + * Walk the pnode tree for each pnode encountered.  A given pnode in the tree
> > + * can be returned a minimum of 2 times.  First time the pnode is encountered,
> > + * it is returned with the flag PNODE_DOWN. Everytime the pnode is encountered
> > + * after having traversed through each of its children, it is returned with the
> > + * flag PNODE_MID.  And finally when the pnode is encountered after having
> > + * walked all of its children, it is returned with the flag PNODE_UP.
> > + *
> > + * @context: provides context on the state of the last walk in the pnode
> > + * 		tree.
> > + */
> > +static int inline
> > +pnode_next(struct pcontext *context)
> > +{
> 
> Is such a generic traversal function really needed?  Why?

Yes. I found it useful to write generic code without having to worry
about the details of the traversal being duplicated everywhere.  Do you
have a better suggestion? This function is the equivalent of next_mnt()
for traversing pnode trees.



> 
> [...]
> 
> > +struct vfsmount *
> > +pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
> > +{
> 
> Again a header comment would be nice, on what exactly this function
> does.  Also the implementation is really cryptic, but I can't even
> start to decipher without knowing what it's supposed to do.

yes. this function takes care of traversing the propagation tree and
creating a child vfsmount for dentries in each vfsmount encountered.
In other words it calls do_make_mounted() for each vfsmount that belongs
to the propagation tree.


> 
> [...]
> 
> > +static inline struct vfspnode *
> > +get_pnode_n(struct vfspnode *pnode, size_t n)
> > +{
> 
> Seems to be unused throughout the patch series

Yes. will delete it. Initially thought I would need it.


> > +	struct list_head mnt_pnode_mntlist;/* and going through their
> > +					   pnode's vfsmount */
> > +	struct vfspnode *mnt_pnode;	/* and going through their
> > +					   pnode's vfsmount */
> >  	atomic_t mnt_count;
> >  	int mnt_flags;
> >  	int mnt_expiry_mark;		/* true if marked for expiry */
> > @@ -38,6 +66,7 @@ struct vfsmount
> >  	struct namespace *mnt_namespace; /* containing namespace */
> >  };
> >  
> > +
> >  static inline struct vfsmount *mntget(struct vfsmount *mnt)
> 
> Please don't add empty lines.

ok.

RP
> 
> Miklos


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 11:17     ` Pekka Enberg
  2005-07-08 12:19       ` Roman Zippel
@ 2005-07-08 16:29       ` Ram
  1 sibling, 0 replies; 48+ messages in thread
From: Ram @ 2005-07-08 16:29 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-kernel, linux-fsdevel, viro, Andrew Morton, mike, bfields,
	Miklos Szeredi, Pekka Enberg

On Fri, 2005-07-08 at 04:17, Pekka Enberg wrote:
> On 7/8/05, Ram <linuxram@us.ibm.com> wrote:
> > This patch adds the shared/private/slave support for VFS trees.
> 
> Inlining the patches to email would be greatly appreciated. Here are
> some comments.
> 
> > +int
> > +_do_make_mounted(struct nameidata *nd, struct vfsmount **mnt)
> 
> Use two underscores to follow naming conventions.

I have renamed this function as make_mounted() in the 2nd patch.
Sure, will follow the convention.

> 
> > Index: 2.6.12/fs/pnode.c
> > ===================================================================
> > --- /dev/null
> > +++ 2.6.12/fs/pnode.c
> > @@ -0,0 +1,362 @@
> > +
> > +#define PNODE_MEMBER_VFS  0x01
> > +#define PNODE_SLAVE_VFS   0x02
> 
> Enums, please.

ok.

> 
> > +
> > +static kmem_cache_t * pnode_cachep;
> > +
> > +/* spinlock for pnode related operations */
> > + __cacheline_aligned_in_smp DEFINE_SPINLOCK(vfspnode_lock);
> > +
> > +
> > +static void
> > +pnode_init_fn(void *data, kmem_cache_t *cachep, unsigned long flags)
> > +{
> > +	struct vfspnode *pnode = (struct vfspnode *)data;
> 
> Redundant cast.

ok.

> 
> > +	INIT_LIST_HEAD(&pnode->pnode_vfs);
> > +	INIT_LIST_HEAD(&pnode->pnode_slavevfs);
> > +	INIT_LIST_HEAD(&pnode->pnode_slavepnode);
> > +	INIT_LIST_HEAD(&pnode->pnode_peer_slave);
> > +	pnode->pnode_master = NULL;
> > +	pnode->pnode_flags = 0;
> > +	atomic_set(&pnode->pnode_count,0);
> > +}
> > +
> > +void __init
> > +pnode_init(unsigned long mempages)
> > +{
> > +	pnode_cachep = kmem_cache_create("pnode_cache",
> > +                       sizeof(struct vfspnode), 0,
> > +                       SLAB_HWCACHE_ALIGN|SLAB_PANIC, pnode_init_fn, NULL);
> > +}
> > +
> > +
> > +struct vfspnode *
> > +pnode_alloc(void)
> > +{
> > +	struct vfspnode *pnode =  (struct vfspnode *)kmem_cache_alloc(
> > +			pnode_cachep, GFP_KERNEL);
> 
> Redundant cast.

ok.

> 
> > +struct inoutdata {
> 
> Wants a better name.

ok. something like propogation_data? or pdata?

> 
> > +	void *my_data; /* produced and consumed by me */
> > +	void *in_data; /* produced by master, consumed by slave */
> > +	void *out_data; /* produced by slave, comsume by master */
> > +};
> > +
> > +struct pcontext {
> > +	struct vfspnode *start;
> > +	int 	flag;
> > +	int 	traversal;
> > +	int	level;
> > +	struct vfspnode *master_pnode;
> > +	struct vfspnode *pnode;
> > +	struct vfspnode *slave_pnode;
> > +};
> > +
> > +
> > +#define PNODE_UP 1
> > +#define PNODE_DOWN 2
> > +#define PNODE_MID 3
> 
> Enums, please.

ok. I will incorporate all the rest of the comments.  There are lots of
places, as you noted, where I need to follow the coding style. I will.

Thanks,
RP
 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 13:34               ` Roman Zippel
  2005-07-08 16:17                 ` Pekka Enberg
@ 2005-07-08 16:33                 ` Bryan Henderson
  2005-07-08 16:57                   ` Roman Zippel
  2005-07-08 18:03                   ` Wichert Akkerman
  1 sibling, 2 replies; 48+ messages in thread
From: Bryan Henderson @ 2005-07-08 16:33 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Andrew Morton, bfields, linux-fsdevel, linux-kernel, linuxram,
	mike, Miklos Szeredi, Pekka J Enberg, Alexander Viro

I wasn't aware anyone preferred defines to enums for declaring enumerated 
data types.  The practical advantages of enums are slight, but as far as I 
know, the practical advantages of defines are zero.  Isn't the only 
argument for defines, "that's what I'm used to."?

Two advantages of the enum declaration that haven't been mentioned yet, 
that help me significantly:

- if you have a typo in a define, it can be really hard to interpret the 
compiler error messages.  The same typo in an enum gets a pointed error 
message referring to the line that has the typo.

- Gcc warns you if a switch statement doesn't handle every case.  I often 
add an enumeration and Gcc lets me know where I forgot to consider it.
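
A tiny illustration (the switch body is made up; the names are from the
patch, the values don't matter here):

enum pnode_vfs_type { PNODE_MEMBER_VFS, PNODE_SLAVE_VFS };

void handle(enum pnode_vfs_type t)
{
	switch (t) {	/* gcc -Wall warns: PNODE_SLAVE_VFS not handled */
	case PNODE_MEMBER_VFS:
		break;
	}
}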

The macro language is one of the most hated parts of the C language; it makes 
sense to try to avoid it as a general rule.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 16:19       ` Ram
@ 2005-07-08 16:51         ` Miklos Szeredi
  2005-07-08 17:52           ` Ram
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2005-07-08 16:51 UTC (permalink / raw)
  To: linuxram; +Cc: linux-kernel, linux-fsdevel, viro, akpm, mike, bfields

> > > + * recursively change the type of the mountpoint.
> > > + */
> > > +static int do_change_type(struct nameidata *nd, int flag)
> > > +{
> > > +	struct vfsmount *m, *mnt;
> > > +	struct vfspnode *old_pnode = NULL;
> > > +	int err;
> > > +
> > > +	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
> > > +			&& !(flag & MS_SLAVE))
> > > +		return -EINVAL;
> > > +
> > > +	if ((err = _do_make_mounted(nd, &mnt)))
> > > +		return err;
> > 
> > 
> > Why does this operation do any mounting?  If it's a type change, it
> > should just change the type of something already mounted, no?
> 
> apart from changing types, it has to create a new vfsmount if one
> does not exist at that point. 

Why?

I think it would be more logical to either

  - return -EINVAL if it's not a mountpoint

  - change the type of the mount even if it's not a mountpoint

Is there some reason the user can't do the 'bind dir dir' manually if
needed?


> > > +/*
> > > + * Walk the pnode tree for each pnode encountered.  A given pnode in the tree
> > > + * can be returned a minimum of 2 times.  First time the pnode is encountered,
> > > + * it is returned with the flag PNODE_DOWN. Everytime the pnode is encountered
> > > + * after having traversed through each of its children, it is returned with the
> > > + * flag PNODE_MID.  And finally when the pnode is encountered after having
> > > + * walked all of its children, it is returned with the flag PNODE_UP.
> > > + *
> > > + * @context: provides context on the state of the last walk in the pnode
> > > + * 		tree.
> > > + */
> > > +static int inline
> > > +pnode_next(struct pcontext *context)
> > > +{
> > 
> > Is such a generic traversal function really needed?  Why?
> 
> Yes. I found it useful to write generic code without having to worry
> about the details of the traversal being duplicated everywhere.  Do you
> have a better suggestion? This function is the equivalent of next_mnt()
> for traversing pnode trees.

I'm just wondering, why do you need to return 2/3 times per node.  Are
all these traversal points needed by propagation operations?  Why?

Some notes (maybe outside the code) explaining the mechanism of the
propagations would be nice.  Without these it's hard to understand the
design decisions behind such an implementation.

> > 
> > > +struct vfsmount *
> > > +pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
> > > +{
> > 
> > Again a header comment would be nice, on what exactly this function
> > does.  Also the implementation is really cryptic, but I can't even
> > start to decipher without knowing what it's supposed to do.
> 
> yes. this function takes care of traversing the propagation tree and
> creating a child vfsmount for dentries in each vfsmount encountered.
> In other words it calls do_make_mounted() for each vfsmount that belongs
> to the propagation tree.

So it just does the propagated 'bind dir dir'?

Why not use the generic propagated bind routine (defined in a later
patch I presume) for this? 

Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 16:33                 ` share/private/slave a subtree - define vs enum Bryan Henderson
@ 2005-07-08 16:57                   ` Roman Zippel
  2005-07-08 17:16                     ` Bryan Henderson
  2005-07-10 10:55                     ` Denis Vlasenko
  2005-07-08 18:03                   ` Wichert Akkerman
  1 sibling, 2 replies; 48+ messages in thread
From: Roman Zippel @ 2005-07-08 16:57 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Andrew Morton, bfields, linux-fsdevel, linux-kernel, linuxram,
	mike, Miklos Szeredi, Pekka J Enberg, Alexander Viro

Hi,

On Fri, 8 Jul 2005, Bryan Henderson wrote:

> I wasn't aware anyone preferred defines to enums for declaring enumerated 
> data types.

If it's really enumerated data types, that's fine, but this example was 
about bitfield masks.

> Isn't the only argument for defines, "that's what I'm used to."?

defines are not just used for constants and there is _nothing_ wrong with 
using defines for constants.

> The macro language is one the most hated parts of the C language; it makes 
> sense to try to avoid it as a general rule.

Nevertheless it's part of the language, it's used all over the kernel and 
suddenly starting to mix different types of definitions makes things 
only worse. I prefer consistency here over any minor advantages enums 
might have.

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 16:57                   ` Roman Zippel
@ 2005-07-08 17:16                     ` Bryan Henderson
  2005-07-08 18:21                       ` Pekka J Enberg
  2005-07-10 10:55                     ` Denis Vlasenko
  1 sibling, 1 reply; 48+ messages in thread
From: Bryan Henderson @ 2005-07-08 17:16 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Andrew Morton, bfields, linux-fsdevel, linux-kernel, linuxram,
	mike, Miklos Szeredi, Pekka J Enberg, Alexander Viro

>If it's really enumerated data types, that's fine, but this example was 
>about bitfield masks.

Ah.  In that case, enum is a pretty tortured way to declare it, though it 
does have the practical advantages over defines that have been mentioned 
because the syntax is more rigorous.

The proper way to do bitfield masks is usually C bit field declarations, 
but I understand that tradition works even more strongly against using 
those than against using enum to declare enumerated types.
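
For instance (a sketch, not a proposal for this particular patch):

struct mnt_propagation {
	unsigned int shared  : 1;
	unsigned int slave   : 1;
	unsigned int private : 1;
};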

>there is _nothing_ wrong with using defines for constants.

I disagree with that; I find practical and, more importantly, 
philosophical reasons not to use defines for constants.  I'm sure you've 
heard the arguments; I just didn't want to let that statement go 
uncontested.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 16:51         ` Miklos Szeredi
@ 2005-07-08 17:52           ` Ram
  2005-07-08 19:49             ` Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Ram @ 2005-07-08 17:52 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-kernel, linux-fsdevel, viro, Andrew Morton, mike, bfields

On Fri, 2005-07-08 at 09:51, Miklos Szeredi wrote:
> > > > + * recursively change the type of the mountpoint.
> > > > + */
> > > > +static int do_change_type(struct nameidata *nd, int flag)
> > > > +{
> > > > +	struct vfsmount *m, *mnt;
> > > > +	struct vfspnode *old_pnode = NULL;
> > > > +	int err;
> > > > +
> > > > +	if (!(flag & MS_SHARED) && !(flag & MS_PRIVATE)
> > > > +			&& !(flag & MS_SLAVE))
> > > > +		return -EINVAL;
> > > > +
> > > > +	if ((err = _do_make_mounted(nd, &mnt)))
> > > > +		return err;
> > > 
> > > 
> > > Why does this operation do any mounting?  If it's a type change, it
> > > should just change the type of something already mounted, no?
> > 
> > apart from changing types, it has to create a new vfsmount if one
> > does not exist at that point. 
> 
> Why?
> 
> I think it would be more logical to either
> 
>   - return -EINVAL if it's not a mountpoint
> 
>   - change the type of the mount even if it's not a mountpoint
> 
> Is there some reason the user can't do the 'bind dir dir' manually if
> needed?

The reason why I implemented it that way is to confuse the user less and
provide more flexibility. With my implementation, we have the ability
to share any part of the tree, without bothering about whether it is a
mountpoint or not. The side effect of this operation is that it ends up
creating a vfsmount if the dentry is not a mountpoint.

so when a user says
      mount --make-shared /tmp/abc
the tree under /tmp/abc becomes shared. 
With your suggestion either the user will get -EINVAL or the tree
under / will become shared. The second behavior will be really
confusing. I am ok with -EINVAL. 


Also there is another reason why I used this behavior. Let's say /mnt
is a mountpoint and say a user does
		mount --make-shared /mnt

and then does
                mount --bind /mnt/abc  /mnt1

NOTE: we need propagation to be set up between /mnt/abc and /mnt1 and
propagation can only be set up for vfsmounts.  In this case /mnt/abc
is not a mountpoint. I have two choices: either return -EINVAL
or create a vfsmount at that point. But -EINVAL is not consistent
with standard --bind behavior. So I chose the latter behavior.

Now that we anyway need this behavior while doing bind mounts from
shared trees, I kept the same behavior for --make-shared.


> 
> 
> > > > +/*
> > > > + * Walk the pnode tree for each pnode encountered.  A given pnode in the tree
> > > > + * can be returned a minimum of 2 times.  First time the pnode is encountered,
> > > > + * it is returned with the flag PNODE_DOWN. Everytime the pnode is encountered
> > > > + * after having traversed through each of its children, it is returned with the
> > > > + * flag PNODE_MID.  And finally when the pnode is encountered after having
> > > > + * walked all of its children, it is returned with the flag PNODE_UP.
> > > > + *
> > > > + * @context: provides context on the state of the last walk in the pnode
> > > > + * 		tree.
> > > > + */
> > > > +static int inline
> > > > +pnode_next(struct pcontext *context)
> > > > +{
> > > 
> > > Is such a generic traversal function really needed?  Why?
> > 
> > Yes. I found it useful to write generic code without having to worry
> > about the details of the traversal being duplicated everywhere.  Do you
> > have a better suggestion? This function is the equivalent of next_mnt()
> > for traversing pnode trees.
> 
> I'm just wondering, why do you need to return 2/3 times per node.  Are
> all these traversal points needed by propagation operations?  Why?

yes. It becomes highly necessary when we do mounts in shared
vfsmounts.  The mount has to be propagated to all the slave pnodes
as well as the member and slave mounts.  And in the process of
propagation a new pnode propagation tree has to be created that
sets up the propagation for all the new child mounts.

Constructing the new child propagation tree while traversing the
parent's propagation tree makes it necessary to encounter the same
pnode multiple times in different contexts.

Moreover I tried to abstract the propagation tree traversal as much
as possible so that all kinds of mount operations just have to
concentrate on the core operation instead of the details of the
traversal. pnode_opt.patch has most of this abstraction.


> Some notes (maybe outside the code) explaining the mechanism of the
> propagations would be nice.  Without these it's hard to understand the
> design decisions behind such an implementation.

Ok. I will make a small writeup on the mechanism.


> 
> > > 
> > > > +struct vfsmount *
> > > > +pnode_make_mounted(struct vfspnode *pnode, struct vfsmount *mnt, struct dentry *dentry)
> > > > +{
> > > 
> > > Again a header comment would be nice, on what exactly this function
> > > does.  Also the implementation is really cryptic, but I can't even
> > > start to decipher without knowing what it's supposed to do.
> > 
> > yes. this function takes care of traversing the propagation tree and
> > creating a child vfsmount for dentries in each vfsmount encountered.
> > In other words it calls do_make_mounted() for each vfsmount that belongs
> > to the propagation tree.
> 
> So it just does the propagated 'bind dir dir'?

yes.
> 
> Why not use the generic propagated bind routine (defined in a later
> patch I presume) for this? 

It does that by calling the generic tree traversal routine (in
pnode_opt.patch). The tree traversal is taken care of by pnode_traverse(),
and the core functionality of calling do_make_mounted() is invoked through
this mechanism.

RP


> Miklos


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 16:33                 ` share/private/slave a subtree - define vs enum Bryan Henderson
  2005-07-08 16:57                   ` Roman Zippel
@ 2005-07-08 18:03                   ` Wichert Akkerman
  2005-07-08 18:10                     ` Mike Waychison
  1 sibling, 1 reply; 48+ messages in thread
From: Wichert Akkerman @ 2005-07-08 18:03 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Roman Zippel, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Pekka J Enberg,
	Alexander Viro

Previously Bryan Henderson wrote:
> Two advantages of the enum declaration that haven't been mentioned yet, 
> that help me significantly:

There is another one: with enums the compiler checks types for you.

Wichert.

-- 
Wichert Akkerman <wichert@wiggy.net>    It is simple to make things.
http://www.wiggy.net/                   It is hard to make things simple.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 18:03                   ` Wichert Akkerman
@ 2005-07-08 18:10                     ` Mike Waychison
  2005-07-08 18:15                       ` Wichert Akkerman
  0 siblings, 1 reply; 48+ messages in thread
From: Mike Waychison @ 2005-07-08 18:10 UTC (permalink / raw)
  To: Wichert Akkerman
  Cc: Bryan Henderson, Roman Zippel, Andrew Morton, bfields,
	linux-fsdevel, linux-kernel, linuxram, Miklos Szeredi,
	Pekka J Enberg, Alexander Viro

Wichert Akkerman wrote:
> Previously Bryan Henderson wrote:
> 
>>Two advantages of the enum declaration that haven't been mentioned yet, 
>>that help me significantly:
> 
> 
> There is another one: with enums the compiler checks types for you.
> 

enums in C are (de?)promoted to integral types under most conditions, so 
the type-checking is useless.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 18:10                     ` Mike Waychison
@ 2005-07-08 18:15                       ` Wichert Akkerman
  2005-07-08 20:23                         ` Mike Waychison
  0 siblings, 1 reply; 48+ messages in thread
From: Wichert Akkerman @ 2005-07-08 18:15 UTC (permalink / raw)
  To: Mike Waychison
  Cc: Bryan Henderson, Roman Zippel, Andrew Morton, bfields,
	linux-fsdevel, linux-kernel, linuxram, Miklos Szeredi,
	Pekka J Enberg, Alexander Viro

Previously Mike Waychison wrote:
> enums in C are (de?)promoted to integral types under most conditions, so 
> the type-checking is useless.

It's a warning in gcc afaik and sparse should complain as well.

Wichert.

-- 
Wichert Akkerman <wichert@wiggy.net>    It is simple to make things.
http://www.wiggy.net/                   It is hard to make things simple.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 17:16                     ` Bryan Henderson
@ 2005-07-08 18:21                       ` Pekka J Enberg
  2005-07-08 19:11                         ` Roman Zippel
  2005-07-08 22:12                         ` Bryan Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Pekka J Enberg @ 2005-07-08 18:21 UTC (permalink / raw)
  To: Bryan Henderson
  Cc: Roman Zippel, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Alexander Viro

Roman Zippel writes:
> > If it's really enumerated data types, that's fine, but this example was 
> > about bitfield masks.

Bryan Henderson writes:
> Ah.  In that case, enum is a pretty tortured way to declare it, though it 
> does have the practical advantages over define that have been mentioned 
> because the syntax is more rigorous.

I think the bit we're talking about is this: 

> Index: 2.6.12/fs/pnode.c
> ===================================================================
> --- /dev/null
> +++ 2.6.12/fs/pnode.c
> @@ -0,0 +1,362 @@
> +
> +#define PNODE_MEMBER_VFS  0x01
> +#define PNODE_SLAVE_VFS   0x02

I don't see how the following is tortured: 

enum {
       PNODE_MEMBER_VFS  = 0x01,
       PNODE_SLAVE_VFS   = 0x02
}; 

In fact, I think it is more natural. An almost identical example even 
appears in K&R. 

                           Pekka 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 18:21                       ` Pekka J Enberg
@ 2005-07-08 19:11                         ` Roman Zippel
  2005-07-08 19:33                           ` Pekka Enberg
  2005-07-08 19:38                           ` Ram
  2005-07-08 22:12                         ` Bryan Henderson
  1 sibling, 2 replies; 48+ messages in thread
From: Roman Zippel @ 2005-07-08 19:11 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Bryan Henderson, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Alexander Viro

Hi,

On Fri, 8 Jul 2005, Pekka J Enberg wrote:

> I don't see how the following is tortured: 
> enum {
>       PNODE_MEMBER_VFS  = 0x01,
>       PNODE_SLAVE_VFS   = 0x02
> }; 
> In fact, I think it is more natural. An almost identical example even appears
> in K&R. 

So it basically comes down to personal preference. If the original uses
defines and it works fine, I don't really see a good enough reason to
change it to enums, so please leave the decision to the author.

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 19:11                         ` Roman Zippel
@ 2005-07-08 19:33                           ` Pekka Enberg
  2005-07-08 19:59                             ` Roman Zippel
  2005-07-08 19:38                           ` Ram
  1 sibling, 1 reply; 48+ messages in thread
From: Pekka Enberg @ 2005-07-08 19:33 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Bryan Henderson, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Alexander Viro

Hi,

On Fri, 2005-07-08 at 21:11 +0200, Roman Zippel wrote:
> So it basically comes down to personal preference, if the original uses 
> defines and it works fine, I don't really see a good enough reason to 
> change it to enums, so please leave the decision to author.

(And I don't see a good enough reason to use #defines when you don't
 absolutely have to. This is what we disagree on.)

Roman, it is not as if I get to decide for the patch submitters. I
comment on any issues _I_ have with the patch and the authors fix
whatever they want (or what the maintainers ask for).

As I disagree with the part about enums being a personal preference, I
will continue to comment on them in the future. If patch authors wish to
ignore them (or any of my comments for that matter), that's ok with me.

P.S. Working code is not enough for the kernel. It must be maintainable
as well.

			Pekka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 19:11                         ` Roman Zippel
  2005-07-08 19:33                           ` Pekka Enberg
@ 2005-07-08 19:38                           ` Ram
  1 sibling, 0 replies; 48+ messages in thread
From: Ram @ 2005-07-08 19:38 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Pekka J Enberg, Bryan Henderson, Andrew Morton, bfields,
	linux-fsdevel, linux-kernel, mike, Miklos Szeredi,
	Alexander Viro

On Fri, 2005-07-08 at 12:11, Roman Zippel wrote:
> Hi,
> 
> On Fri, 8 Jul 2005, Pekka J Enberg wrote:
> 
> > I don't see how the following is tortured: 
> > enum {
> >       PNODE_MEMBER_VFS  = 0x01,
> >       PNODE_SLAVE_VFS   = 0x02
> > }; 
> > In fact, I think it is more natural. An almost identical example even appears
> > in K&R. 
> 
> So it basically comes down to personal preference. If the original uses 
> defines and it works fine, I don't really see a good enough reason to 
> change it to enums, so please leave the decision to the author.

Ok. I will change to enums wherever I define new categories of
#defines, and leave the #defines for the already existing categories as is.

RP



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 17:52           ` Ram
@ 2005-07-08 19:49             ` Miklos Szeredi
  2005-07-14  1:27               ` Ram
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2005-07-08 19:49 UTC (permalink / raw)
  To: linuxram; +Cc: miklos, linux-kernel, linux-fsdevel, viro, akpm, mike, bfields

> The reason why I implemented it that way is to confuse the user less and
> provide more flexibility. With my implementation, we have the ability
> to share any part of the tree, without bothering whether it is a mountpoint
> or not. The side effect of this operation is that it ends up creating
> a vfsmount if the dentry is not a mountpoint.
> 
> so when a user says
>       mount --make-shared /tmp/abc
> the tree under /tmp/abc becomes shared. 
> With your suggestion either the user will get -EINVAL or the tree
> under / will become shared. The second behavior will be really
> confusing.

You are right.

> I am ok with -EINVAL. 

I think it should be this then.  These operations are similar to a
remount (they don't actually mount something, just change some
property of a mount).  Remount returns -EINVAL if not performed on the
root of a mount.

> Also there is another reason why I used this behavior. Let's say /mnt
> is a mountpoint, and say a user does
> 		mount --make-shared /mnt
> 
> and then does 
>                 mount --bind /mnt/abc  /mnt1
> 
> NOTE: we need propagation to be set up between /mnt/abc and /mnt1, and
> propagation can only be set up for vfsmounts.  In this case /mnt/abc
> is not a mountpoint. I have two choices: either return -EINVAL
> or create a vfsmount at that point. But -EINVAL is not consistent
> with standard --bind behavior. So I chose the latter behavior.
> 
> Now that we need this behavior anyway while doing bind mounts from
> shared trees, I kept the same behavior for --make-shared.

Well, the mount program can easily implement this behavior if wanted,
just by doing the 'bind dir dir' and then doing 'make-shared dir'.

The other way round (disabling the automatic 'bind dir dir') is much
more difficult.
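
For illustration, the userspace sequence described above would look
roughly like this (assuming a mount binary that understands the new
--make-shared option):

		mount --bind /mnt/abc /mnt/abc
		mount --make-shared /mnt/abc

i.e. first make the dentry a mountpoint explicitly, then mark the
resulting mount shared, so the kernel never has to create the vfsmount
implicitly.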

> > Some notes (maybe outside the code) explaining the mechanism of the
> > propagations would be nice.  Without these it's hard to understand the
> > design decisions behind such an implementation.
> 
> Ok. I will make a small writeup on the mechanism.

That will help, thanks.

Miklos




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 19:33                           ` Pekka Enberg
@ 2005-07-08 19:59                             ` Roman Zippel
  2005-07-10 18:21                               ` Pekka Enberg
  0 siblings, 1 reply; 48+ messages in thread
From: Roman Zippel @ 2005-07-08 19:59 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Bryan Henderson, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Alexander Viro

Hi,

On Fri, 8 Jul 2005, Pekka Enberg wrote:

> On Fri, 2005-07-08 at 21:11 +0200, Roman Zippel wrote:
> > So it basically comes down to personal preference. If the original uses 
> > defines and it works fine, I don't really see a good enough reason to 
> > change it to enums, so please leave the decision to the author.
> 
> (And I don't see a good enough reason to use #defines when you don't
>  absolutely have to. This is what we disagree on.)

"use" != "change".
If an author already uses defines, that's fine and in most cases there is 
no reason to change it.

> Roman, it is not as if I get to decide for the patch submitters. I
> comment on any issues _I_ have with the patch and the authors fix
> whatever they want (or what the maintainers ask for).

The point of a review is to comment on things that _need_ fixing. Less 
experienced hackers take this as a requirement for their drivers to be 
included.

> P.S. Working code is not enough for the kernel. It must be maintainable
> as well.

defines are perfectly maintainable.

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 18:15                       ` Wichert Akkerman
@ 2005-07-08 20:23                         ` Mike Waychison
  2005-07-10 21:57                           ` Pavel Machek
  0 siblings, 1 reply; 48+ messages in thread
From: Mike Waychison @ 2005-07-08 20:23 UTC (permalink / raw)
  To: Wichert Akkerman
  Cc: Bryan Henderson, Roman Zippel, Andrew Morton, bfields,
	linux-fsdevel, linux-kernel, linuxram, Miklos Szeredi,
	Pekka J Enberg, Alexander Viro

Wichert Akkerman wrote:
> Previously Mike Waychison wrote:
> 
>>enums in C are (de?)promoted to integral types under most conditions, so 
>>the type-checking is useless.
> 
> 
> It's a warning in gcc afaik and sparse should complain as well.
> 

Check again.

You must be thinking of another language.

Show me how you can get warnings out of this:

enum foo { FOO, };

static enum foo doit(enum foo bar)
{
	int i = bar;	/* enum value silently converted to int */
	return i;	/* int silently converted back to enum foo */
}

int main(void)
{
	int i;
	i = doit(0);	/* plain 0 silently accepted as enum foo */
	return 0;
}

Mike Waychison

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 18:21                       ` Pekka J Enberg
  2005-07-08 19:11                         ` Roman Zippel
@ 2005-07-08 22:12                         ` Bryan Henderson
  1 sibling, 0 replies; 48+ messages in thread
From: Bryan Henderson @ 2005-07-08 22:12 UTC (permalink / raw)
  To: Pekka J Enberg
  Cc: Andrew Morton, bfields, linux-fsdevel, linux-kernel, linuxram,
	mike, Miklos Szeredi, Alexander Viro, Roman Zippel

>I don't see how the following is tortured: 
>
>enum {
>       PNODE_MEMBER_VFS  = 0x01,
>       PNODE_SLAVE_VFS   = 0x02
>}; 

Only because it's using a facility that's supposed to be for enumerated 
types for something that isn't.  If it were a true enumerated type, the 
codes for the enumerations (0x01, 0x02) would be quite arbitrary, whereas 
here they must fundamentally be integers whose binary representation has 
exactly one 1 bit (because, as I understand it, these are used as bitmasks 
somewhere).
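
As a concrete illustration of why each constant must occupy its own bit:
such flags are combined and tested as a mask, roughly like this (the usage
below is made up, not code from the patch):

#include <stdio.h>

enum {
	PNODE_MEMBER_VFS = 0x01,	/* one distinct bit per flag */
	PNODE_SLAVE_VFS  = 0x02
};

int main(void)
{
	int flags = PNODE_MEMBER_VFS | PNODE_SLAVE_VFS;	/* combine */

	if (flags & PNODE_SLAVE_VFS)			/* test */
		printf("slave flag is set\n");
	return 0;
}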

I can see that this paradigm has practical advantages over using macros 
(or a middle ground - integer constants), but only as a byproduct of what 
the construct is really for.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 16:57                   ` Roman Zippel
  2005-07-08 17:16                     ` Bryan Henderson
@ 2005-07-10 10:55                     ` Denis Vlasenko
  1 sibling, 0 replies; 48+ messages in thread
From: Denis Vlasenko @ 2005-07-10 10:55 UTC (permalink / raw)
  To: Roman Zippel, Bryan Henderson
  Cc: Andrew Morton, bfields, linux-fsdevel, linux-kernel, linuxram,
	mike, Miklos Szeredi, Pekka J Enberg, Alexander Viro

On Friday 08 July 2005 19:57, Roman Zippel wrote:
> Hi,
> 
> On Fri, 8 Jul 2005, Bryan Henderson wrote:
> 
> > I wasn't aware anyone preferred defines to enums for declaring enumerated 
> > data types.
> 
> If it's really enumerated data types, that's fine, but this example was 
> about bitfield masks.
> 
> > Isn't the only argument for defines, "that's what I'm used to."?
> 
> defines are not just used for constants and there is _nothing_ wrong with 
> using defines for constants.

Apart from the wart that defines know zilch about scopes.
Thus completely fine-looking code can give you an atrociously
obscure compiler message. A real-world example for userspace:

# cat t.c
#include <errno.h>
#include <stdio.h>

void f() {
    char errno[] = "hello world";
    puts(errno);
}

# gcc -c t.c
t.c: In function `f':
t.c:5: error: function `__errno_location' is initialized like a variable
t.c:5: error: conflicting types for '__errno_location'
/usr/include/bits/errno.h:38: error: previous declaration of '__errno_location' was here
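
For contrast, the equivalent clash with an enumeration constant compiles
without complaint, because enumerators are ordinary identifiers that obey
scope (a small made-up example, unrelated to the patch):

#include <stdio.h>

enum { MY_CONST = 1 };		/* enumerator, not a macro */

void f(void)
{
	/* a local of the same name merely shadows the enumerator */
	char MY_CONST[] = "hello world";
	puts(MY_CONST);
}

int main(void)
{
	f();
	return 0;
}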
--
vda


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 19:59                             ` Roman Zippel
@ 2005-07-10 18:21                               ` Pekka Enberg
  2005-07-10 18:40                                 ` randy_dunlap
                                                   ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Pekka Enberg @ 2005-07-10 18:21 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Bryan Henderson, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Alexander Viro

Hi Roman,

At some point in time, I wrote:
> > Roman, it is not as if I get to decide for the patch submitters. I
> > comment on any issues _I_ have with the patch and the authors fix
> > whatever they want (or what the maintainers ask for).

On Fri, 2005-07-08 at 21:59 +0200, Roman Zippel wrote:
> The point of a review is to comment on things that _need_ fixing. Less 
> experienced hackers take this as a requirement for their drivers to be 
> included.

Hmm. So we disagree on that issue as well. I think the point of review
is to improve code and help others conform with the existing coding
style, which is why I find it strange that you're suggesting that I limit
my comments to a subset you agree with.

Would you please be so kind as to define your criteria for things that
"need fixing" so we could see if we can reach some sort of an agreement on
this. My list is roughly as follows:

  - Erroneous use of kernel API
  - Bad coding style
  - Layering violations
  - Duplicate code
  - Hard to read code

			Pekka


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-10 18:21                               ` Pekka Enberg
@ 2005-07-10 18:40                                 ` randy_dunlap
  2005-07-10 19:14                                 ` Roman Zippel
  2005-07-10 19:16                                 ` Vojtech Pavlik
  2 siblings, 0 replies; 48+ messages in thread
From: randy_dunlap @ 2005-07-10 18:40 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: zippel, hbryan, akpm, bfields, linux-fsdevel, linux-kernel,
	linuxram, mike, miklos, viro

On Sun, 10 Jul 2005 21:21:42 +0300 Pekka Enberg wrote:

| Hi Roman,
| 
| At some point in time, I wrote:
| > > Roman, it is not as if I get to decide for the patch submitters. I
| > > comment on any issues _I_ have with the patch and the authors fix
| > > whatever they want (or what the maintainers ask for).
| 
| On Fri, 2005-07-08 at 21:59 +0200, Roman Zippel wrote:
| > The point of a review is to comment on things that _need_ fixing. Less 
| > experienced hackers take this as a requirement for their drivers to be 
| > included.
| 
| Hmm. So we disagree on that issue as well. I think the point of review
| is to improve code and help others conform with the existing coding
| style, which is why I find it strange that you're suggesting that I limit
| my comments to a subset you agree with.
| 
| Would you please be so kind as to define your criteria for things that
| "need fixing" so we could see if we can reach some sort of an agreement on
| this. My list is roughly as follows:
| 
|   - Erroneous use of kernel API
|   - Bad coding style
|   - Layering violations
|   - Duplicate code
|   - Hard to read code

Those are all good points IMO, but there is little (or no)
consensus on "bad coding style" or "hard to read code".

Maybe it would make more sense to make some suggested changes to
Documentation/CodingStyle (in the form of a patch) and see what
kinds of reactions (support) it gets.

---
~Randy

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-10 18:21                               ` Pekka Enberg
  2005-07-10 18:40                                 ` randy_dunlap
@ 2005-07-10 19:14                                 ` Roman Zippel
  2005-07-11  6:37                                   ` Pekka J Enberg
  2005-07-11 17:13                                   ` Horst von Brand
  2005-07-10 19:16                                 ` Vojtech Pavlik
  2 siblings, 2 replies; 48+ messages in thread
From: Roman Zippel @ 2005-07-10 19:14 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Bryan Henderson, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Alexander Viro

Hi,

On Sun, 10 Jul 2005, Pekka Enberg wrote:

> > The point of a review is to comment on things that _need_ fixing. Less 
> > experienced hackers take this as a requirement for their drivers to be 
> > included.
> 
> Hmm. So we disagree on that issue as well. I think the point of review
> is to improve code and help others conform with the existing coding
> style, which is why I find it strange that you're suggesting that I limit
> my comments to a subset you agree with.
> 
> Would you please be so kind as to define your criteria for things that
> "need fixing" so we could see if we can reach some sort of an agreement on
> this. My list is roughly as follows:
> 
>   - Erroneous use of kernel API
>   - Bad coding style
>   - Layering violations
>   - Duplicate code
>   - Hard to read code

I don't generally disagree with that, I just think that defines are not 
part of that list.
Look, it's great that you do reviews, but please keep in mind it's the 
author who has to work with the code and he has to be primarily happy with 
it, so you don't have to point out every minor issue.
Although it also differs between core and driver code, we don't have to be 
that strict with driver code as long as it "looks" ok and is otherwise 
correct. The requirements for core kernel code are higher, but even here 
defines are a well accepted language construct.

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-10 18:21                               ` Pekka Enberg
  2005-07-10 18:40                                 ` randy_dunlap
  2005-07-10 19:14                                 ` Roman Zippel
@ 2005-07-10 19:16                                 ` Vojtech Pavlik
  2005-07-11 17:18                                   ` Horst von Brand
  2 siblings, 1 reply; 48+ messages in thread
From: Vojtech Pavlik @ 2005-07-10 19:16 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Roman Zippel, Bryan Henderson, Andrew Morton, bfields,
	linux-fsdevel, linux-kernel, linuxram, mike, Miklos Szeredi,
	Alexander Viro

On Sun, Jul 10, 2005 at 09:21:42PM +0300, Pekka Enberg wrote:

> Hmm. So we disagree on that issue as well. I think the point of review
> is to improve code and help others conform with the existing coding
> style, which is why I find it strange that you're suggesting that I limit
> my comments to a subset you agree with.
> 
> Would you please be so kind as to define your criteria for things that
> "need fixing" so we could see if we can reach some sort of an agreement on
> this. My list is roughly as follows:
> 
>   - Erroneous use of kernel API
>   - Bad coding style
>   - Layering violations
>   - Duplicate code
>   - Hard to read code
 
The reason people post their patches for review is to get good feedback
on them. The problems you list above are mostly nitpicks. They must be
fixed before inclusion of the patch, but it only makes sense to start fixing
them once the patch itself is a reasonable change.

Often patches have deeper problems (like "this won't ever work", "there
is a nice race hidden in there", "why do we need this part at all"), and
spotting those is much more valuable for both the submitter and the
progress of development.

Obviously, it's much harder to do that than to comment on a misplaced
brace.

It's an utter waste of effort to force a first-time patch author to fix
all the style issues in his patch, just to see it rejected later by the
maintainer because it is fundamentally wrong.

Just something to consider.

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-08 20:23                         ` Mike Waychison
@ 2005-07-10 21:57                           ` Pavel Machek
  0 siblings, 0 replies; 48+ messages in thread
From: Pavel Machek @ 2005-07-10 21:57 UTC (permalink / raw)
  To: Mike Waychison
  Cc: Wichert Akkerman, Bryan Henderson, Roman Zippel, Andrew Morton,
	bfields, linux-fsdevel, linux-kernel, linuxram, Miklos Szeredi,
	Pekka J Enberg, Alexander Viro

Hi!

> >>enums in C are (de?)promoted to integral types under most conditions, so 
> >>the type-checking is useless.
> >
> >
> >It's a warning in gcc afaik and sparse should complain as well.
> 
> Check again.

Check sparse with -Wbitwise and enum properly marked as bitwise...
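
For reference, one way to get such checking is sparse's bitwise attribute
on a typedef, the same mechanism used for the endian types like __le32; a
rough sketch, with made-up type and constant names rather than anything
from the patch:

/* only sparse (__CHECKER__) understands the bitwise/force attributes */
#ifdef __CHECKER__
#define __bitwise	__attribute__((bitwise))
#define __force		__attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned int __bitwise pnode_flags_t;

#define PNODE_MEMBER_VFS	((__force pnode_flags_t)0x01)
#define PNODE_SLAVE_VFS		((__force pnode_flags_t)0x02)

static pnode_flags_t pnode_mask_slave(pnode_flags_t flags)
{
	/* sparse -Wbitwise warns if pnode_flags_t is mixed with plain
	   integers (e.g. "int i = flags;"); gcc accepts that silently,
	   which was the point made above */
	return flags & PNODE_SLAVE_VFS;
}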

									Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-10 19:14                                 ` Roman Zippel
@ 2005-07-11  6:37                                   ` Pekka J Enberg
  2005-07-11 17:13                                   ` Horst von Brand
  1 sibling, 0 replies; 48+ messages in thread
From: Pekka J Enberg @ 2005-07-11  6:37 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Bryan Henderson, Andrew Morton, bfields, linux-fsdevel,
	linux-kernel, linuxram, mike, Miklos Szeredi, Alexander Viro,
	Vojtech Pavlik

Hi Roman, 

Roman Zippel writes:
> I don't generally disagree with that, I just think that defines are not 
> part of that list.

They're in Documentation/CodingStyle (see Chapter 11). 

Roman Zippel writes:
> Look, it's great that you do reviews, but please keep in mind it's the 
> author who has to work with the code and he has to be primarily happy with 
> it, so you don't have to point out every minor issue.

I completely agree with you that the author must be happy with the code. I 
also think Vojtech has a good point about reviewing for the most important 
things first and worrying about nitpicks later. 

I'll try to find a better balance for reviews. Thanks Roman. 

                  Pekka 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-10 19:14                                 ` Roman Zippel
  2005-07-11  6:37                                   ` Pekka J Enberg
@ 2005-07-11 17:13                                   ` Horst von Brand
  2005-07-11 17:57                                     ` Roman Zippel
  1 sibling, 1 reply; 48+ messages in thread
From: Horst von Brand @ 2005-07-11 17:13 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Pekka Enberg, Bryan Henderson, Andrew Morton, bfields,
	linux-fsdevel, linux-kernel, linuxram, mike, Miklos Szeredi,
	Alexander Viro

Roman Zippel <zippel@linux-m68k.org> wrote:
> On Sun, 10 Jul 2005, Pekka Enberg wrote:

[...]

> > Would you please be so kind to define your criteria for things that
> > "need fixing" so we could see if can reach some sort of an agreement on
> > this. My list is roughly as follows:
> > 
> >   - Erroneous use of kernel API
> >   - Bad coding style
> >   - Layering violations
> >   - Duplicate code
> >   - Hard to read code

> I don't generally disagree with that, I just think that defines are not 
> part of that list.

Covered in "bad coding style" and "hard to read code", at least.

> Look, it's great that you do reviews, but please keep in mind it's the 
> author who has to work with the code and he has to be primarily happy with 
> it, so you don't have to point out every minor issue.

Wrong. The author has to work with the code, but there are many more people
that have to read it now and fix it in the future. It doesn't make sense
having everybody using their own indentation style, variable naming scheme,
and ways of defining constants. For better or worse, #define /is/ the
standard idiom (in the kernel and elsewhere) to define constants in C. See also
<http://www.lysator.liu.se/c/ten-commandments.html>, particularly
commandment 8. And consider also the /spirit/ of Documentation/CodingStyle.

> Although it also differs between core and driver code, we don't have to be 
> that strict with driver code as long as it "looks" ok and is otherwise 
> correct. The requirements for core kernel code are higher, but even here 
> defines are a well accepted language construct.

I'd rather ask for the higher requirements from authors of new code... not
demand, just ask.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-10 19:16                                 ` Vojtech Pavlik
@ 2005-07-11 17:18                                   ` Horst von Brand
  0 siblings, 0 replies; 48+ messages in thread
From: Horst von Brand @ 2005-07-11 17:18 UTC (permalink / raw)
  To: Vojtech Pavlik
  Cc: Pekka Enberg, Roman Zippel, Bryan Henderson, Andrew Morton,
	bfields, linux-fsdevel, linux-kernel, linuxram, mike,
	Miklos Szeredi, Alexander Viro

Vojtech Pavlik <vojtech@suse.cz> wrote:
> On Sun, Jul 10, 2005 at 09:21:42PM +0300, Pekka Enberg wrote:
> > Hmm. So we disagree on that issue as well. I think the point of review
> > is to improve code and help others conform with the existing coding
> > style, which is why I find it strange that you're suggesting that I limit
> > my comments to a subset you agree with.
> > 
> > Would you please be so kind as to define your criteria for things that
> > "need fixing" so we could see if we can reach some sort of an agreement on
> > this. My list is roughly as follows:
> > 
> >   - Erroneous use of kernel API
> >   - Bad coding style
> >   - Layering violations
> >   - Duplicate code
> >   - Hard to read code

> The reason people post their patches for review is to get good feedback
> on them. The problems you list above are mostly nitpicks. They must be
> fixed before inclusion of the patch, but it only makes sense to start fixing
> them once the patch itself is a reasonable change.

If the coding style is an obstacle to understanding, it has to be fixed if
there is going to be any kind of review. Besides, nitpicks being simple to
address, they might as well be fixed while we're at it. Perhaps that way the
author learns to do it right, and fewer nitpicks are left in later additions
and fixes.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: share/private/slave a subtree - define vs enum
  2005-07-11 17:13                                   ` Horst von Brand
@ 2005-07-11 17:57                                     ` Roman Zippel
  0 siblings, 0 replies; 48+ messages in thread
From: Roman Zippel @ 2005-07-11 17:57 UTC (permalink / raw)
  To: Horst von Brand
  Cc: Pekka Enberg, Bryan Henderson, Andrew Morton, bfields,
	linux-fsdevel, linux-kernel, linuxram, mike, Miklos Szeredi,
	Alexander Viro

Hi,

On Mon, 11 Jul 2005, Horst von Brand wrote:

> > I don't generally disagree with that, I just think that defines are not 
> > part of that list.
> 
> Covered in "bad coding style" and "hard to read code", at least.

Somehow I missed the last lkml debate about where simple defines were a 
problem.

> > Look, it's great that you do reviews, but please keep in mind it's the 
> > author who has to work with the code and he has to be primarily happy with 
> > it, so you don't have to point out every minor issue.
> 
> Wrong. The author has to work with the code, but there are many more people
> that have to read it now and fix it in the future. It doesn't make sense
> having everybody using their own indentation style, variable naming scheme,
> and ways of defining constants.

I didn't say this, I said "minor issues". Please read more carefully.

bye, Roman

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [RFC PATCH 1/8] share/private/slave a subtree
  2005-07-08 19:49             ` Miklos Szeredi
@ 2005-07-14  1:27               ` Ram
  2005-07-18 11:06                 ` shared subtrees implementation writeup Miklos Szeredi
  0 siblings, 1 reply; 48+ messages in thread
From: Ram @ 2005-07-14  1:27 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-kernel, linux-fsdevel, viro, Andrew Morton, mike, bfields

[-- Attachment #1: Type: text/plain, Size: 2463 bytes --]

On Fri, 2005-07-08 at 12:49, Miklos Szeredi wrote:
> > The reason why I implemented it that way is to confuse the user less and
> > provide more flexibility. With my implementation, we have the ability
> > to share any part of the tree, without bothering whether it is a mountpoint
> > or not. The side effect of this operation is that it ends up creating
> > a vfsmount if the dentry is not a mountpoint.
> > 
> > so when a user says
> >       mount --make-shared /tmp/abc
> > the tree under /tmp/abc becomes shared. 
> > With your suggestion either the user will get -EINVAL or the tree
> > under / will become shared. The second behavior will be really
> > confusing.
> 
> You are right.
> 
> > I am ok with -EINVAL. 
> 
> I think it should be this then.  These operations are similar to a
> remount (they don't actually mount something, just change some
> property of a mount).  Remount returns -EINVAL if not performed on the
> root of a mount.
> 
> > Also there is another reason why I used this behavior. Let's say /mnt
> > is a mountpoint, and say a user does
> > 		mount --make-shared /mnt
> > 
> > and then does 
> >                 mount --bind /mnt/abc  /mnt1
> > 
> > NOTE: we need propagation to be set up between /mnt/abc and /mnt1, and
> > propagation can only be set up for vfsmounts.  In this case /mnt/abc
> > is not a mountpoint. I have two choices: either return -EINVAL
> > or create a vfsmount at that point. But -EINVAL is not consistent
> > with standard --bind behavior. So I chose the latter behavior.
> > 
> > Now that we need this behavior anyway while doing bind mounts from
> > shared trees, I kept the same behavior for --make-shared.
> 
> Well, the mount program can easily implement this behavior if wanted,
> just by doing the 'bind dir dir' and then doing 'make-shared dir'.
> 
> The other way round (disabling the automatic 'bind dir dir') is much
> more difficult.

Ok, I will make it -EINVAL. It was not clear in Al Viro's RFC what the
behavior should be.

> 
> > > Some notes (maybe outside the code) explaining the mechanism of the
> > > propagations would be nice.  Without these it's hard to understand the
> > > design decisions behind such an implementation.
> > 
> > Ok. I will make a small writeup on the mechanism.
> 
> That will help, thanks.

A small writeup is enclosed. Caution: it's quite complex. :)

RP

Sorry if reading it as an attachment is difficult; my mailer does
not allow me to inline it properly. I will try mutt next time.

[-- Attachment #2: pnode.writeup --]
[-- Type: text/plain, Size: 8396 bytes --]

		Pnode traversal implementation.


This write-up explains the motivation and reason behind the implementation
of pnode_next() and pnode_traverse() functionality.

Section 1 explains the operations involved during a mount in a shared subtree.
Section 2 explains the operations involved during a umount in a shared subtree.
Section 3 explains the operations involved in checking for umount_busy in
	a shared subtree.
Section 4 explains the operations involved in making an overlay mount in
	a shared subtree. (make_mounted operation)
Section 5 explains the operations involved in removing an overlay mount in
	a shared subtree. (make_umounted operation)

And finally, Section 6 explains the motivation behind pnode_next()
and pnode_traverse().

	Caution: your head can spin as you try to understand the details. :)


Section 1. mount:

	to begin with, we have the following mount tree:

		         root
		      /	/  \  \ \
		     /	t1  t2 \  \ 
		   t0		t3 \
				    t4

	note: 
	t0, t1, t2, t3, t4 all contain mounts.
	t1, t2 and t3 are slaves of t0.
	t4 is the slave of t2.
	t4 and t3 are marked as shared.

	The corresponding propagation tree will be:

			p0
		      /   \
		     p1   p2
		     /     
	 	     p3	   


	***************************************************************
	      p0 contains the mount t0, and contains the slave mount t1
	      p1 contains the mount t2
	      p3 contains the mount t4
	      p2 contains the mount t3

	  NOTE: you may need to look at this multiple times as you try to
	  	understand the various scenarios.
	***************************************************************

	
	now if we mount something under the mount t0, the same has to be
	mounted under all the other mounts (t1, t2, t3 and t4), and
	the new propagation tree for all these child mounts should
	look identical to that of their parents.

	say I mounted something under /t0/abc; the new mount tree will
	look like:

		         root
		      /	/  \  \ \
		     /	t1  t2 \  \ 
		   t0   /   /   t3 \
		   /   c1  c2	/   t4
		  c0	       c3    \
		  		     c4

		and we will have the following propagation trees.

			p0			s0  
		      /   \     	      /   \
		     p1   p2    	     s1   s2
		     /          	     /     
	 	     p3	         	     s3	   

		the propagation tree for all the new child mounts
		will look exactly like that of their parents.

	      s0 contains the mount c0, and contains the slave mount c1
	      s1 contains the mount c2
	      s3 contains the mount c4
	      s2 contains the mount c3


	In order to implement this functionality, we need to walk through
	the pnode tree of the parent mount. For each pnode encountered
	in the tree we have to create a new pnode, make the new child
	mounts members of their corresponding pnodes, and finally
	set up the master-slave relationship between the newly
	created pnodes.

	let's step through the operations:
	1. start at pnode p0, create a new pnode s0.
	2. walk down to p1, create a new pnode s1.
	3. walk down to p3, create a new pnode s3.
	4. since there is no slave pnode for p3, complete the mount operations:
	      (a) create a new child mount c4 under t4 and make c4 a member
	     		of s3.
	5. walk back up from p3 to p1, and make s3 the slave of s1.
	6. since there are no more slave pnodes for p1, complete the
		mount operations:
	      (a) create a new child mount c2 under t2 and make c2 a member
	     		of s1.
	7. walk back up from p1 to p0 and make s1 the slave pnode of s0.
	8. since p2 is the next slave pnode of p0, walk down to p2 and
	   create a new pnode s2.
	9. since there are no slave pnodes of p2, complete the mount operations:
	      (a) create a new child mount c3 under t3 and make c3 a member
	     		of s2.
	10. walk up from p2 to p0 and make s2 the slave pnode of s0.
	11. since there are no more slave pnodes for p0, complete the
		mount operations:
	      (a) create a new child mount c0 under t0 and make c0 a member
	     		of s0.
	      (b) create a new child mount c1 under t1 and make c1 a
	      		slave-member of s0.
	12. done.

	
	The key point to be noted in the above set of operations is:
	each pnode does three different operations corresponding to each stage.

	A. when the pnode is encountered the first time, it has to create
		a new pnode for its child mounts.
	B. when the pnode is encountered again after it has traversed down
	   each slave pnode, it has to associate the slave pnode's newly created
	   pnode with the pnode's newly created pnode.
	C. when the pnode is encountered finally after having traversed through
		all its slave pnodes, it has to create new child mounts
		for each of its member mounts.

	that is the reason pnode_next() returns the same pnode multiple times,
	with one of the following flags to indicate the context (a rough
	sketch of this traversal follows the list):
	(1) PNODE_DOWN to indicate context (A)
	(2) PNODE_MID to indicate context (B)
	(3) PNODE_UP to indicate context (C)
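
	To make the three phases concrete, here is a rough recursive sketch
	of the traversal described above. The structure and function names
	are simplified assumptions for illustration, not the actual pnode.c
	code (the write-up describes an iterator, pnode_next(), handing back
	each pnode with these flags rather than recursing, but the phases
	are the same):

	/* illustration only: a pnode and its slave pnodes */
	struct pnode {
		struct pnode *slaves;		/* first slave pnode, if any */
		struct pnode *next_slave;	/* next slave of our master  */
	};

	static void traverse(struct pnode *p,
			     void (*down)(struct pnode *),		   /* phase A */
			     void (*mid)(struct pnode *, struct pnode *),  /* phase B */
			     void (*up)(struct pnode *))		   /* phase C */
	{
		struct pnode *s;

		down(p);				/* PNODE_DOWN */
		for (s = p->slaves; s; s = s->next_slave) {
			traverse(s, down, mid, up);
			mid(p, s);			/* PNODE_MID  */
		}
		up(p);					/* PNODE_UP   */
	}

	For the mount case of this section, "down" would create the new
	pnode, "mid" would make the slave's new pnode a slave of ours, and
	"up" would create the child mounts for the members.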


Section 2. umount:

	Umount is a less complex operation. The crux of its work
	happens when the pnode has walked through all of its slaves
	(which is phase C).

	Consider the following mount tree
	and the following pnode trees.

		         root
		      /	/  \  \ \
		     /	t1  t2 \  \ 
		   t0   /   /   t3 \
		   /   c1  c2	/   t4
		  c0	       c3    \
		  		     c4

		and we will have 2 propagation trees.

			p0			s0  
		      /   \     	      /   \
		     p1   p2    	     s1   s2
		     /          	     /     
	 	     p3	         	     s3	   

	if a umount is attempted on c0, the other child mounts
	c1, c2, c3 and c4 should also be unmounted. The steps are:

	(1) start at pnode p0. nothing to do currently.
	(2) walk down to p1. nothing to do currently.
	(3) walk down to p3. nothing to do currently.
	(4) since p3 has no slave pnode,
		unmount the mounts corresponding to its members.
		In this case t4 is a member of p3, so unmount c4,
		and release c4 from its pnode s3.
	(5) walk back up to p1 and check if there are any more 
		slave pnodes.  Since there are no more slave pnodes
		unmount the mounts corresponding to its members.
		In this case, t2 is a member of p1, unmount c2,
		and release c2 from its pnode s1.
	(6) walk back up to p0 and check if there are any more 
		slave pnodes.  There is one more slave pnode p2.
		do nothing.
	(7) walk down to p2. nothing to do currently.
	(8) since p2 has no slave pnode, unmount mounts
		corresponding to its members. In this case t3 is
		a member of p2. So unmount c3, and release c3 from
		its pnode s2.
	(9) walk back up to p0 and check if there are any more slave
		pnodes. There are no more slave pnodes, so
		unmount the mounts corresponding to its members. In
		this case t0 and t1 are the member and slave-member
		mounts, so unmount c0 and c1 and
		detach them from their pnode, which is s0.
	(10) done!


	Again, as in the mount case (Section 1), we traverse
	the pnode tree similarly, but the crux of the operations is
	done in phase C, i.e. in the context of PNODE_UP.


Section 3. checking for umount busy
	The operations are again mostly identical to Section 2,
	and the crux of the operations is in phase C, i.e. in the
	context of PNODE_UP.

Section 4. make mounted operation.
	this operation overlays a mount on the same dentry. Its operations
	are mostly similar to those of mount.

Section 5. make unmounted operation.
	this operation is the inverse of the make-mounted operation. Its operations
	are mostly similar to those of umount.



Section 6:

	The large amount of operations common to mount, umount,
	umount_busy and the others motivates the need for a common abstraction
	which all these operations can exploit. That is the reason for the
	function pnode_traverse(), sketched roughly after the list below.

	pnode_traverse() takes 3 different function pointers:
	1. pnode_pre_func(), which is called when PNODE_DOWN is encountered.
	2. pnode_post_func(), which is called when PNODE_MID is encountered.
	3. vfs_func(), which is called on each of the member/slave vfsmounts
		of the pnode; this function is called when PNODE_UP
		is encountered.
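
	As a rough sketch of what that interface amounts to (the prototypes
	below are guesses for illustration, not the actual declarations in
	pnode.c):

	struct pnode;
	struct vfsmount;

	int pnode_traverse(struct pnode *pnode,
			   int (*pnode_pre_func)(struct pnode *),   /* PNODE_DOWN */
			   int (*pnode_post_func)(struct pnode *),  /* PNODE_MID  */
			   int (*vfs_func)(struct vfsmount *));     /* PNODE_UP   */

	Mount, umount and the umount-busy check would then differ only in the
	callbacks they pass in, which is the common abstraction described
	above.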

	There is scope for optimization, by coalescing the work done at phase
	(A) and phase (C), and maybe we can eliminate phase (B) using some
	intelligent techniques. 

	The reason I have not spent much time optimizing this right now is that,
	as we understand the functionality better, we may need all these
	phases. Once I am absolutely convinced that we don't need some of
	these phases, I will optimize them.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* shared subtrees implementation writeup
  2005-07-14  1:27               ` Ram
@ 2005-07-18 11:06                 ` Miklos Szeredi
  2005-07-18 17:18                   ` Ram Pai
  0 siblings, 1 reply; 48+ messages in thread
From: Miklos Szeredi @ 2005-07-18 11:06 UTC (permalink / raw)
  To: linuxram; +Cc: linux-kernel, linux-fsdevel, viro, akpm, mike, bfields

Thanks for the writeup; it helps to understand things a bit better.
However, I still don't understand a few things:


> Section 1. mount:
> 
> 	to begin with, we have the following mount tree:
> 
> 		         root
> 		      /	/  \  \ \
> 		     /	t1  t2 \  \ 
> 		   t0		t3 \
> 				    t4
> 
> 	note: 
> 	t0, t1, t2, t3, t4 all contain mounts.
> 	t1, t2 and t3 are slaves of t0.
> 	t4 is the slave of t2.
> 	t4 and t3 are marked as shared.
> 
> 	The corresponding propagation tree will be:
> 
> 			p0
> 		      /   \
> 		     p1   p2
> 		     /     
> 	 	     p3	   
> 
> 
> 	***************************************************************
> 	      p0 contains the mount t0, and contains the slave mount t1
> 	      p1 contains the mount t2
> 	      p3 contains the mount t4
> 	      p2 contains the mount t3
> 
> 	  NOTE: you may need to look at this multiple times as you try to
> 	  	understand the various scenarios.
> 	***************************************************************

Why do you have p2 and p3?  They contain a single mount only, which could
directly be slaves to p0 and p1 respectively.  Does it have something
to do with being shared?

BTW, is there a reason not to include the pnode info in 'struct
vfsmount'?  That would simplify a lot of allocation error cases.

> 	The key point to be noted in the above set of operations is:
> 	each pnode does three different operations corresponding to each stage.
> 
> 	A. when the pnode is encountered the first time, it has to create
> 		a new pnode for its child mounts.
> 	B. when the pnode is encountered again after it has traversed down
> 	   each slave pnode, it has to associate the slave pnode's newly created
> 	   pnode with the pnode's newly created pnode.
> 	C. when the pnode is encountered finally after having traversed through
> 		all its slave pnodes, it has to create new child mounts
> 		for each of its member mounts.

Now why is this needed?  Couldn't each of these be done in a single step?

I still can't see the reason for having these things done at different
stages of the traversal.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: shared subtrees implementation writeup
  2005-07-18 11:06                 ` shared subtrees implementation writeup Miklos Szeredi
@ 2005-07-18 17:18                   ` Ram Pai
  0 siblings, 0 replies; 48+ messages in thread
From: Ram Pai @ 2005-07-18 17:18 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: linux-kernel, linux-fsdevel, viro, Andrew Morton, mike, bfields

On Mon, 2005-07-18 at 04:06, Miklos Szeredi wrote:
> Thanks for the writeup, it helps to understand things a bit better.
> However I still don't understand a few things:
> 
> 
> > Section 1. mount:
> > 
> > 	to begin with, we have the following mount tree:
> > 
> > 		         root
> > 		      /	/  \  \ \
> > 		     /	t1  t2 \  \ 
> > 		   t0		t3 \
> > 				    t4
> > 
> > 	note: 
> > 	t0, t1, t2, t3, t4 all contain mounts.
> > 	t1, t2 and t3 are slaves of t0.
> > 	t4 is the slave of t2.
> > 	t4 and t3 are marked as shared.
> > 
> > 	The corresponding propagation tree will be:
> > 
> > 			p0
> > 		      /   \
> > 		     p1   p2
> > 		     /     
> > 	 	     p3	   
> > 
> > 
> > 	***************************************************************
> > 	      p0 contains the mount t0, and contains the slave mount t1
> > 	      p1 contains the mount t2
> > 	      p3 contains the mount t4
> > 	      p2 contains the mount t3
> > 
> > 	  NOTE: you may need to look at this multiple times as you try to
> > 	  	understand the various scenarios.
> > 	***************************************************************
> 
> Why do you have p2 and p3?  They contain a single mount only, which could
> directly be slaves to p0 and p1 respectively.  Does it have something
> to do with being shared?

Yes. If the mounts were just slaves, they could be slave members of
their corresponding master pnodes, i.e. p0 and p1 respectively. But
in my example above they are also shared. And a shared mount can be
bind mounted with propagation set up in either direction. Hence they
deserve a separate pnode.  If it were just a slave mount, then binding to
it would not set up any propagation, and hence there would be no need for
a separate pnode to track the propagation (see the sketch after the
list below).

Just for clarification:
1. a slave mount is represented as a slave member of a pnode.
2. a shared mount is represented as a member of a pnode.
3. a mount that is both a slave and shared is represented as a member of
	a separate pnode, which is itself a slave pnode.
4. a private mount is not part of any pnode.
5. an unclone mount is also not part of any pnode.
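
To make these relationships concrete, here is a rough structural sketch
(the field names and the use of bare pointers instead of kernel lists are
simplifications for illustration, not the definitions used in the patches):

struct vfsmount;

struct pnode {
	struct pnode	*master;	/* the pnode this one is a slave of;
					   set in case 3 above, else NULL   */
	struct pnode	*slaves;	/* head of the list of slave pnodes  */
	struct pnode	*next_slave;	/* link in our master's slave list   */
	struct vfsmount	*members;	/* shared mounts that are members of
					   this pnode (case 2)              */
	struct vfsmount	*slave_members;	/* plain slave mounts (case 1)       */
};

/* private and unclonable mounts (cases 4 and 5) simply have no pnode */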


> 
> BTW, is there a reason not to include the pnode info in 'struct
> vfsmount'?  That would simplify a lot of allocation error cases.
> 
> > 	The key point to be noted in the above set of operations is:
> > 	each pnode does three different operations corresponding to each stage.
> > 
> > 	A. when the pnode is encountered the first time, it has to create
> > 		a new pnode for its child mounts.
> > 	B. when the pnode is encountered again after it has traversed down
> > 	   each slave pnode, it has to associate the slave pnode's newly created
> > 	   pnode with the pnode's newly created pnode.
> > 	C. when the pnode is encountered finally after having traversed through
> > 		all its slave pnodes, it has to create new child mounts
> > 		for each of its member mounts.
> 
> Now why is this needed?  Couldn't each of these be done in a single step?
> 
> I still can't see the reason for having these things done at different
> stages of the traversal.

Yes. This can be done in a single step. And in fact, in my latest patches
that I sent yesterday, I did exactly that. It works. All that messy
PNODE_UP, PNODE_DOWN, PNODE_MID handling is gone, and the code has become
much simpler.

The reason it was there earlier is that I was thinking we might need
all these phases for some operations like umount and make_mounted.
But as I understand the operations better, I am convinced that they are not
required, and you reconfirm that point :)

Thanks,
RP
> 
> Thanks,
> Miklos


^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2005-07-18 17:18 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1120816072.30164.10.camel@localhost>
2005-07-08 10:25 ` [RFC PATCH 0/8] shared subtree Ram
     [not found] ` <1120816229.30164.13.camel@localhost>
2005-07-08 10:25   ` [RFC PATCH 1/8] share/private/slave a subtree Ram
2005-07-08 11:17     ` Pekka Enberg
2005-07-08 12:19       ` Roman Zippel
2005-07-08 12:26         ` Pekka J Enberg
2005-07-08 12:46           ` Roman Zippel
2005-07-08 12:58             ` Pekka J Enberg
2005-07-08 13:34               ` Roman Zippel
2005-07-08 16:17                 ` Pekka Enberg
2005-07-08 16:33                 ` share/private/slave a subtree - define vs enum Bryan Henderson
2005-07-08 16:57                   ` Roman Zippel
2005-07-08 17:16                     ` Bryan Henderson
2005-07-08 18:21                       ` Pekka J Enberg
2005-07-08 19:11                         ` Roman Zippel
2005-07-08 19:33                           ` Pekka Enberg
2005-07-08 19:59                             ` Roman Zippel
2005-07-10 18:21                               ` Pekka Enberg
2005-07-10 18:40                                 ` randy_dunlap
2005-07-10 19:14                                 ` Roman Zippel
2005-07-11  6:37                                   ` Pekka J Enberg
2005-07-11 17:13                                   ` Horst von Brand
2005-07-11 17:57                                     ` Roman Zippel
2005-07-10 19:16                                 ` Vojtech Pavlik
2005-07-11 17:18                                   ` Horst von Brand
2005-07-08 19:38                           ` Ram
2005-07-08 22:12                         ` Bryan Henderson
2005-07-10 10:55                     ` Denis Vlasenko
2005-07-08 18:03                   ` Wichert Akkerman
2005-07-08 18:10                     ` Mike Waychison
2005-07-08 18:15                       ` Wichert Akkerman
2005-07-08 20:23                         ` Mike Waychison
2005-07-10 21:57                           ` Pavel Machek
2005-07-08 16:29       ` [RFC PATCH 1/8] share/private/slave a subtree Ram
2005-07-08 14:32     ` Miklos Szeredi
2005-07-08 16:19       ` Ram
2005-07-08 16:51         ` Miklos Szeredi
2005-07-08 17:52           ` Ram
2005-07-08 19:49             ` Miklos Szeredi
2005-07-14  1:27               ` Ram
2005-07-18 11:06                 ` shared subtrees implementation writeup Miklos Szeredi
2005-07-18 17:18                   ` Ram Pai
     [not found]   ` <1120816355.30164.16.camel@localhost>
2005-07-08 10:25     ` [RFC PATCH 2/8] unclone a subtree Ram
     [not found]     ` <1120816436.30164.19.camel@localhost>
2005-07-08 10:25       ` [RFC PATCH 3/8] bind/rbind a shared/private/slave/unclone tree Ram
     [not found]       ` <1120816521.30164.22.camel@localhost>
2005-07-08 10:25         ` [RFC PATCH 4/8] move " Ram
     [not found]         ` <1120816600.30164.25.camel@localhost>
2005-07-08 10:25           ` [RFC PATCH 5/8] umount " Ram
     [not found]           ` <1120816720.30164.28.camel@localhost>
2005-07-08 10:26             ` [RFC PATCH 6/8] clone a namespace containing " Ram
     [not found]             ` <1120816835.30164.31.camel@localhost>
2005-07-08 10:26               ` [RFC PATCH 7/8] automounter support for shared/slave/private/unclone Ram
2005-07-08 10:26                 ` [RFC PATCH 8/8] pnode.c optimization Ram
