* [PATCH 0/6] tagged sysfs support
@ 2010-03-30 18:30 Eric W. Biederman
  2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
                   ` (15 more replies)
  0 siblings, 16 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 18:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, Greg KH, linux-kernel, Tejun Heo, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev


The main shortcoming of using multiple network namespaces today
is that only network devices in the primary network namespace
can be put in the kobject layer and sysfs.

This is essentially the earlier version of this patchset that was
reviewed before, just now on top of a version of sysfs that doesn't
need cleanup patches to support it.

I have been running these patches in some form for well over a
year, so the basics should at least be solid.

This patchset is currently against 2.6.34-rc1.

This patchset is just the basic infrastructure; a couple more fairly
trivial patches are needed to actually enable network namespaces to use it.
My current plan is to send those after these patches have made it through
review.

 drivers/base/class.c    |    9 ++++
 drivers/base/core.c     |   98 +++++++++++++++++++++++++++++++++----------
 drivers/gpio/gpiolib.c  |    2 +-
 drivers/md/bitmap.c     |    4 +-
 drivers/md/md.c         |    6 +-
 fs/sysfs/bin.c          |    2 +-
 fs/sysfs/dir.c          |  106 ++++++++++++++++++++++++++++++++++++-----------
 fs/sysfs/file.c         |   17 ++++---
 fs/sysfs/group.c        |    6 +-
 fs/sysfs/inode.c        |    6 ++-
 fs/sysfs/mount.c        |   91 +++++++++++++++++++++++++++++++++++++++-
 fs/sysfs/symlink.c      |   35 ++++++++++++++-
 fs/sysfs/sysfs.h        |   23 ++++++++--
 include/linux/device.h  |    3 +
 include/linux/kobject.h |   26 +++++++++++
 include/linux/sysfs.h   |   18 ++++++++
 lib/kobject.c           |  104 ++++++++++++++++++++++++++++++++++++++++++++++
 17 files changed, 480 insertions(+), 76 deletions(-)

Eric


* [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
@ 2010-03-30 18:31 ` Eric W. Biederman
  2010-03-30 19:23   ` Eric Dumazet
                     ` (3 more replies)
  2010-03-30 18:31 ` [PATCH 2/6] kobj: Add basic infrastructure for dealing with namespaces Eric W. Biederman
                   ` (14 subsequent siblings)
  15 siblings, 4 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 18:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Add all of the necessary boilerplate to support
multiple superblocks in sysfs.
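
As an illustration only (not part of the patch), the "find a matching
superblock or create a new one" shape that sget() gives the mount path can
be modelled in userspace roughly as follows.  Every name here (struct super,
get_super, mount_once, ...) is a hypothetical stand-in, not a kernel API,
and the sketch assumes the caller allocates a private info structure that
must be freed exactly once whenever the returned superblock did not adopt it.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct super_info { int unused; };

struct super {
	struct super_info *fs_info;	/* owner of the info, once set */
	struct super *next;
};

static struct super *supers;		/* list of live "superblocks" */

typedef int (*test_fn)(struct super *sb, void *data);
typedef int (*set_fn)(struct super *sb, void *data);

/* Same shape as sget(): reuse a match, otherwise create and initialise. */
static struct super *get_super(test_fn test, set_fn set, void *data)
{
	struct super *sb;

	for (sb = supers; sb; sb = sb->next)
		if (test(sb, data))
			return sb;

	sb = calloc(1, sizeof(*sb));
	if (!sb)
		return NULL;
	if (set(sb, data)) {
		free(sb);
		return NULL;
	}
	sb->next = supers;
	supers = sb;
	return sb;
}

/* With no distinguishing data yet, every existing superblock matches. */
static int test_any(struct super *sb, void *data)
{
	(void)sb;
	(void)data;
	return 1;
}

static int set_info(struct super *sb, void *data)
{
	sb->fs_info = data;
	return 0;
}

/* Allocate info, then free it once if the superblock did not adopt it. */
static struct super *mount_once(void)
{
	struct super_info *info = calloc(1, sizeof(*info));
	struct super *sb;

	if (!info)
		return NULL;
	sb = get_super(test_any, set_info, info);
	if (!sb || sb->fs_info != info)
		free(info);
	return sb;
}
```

In this model a second mount_once() call returns the first superblock and
frees its own unused info, which is the behaviour an always-true test
callback produces.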

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 fs/sysfs/mount.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/sysfs/sysfs.h |    3 ++
 2 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 0cb1088..6a433ac 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -71,16 +71,70 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
 	return 0;
 }
 
+static int sysfs_test_super(struct super_block *sb, void *data)
+{
+	struct sysfs_super_info *sb_info = sysfs_info(sb);
+	struct sysfs_super_info *info = data;
+	int found = 1;
+	return found;
+}
+
+static int sysfs_set_super(struct super_block *sb, void *data)
+{
+	int error;
+	error = set_anon_super(sb, data);
+	if (!error)
+		sb->s_fs_info = data;
+	return error;
+}
+
 static int sysfs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
+	struct sysfs_super_info *info;
+	struct super_block *sb;
+	int error;
+
+	error = -ENOMEM;
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		goto out;
+	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
+	if (IS_ERR(sb) || sb->s_fs_info != info)
+		kfree(info);
+	if (IS_ERR(sb)) {
+		/* info was already freed by the check above */
+		error = PTR_ERR(sb);
+		goto out;
+	}
+	if (!sb->s_root) {
+		sb->s_flags = flags;
+		error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
+		if (error) {
+			deactivate_locked_super(sb);
+			goto out;
+		}
+		sb->s_flags |= MS_ACTIVE;
+	}
+
+	simple_set_mnt(mnt, sb);
+	error = 0;
+out:
+	return error;
+}
+
+static void sysfs_kill_sb(struct super_block *sb)
+{
+	struct sysfs_super_info *info = sysfs_info(sb);
+
+	kill_anon_super(sb);
+	kfree(info);
 }
 
 static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.get_sb		= sysfs_get_sb,
-	.kill_sb	= kill_anon_super,
+	.kill_sb	= sysfs_kill_sb,
 };
 
 int __init sysfs_init(void)
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 30f5a44..030a39d 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
 /*
  * mount.c
  */
+struct sysfs_super_info {
+};
+#define sysfs_info(SB) ((struct sysfs_super_info *)((SB)->s_fs_info))
 extern struct sysfs_dirent sysfs_root;
 extern struct kmem_cache *sysfs_dir_cachep;
 
-- 
1.6.5.2.143.g8cc62



* [PATCH 2/6] kobj: Add basic infrastructure for dealing with namespaces.
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
  2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
@ 2010-03-30 18:31 ` Eric W. Biederman
  2010-04-29 20:29   ` patch kobj-add-basic-infrastructure-for-dealing-with-namespaces.patch added to gregkh-2.6 tree gregkh
  2010-03-30 18:31 ` [PATCH 3/6] sysfs: Implement sysfs tagged directory support Eric W. Biederman
                   ` (13 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 18:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Move complete knowledge of namespaces into the kobject layer
so we can use that information when reporting kobjects to
userspace.
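
As an illustration only, the registration side of this infrastructure can
be modelled in userspace as a small table indexed by namespace type.  The
names below (ns_type_register, ns_current, NS_TYPE_NET, fake_net_ops) are
hypothetical; in particular NS_TYPE_NET is an assumed extra enum value, and
the sketch leaves out the locking that a kernel version would need.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical enum; NS_TYPE_NET is an assumed example type. */
enum ns_type { NS_TYPE_NONE = 0, NS_TYPE_NET, NS_TYPES };

struct ns_type_ops {
	enum ns_type type;
	const void *(*current_ns)(void);
};

/* One slot per type; slot 0 (NONE) is never filled. */
static const struct ns_type_ops *ops_tbl[NS_TYPES];

/* Reject out-of-range types and double registration. */
static int ns_type_register(const struct ns_type_ops *ops)
{
	enum ns_type type = ops->type;

	if (type <= NS_TYPE_NONE || type >= NS_TYPES)
		return -EINVAL;
	if (ops_tbl[type])
		return -EBUSY;
	ops_tbl[type] = ops;
	return 0;
}

/* Ask the registered type for the caller's namespace, if any. */
static const void *ns_current(enum ns_type type)
{
	if (type <= NS_TYPE_NONE || type >= NS_TYPES || !ops_tbl[type])
		return NULL;
	return ops_tbl[type]->current_ns();
}

/* A fake namespace object and the ops that report it. */
static int fake_net_ns;

static const void *fake_current_ns(void)
{
	return &fake_net_ns;
}

static const struct ns_type_ops fake_net_ops = {
	.type		= NS_TYPE_NET,
	.current_ns	= fake_current_ns,
};
```

Before registration ns_current(NS_TYPE_NET) yields NULL; after
ns_type_register(&fake_net_ops) it yields the fake namespace pointer, and a
second registration of the same type is refused.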

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/base/class.c    |    9 ++++
 drivers/base/core.c     |   77 ++++++++++++++++++++++++++++------
 include/linux/device.h  |    3 +
 include/linux/kobject.h |   26 ++++++++++++
 lib/kobject.c           |  103 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 204 insertions(+), 14 deletions(-)

diff --git a/drivers/base/class.c b/drivers/base/class.c
index 0147f47..36723d2 100644
--- a/drivers/base/class.c
+++ b/drivers/base/class.c
@@ -63,6 +63,14 @@ static void class_release(struct kobject *kobj)
 	kfree(cp);
 }
 
+static const struct kobj_ns_type_operations *class_child_ns_type(struct kobject *kobj)
+{
+	struct class_private *cp = to_class(kobj);
+	struct class *class = cp->class;
+
+	return class->ns_type;
+}
+
 static const struct sysfs_ops class_sysfs_ops = {
 	.show	= class_attr_show,
 	.store	= class_attr_store,
@@ -71,6 +79,7 @@ static const struct sysfs_ops class_sysfs_ops = {
 static struct kobj_type class_ktype = {
 	.sysfs_ops	= &class_sysfs_ops,
 	.release	= class_release,
+	.child_ns_type	= class_child_ns_type,
 };
 
 /* Hotplug events for classes go to the class class_subsys */
diff --git a/drivers/base/core.c b/drivers/base/core.c
index ef55df3..6b32f6e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -132,9 +132,21 @@ static void device_release(struct kobject *kobj)
 	kfree(p);
 }
 
+static const void *device_namespace(struct kobject *kobj)
+{
+	struct device *dev = to_dev(kobj);
+	const void *ns = NULL;
+
+	if (dev->class && dev->class->ns_type)
+		ns = dev->class->namespace(dev);
+
+	return ns;
+}
+
 static struct kobj_type device_ktype = {
 	.release	= device_release,
 	.sysfs_ops	= &dev_sysfs_ops,
+	.namespace	= device_namespace,
 };
 
 
@@ -596,11 +608,59 @@ static struct kobject *virtual_device_parent(struct device *dev)
 	return virtual_dir;
 }
 
-static struct kobject *get_device_parent(struct device *dev,
-					 struct device *parent)
+struct class_dir {
+	struct kobject kobj;
+	struct class *class;
+};
+
+#define to_class_dir(obj) container_of(obj, struct class_dir, kobj)
+
+static void class_dir_release(struct kobject *kobj)
+{
+	struct class_dir *dir = to_class_dir(kobj);
+	kfree(dir);
+}
+
+static const
+struct kobj_ns_type_operations *class_dir_child_ns_type(struct kobject *kobj)
 {
+	struct class_dir *dir = to_class_dir(kobj);
+	return dir->class->ns_type;
+}
+
+static struct kobj_type class_dir_ktype = {
+	.release	= class_dir_release,
+	.sysfs_ops	= &kobj_sysfs_ops,
+	.child_ns_type	= class_dir_child_ns_type
+};
+
+static struct kobject *
+class_dir_create_and_add(struct class *class, struct kobject *parent_kobj)
+{
+	struct class_dir *dir;
 	int retval;
 
+	dir = kzalloc(sizeof(*dir), GFP_KERNEL);
+	if (!dir)
+		return NULL;
+
+	dir->class = class;
+	kobject_init(&dir->kobj, &class_dir_ktype);
+
+	dir->kobj.kset = &class->p->class_dirs;
+
+	retval = kobject_add(&dir->kobj, parent_kobj, "%s", class->name);
+	if (retval < 0) {
+		kobject_put(&dir->kobj);
+		return NULL;
+	}
+	return &dir->kobj;
+}
+
+
+static struct kobject *get_device_parent(struct device *dev,
+					 struct device *parent)
+{
 	if (dev->class) {
 		static DEFINE_MUTEX(gdp_mutex);
 		struct kobject *kobj = NULL;
@@ -635,18 +695,7 @@ static struct kobject *get_device_parent(struct device *dev,
 		}
 
 		/* or create a new class-directory at the parent device */
-		k = kobject_create();
-		if (!k) {
-			mutex_unlock(&gdp_mutex);
-			return NULL;
-		}
-		k->kset = &dev->class->p->class_dirs;
-		retval = kobject_add(k, parent_kobj, "%s", dev->class->name);
-		if (retval < 0) {
-			mutex_unlock(&gdp_mutex);
-			kobject_put(k);
-			return NULL;
-		}
+		k = class_dir_create_and_add(dev->class, parent_kobj);
 		/* do not emit an uevent for this simple "glue" directory */
 		mutex_unlock(&gdp_mutex);
 		return k;
diff --git a/include/linux/device.h b/include/linux/device.h
index 1821928..638a8f3 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -203,6 +203,9 @@ struct class {
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
 
+	const struct kobj_ns_type_operations *ns_type;
+	const void *(*namespace)(struct device *dev);
+
 	const struct dev_pm_ops *pm;
 
 	struct class_private *p;
diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index 3950d3c..d9456f6 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -108,6 +108,8 @@ struct kobj_type {
 	void (*release)(struct kobject *kobj);
 	const struct sysfs_ops *sysfs_ops;
 	struct attribute **default_attrs;
+	const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
+	const void *(*namespace)(struct kobject *kobj);
 };
 
 struct kobj_uevent_env {
@@ -134,6 +136,30 @@ struct kobj_attribute {
 
 extern const struct sysfs_ops kobj_sysfs_ops;
 
+enum kobj_ns_type {
+	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPES
+};
+
+struct sock;
+struct kobj_ns_type_operations {
+	enum kobj_ns_type type;
+	const void *(*current_ns)(void);
+	const void *(*netlink_ns)(struct sock *sk);
+	const void *(*initial_ns)(void);
+};
+
+int kobj_ns_type_register(const struct kobj_ns_type_operations *ops);
+int kobj_ns_type_registered(enum kobj_ns_type type);
+const struct kobj_ns_type_operations *kobj_child_ns_ops(struct kobject *parent);
+const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj);
+
+const void *kobj_ns_current(enum kobj_ns_type type);
+const void *kobj_ns_netlink(enum kobj_ns_type type, struct sock *sk);
+const void *kobj_ns_initial(enum kobj_ns_type type);
+void kobj_ns_exit(enum kobj_ns_type type, const void *ns);
+
+
 /**
  * struct kset - a set of kobjects of a specific type, belonging to a specific subsystem.
  *
diff --git a/lib/kobject.c b/lib/kobject.c
index 8115eb1..bbb2bb4 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -850,6 +850,109 @@ struct kset *kset_create_and_add(const char *name,
 }
 EXPORT_SYMBOL_GPL(kset_create_and_add);
 
+
+static DEFINE_SPINLOCK(kobj_ns_type_lock);
+static const struct kobj_ns_type_operations *kobj_ns_ops_tbl[KOBJ_NS_TYPES];
+
+int kobj_ns_type_register(const struct kobj_ns_type_operations *ops)
+{
+	enum kobj_ns_type type = ops->type;
+	int error;
+
+	spin_lock(&kobj_ns_type_lock);
+
+	error = -EINVAL;
+	if (type >= KOBJ_NS_TYPES)
+		goto out;
+
+	error = -EINVAL;
+	if (type <= KOBJ_NS_TYPE_NONE)
+		goto out;
+
+	error = -EBUSY;
+	if (kobj_ns_ops_tbl[type])
+		goto out;
+
+	error = 0;
+	kobj_ns_ops_tbl[type] = ops;
+
+out:
+	spin_unlock(&kobj_ns_type_lock);
+	return error;
+}
+
+int kobj_ns_type_registered(enum kobj_ns_type type)
+{
+	int registered = 0;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES))
+		registered = kobj_ns_ops_tbl[type] != NULL;
+	spin_unlock(&kobj_ns_type_lock);
+
+	return registered;
+}
+
+const struct kobj_ns_type_operations *kobj_child_ns_ops(struct kobject *parent)
+{
+	const struct kobj_ns_type_operations *ops = NULL;
+
+	if (parent && parent->ktype->child_ns_type)
+		ops = parent->ktype->child_ns_type(parent);
+
+	return ops;
+}
+
+const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj)
+{
+	return kobj_child_ns_ops(kobj->parent);
+}
+
+
+const void *kobj_ns_current(enum kobj_ns_type type)
+{
+	const void *ns = NULL;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
+	    kobj_ns_ops_tbl[type])
+		ns = kobj_ns_ops_tbl[type]->current_ns();
+	spin_unlock(&kobj_ns_type_lock);
+
+	return ns;
+}
+
+const void *kobj_ns_netlink(enum kobj_ns_type type, struct sock *sk)
+{
+	const void *ns = NULL;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
+	    kobj_ns_ops_tbl[type])
+		ns = kobj_ns_ops_tbl[type]->netlink_ns(sk);
+	spin_unlock(&kobj_ns_type_lock);
+
+	return ns;
+}
+
+const void *kobj_ns_initial(enum kobj_ns_type type)
+{
+	const void *ns = NULL;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
+	    kobj_ns_ops_tbl[type])
+		ns = kobj_ns_ops_tbl[type]->initial_ns();
+	spin_unlock(&kobj_ns_type_lock);
+
+	return ns;
+}
+
+void kobj_ns_exit(enum kobj_ns_type type, const void *ns)
+{
+}
+
+
 EXPORT_SYMBOL(kobject_get);
 EXPORT_SYMBOL(kobject_put);
 EXPORT_SYMBOL(kobject_del);
-- 
1.6.5.2.143.g8cc62



* [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
  2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
  2010-03-30 18:31 ` [PATCH 2/6] kobj: Add basic infrastructure for dealing with namespaces Eric W. Biederman
@ 2010-03-30 18:31 ` Eric W. Biederman
  2010-03-31  2:43   ` Serge E. Hallyn
                     ` (2 more replies)
  2010-03-30 18:31 ` [PATCH 4/6] sysfs: Add support for tagged directories with untagged members Eric W. Biederman
                   ` (12 subsequent siblings)
  15 siblings, 3 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 18:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Eric W. Biederman, Benjamin Thery

From: Eric W. Biederman <ebiederm@xmission.com>

The problem.  When implementing a network namespace I need to be able
to have multiple network devices with the same name.  Currently this
is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
potentially a few other directories of the form /sys/ ... /net/*.

What this patch does is to add an additional tag field to the
sysfs dirent structure.  For directories that should show different
contents depending on the context such as /sys/class/net/, and
/sys/devices/virtual/net/ this tag field is used to specify the
context in which those directories should be visible.  Effectively
this is the same as creating multiple distinct directories with
the same name but internally to sysfs the result is nicer.

I am calling the concept of a single directory that looks like multiple
directories all at the same path in the filesystem tagged directories.
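
As an illustration only, the effect of the tag field on lookup can be
modelled in userspace like this.  The structure and function names
(dirent_model, find_dirent, net_children) are hypothetical simplifications;
the sketch assumes a singly linked sibling list and compares tags by
pointer equality, so two entries with the same name can coexist as long as
their tags differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified directory entry. */
struct dirent_model {
	const char *name;
	const void *ns;			/* tag; NULL means untagged */
	struct dirent_model *sibling;
};

/* Match on the tag first, then on the name. */
static struct dirent_model *find_dirent(struct dirent_model *children,
					const void *ns, const char *name)
{
	struct dirent_model *sd;

	for (sd = children; sd; sd = sd->sibling) {
		if (sd->ns != ns)
			continue;
		if (!strcmp(sd->name, name))
			return sd;
	}
	return NULL;
}

/* Two fake namespaces, each with its own "eth0" in the same directory. */
static int ns_a, ns_b;
static struct dirent_model eth0_b = { "eth0", &ns_b, NULL };
static struct dirent_model eth0_a = { "eth0", &ns_a, &eth0_b };
static struct dirent_model *net_children = &eth0_a;
```

Looking up "eth0" with &ns_a and with &ns_b returns two different entries,
while a lookup with a NULL tag finds neither of them.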

For the networking namespace the set of directories whose contents I need
to filter with tags can depend on the presence or absence of hotplug
hardware or which modules are currently loaded.  This means I need
a simple race-free way to set up those directories as tagged.

To achieve a race-free design all tagged directories are created
and managed by sysfs itself.

Users of this interface:
- define a type in the sysfs_tag_type enumeration.
- call sysfs_register_ns_types with the type and its operations.
- call sysfs_exit_ns when an individual tag is no longer valid.

- implement mount_ns(), which returns the ns of the calling process
  so we can attach it to a sysfs superblock.
- implement ktype.namespace(), which returns the ns of a sysfs kobject.

Everything else is left up to sysfs and the driver layer.

For the network namespace mount_ns and namespace() are essentially
one-line functions, and look set to remain so.
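
As an illustration only, the two one-line callbacks and the way a mount
pins its view can be modelled in userspace as below.  All names
(net_model, task_model, current_task, capture_mount_ns, dev_visible) are
hypothetical, and the sketch assumes the superblock records one namespace
pointer per type at mount time and later compares it with each device's
namespace.

```c
#include <stddef.h>

/* Hypothetical enum; NS_TYPE_NET is an assumed example type. */
enum ns_type { NS_TYPE_NONE = 0, NS_TYPE_NET, NS_TYPES };

struct net_model { int id; };
struct task_model { struct net_model *net; };
struct netdev_model { struct net_model *net; };

/* Stand-in for the calling process. */
static struct task_model *current_task;

/* "mount_ns": the namespace of the calling process. */
static const void *net_mount_ns(void)
{
	return current_task->net;
}

/* "namespace()": the namespace that owns one device. */
static const void *netdev_namespace(const struct netdev_model *dev)
{
	return dev->net;
}

/* Per-superblock record of the namespaces seen at mount time. */
struct super_info_model {
	const void *ns[NS_TYPES];
};

static void capture_mount_ns(struct super_info_model *info)
{
	info->ns[NS_TYPE_NONE] = NULL;
	info->ns[NS_TYPE_NET] = net_mount_ns();
}

/* A device shows up only in mounts made from its own namespace. */
static int dev_visible(const struct super_info_model *info,
		       const struct netdev_model *dev)
{
	return info->ns[NS_TYPE_NET] == netdev_namespace(dev);
}
```

In this model the view is fixed when capture_mount_ns() runs: if the task
later moves to another namespace, the existing mount keeps showing the
devices of the namespace it was mounted from.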

Tags are currently represented as const void * pointers, as that is
generic, provides enough information for equality comparisons,
and is trivial to create for current users, since it is just the
existing namespace pointer.

The work needed in sysfs is more extensive.  At each directory
or symlink creation I need to check if the directory it is being
created in is a tagged directory and if so generate the appropriate
tag to place on the sysfs_dirent.  Likewise at each symlink or
directory removal I need to check if the sysfs directory it is
being removed from is a tagged directory and if so figure out
which tag goes along with the name I am deleting.
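
As an illustration only, the creation-time checks described above can be
modelled in userspace as follows.  The names (node, find_child, add_one,
child_ns_type) are hypothetical simplifications; the sketch assumes a
parent is "tagged" when child_ns_type is non-zero, refuses a duplicate
(tag, name) pair with -EEXIST, and refuses an untagged entry in a tagged
parent with -EINVAL.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified directory node. */
struct node {
	const char *name;
	const void *ns;		/* tag of this entry; NULL if untagged */
	int child_ns_type;	/* non-zero: children must carry a tag */
	struct node *children;
	struct node *sibling;
};

static struct node *find_child(struct node *parent, const void *ns,
			       const char *name)
{
	struct node *sd;

	for (sd = parent->children; sd; sd = sd->sibling)
		if (sd->ns == ns && !strcmp(sd->name, name))
			return sd;
	return NULL;
}

/* Link sd under parent after the duplicate and missing-tag checks. */
static int add_one(struct node *parent, struct node *sd)
{
	if (find_child(parent, sd->ns, sd->name))
		return -EEXIST;
	if (parent->child_ns_type && !sd->ns)
		return -EINVAL;

	sd->sibling = parent->children;
	parent->children = sd;
	return 0;
}
```

With a tagged parent, two "eth0" entries carrying different tags are both
accepted, a second "eth0" with an already used tag is rejected, and an
entry without a tag is rejected outright.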

Currently only directories which hold kobjects, and
symlinks, are supported.  There is not enough information
in the current file attribute interfaces to give us anything
to discriminate on, which makes tagging plain files useless, and
there are no potential users, which makes it an uninteresting
problem to solve.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
---
 drivers/gpio/gpiolib.c |    2 +-
 drivers/md/bitmap.c    |    4 +-
 drivers/md/md.c        |    6 +-
 fs/sysfs/bin.c         |    2 +-
 fs/sysfs/dir.c         |  112 +++++++++++++++++++++++++++++++++++++----------
 fs/sysfs/file.c        |   17 ++++---
 fs/sysfs/group.c       |    6 +-
 fs/sysfs/inode.c       |    4 +-
 fs/sysfs/mount.c       |   33 ++++++++++++++-
 fs/sysfs/symlink.c     |   15 +++++-
 fs/sysfs/sysfs.h       |   20 +++++++--
 include/linux/sysfs.h  |   10 ++++
 lib/kobject.c          |    1 +
 13 files changed, 181 insertions(+), 51 deletions(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 6d1b866..6c388db 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -398,7 +398,7 @@ static int gpio_setup_irq(struct gpio_desc *desc, struct device *dev,
 			goto free_id;
 		}
 
-		pdesc->value_sd = sysfs_get_dirent(dev->kobj.sd, "value");
+		pdesc->value_sd = sysfs_get_dirent(dev->kobj.sd, NULL, "value");
 		if (!pdesc->value_sd) {
 			ret = -ENODEV;
 			goto free_id;
diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 26ac8aa..f084249 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1678,9 +1678,9 @@ int bitmap_create(mddev_t *mddev)
 
 	bitmap->mddev = mddev;
 
-	bm = sysfs_get_dirent(mddev->kobj.sd, "bitmap");
+	bm = sysfs_get_dirent(mddev->kobj.sd, NULL, "bitmap");
 	if (bm) {
-		bitmap->sysfs_can_clear = sysfs_get_dirent(bm, "can_clear");
+		bitmap->sysfs_can_clear = sysfs_get_dirent(bm, NULL, "can_clear");
 		sysfs_put(bm);
 	} else
 		bitmap->sysfs_can_clear = NULL;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index fdc1890..ed6ca8c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1765,7 +1765,7 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
 		kobject_del(&rdev->kobj);
 		goto fail;
 	}
-	rdev->sysfs_state = sysfs_get_dirent(rdev->kobj.sd, "state");
+	rdev->sysfs_state = sysfs_get_dirent(rdev->kobj.sd, NULL, "state");
 
 	list_add_rcu(&rdev->same_set, &mddev->disks);
 	bd_claim_by_disk(rdev->bdev, rdev->bdev->bd_holder, mddev->gendisk);
@@ -4182,7 +4182,7 @@ static int md_alloc(dev_t dev, char *name)
 	mutex_unlock(&disks_mutex);
 	if (!error) {
 		kobject_uevent(&mddev->kobj, KOBJ_ADD);
-		mddev->sysfs_state = sysfs_get_dirent(mddev->kobj.sd, "array_state");
+		mddev->sysfs_state = sysfs_get_dirent(mddev->kobj.sd, NULL, "array_state");
 	}
 	mddev_put(mddev);
 	return error;
@@ -4391,7 +4391,7 @@ static int do_md_run(mddev_t * mddev)
 			printk(KERN_WARNING
 			       "md: cannot register extra attributes for %s\n",
 			       mdname(mddev));
-		mddev->sysfs_action = sysfs_get_dirent(mddev->kobj.sd, "sync_action");
+		mddev->sysfs_action = sysfs_get_dirent(mddev->kobj.sd, NULL, "sync_action");
 	} else if (mddev->ro == 2) /* auto-readonly not meaningful */
 		mddev->ro = 0;
 
diff --git a/fs/sysfs/bin.c b/fs/sysfs/bin.c
index e9d2935..806b277 100644
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -501,7 +501,7 @@ int sysfs_create_bin_file(struct kobject *kobj,
 void sysfs_remove_bin_file(struct kobject *kobj,
 			   const struct bin_attribute *attr)
 {
-	sysfs_hash_and_remove(kobj->sd, attr->attr.name);
+	sysfs_hash_and_remove(kobj->sd, NULL, attr->attr.name);
 }
 
 EXPORT_SYMBOL_GPL(sysfs_create_bin_file);
diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index 5907178..b0e7911 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -380,9 +380,15 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 {
 	struct sysfs_inode_attrs *ps_iattr;
 
-	if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
+	if (sysfs_find_dirent(acxt->parent_sd, sd->s_ns, sd->s_name))
 		return -EEXIST;
 
+	if (sysfs_ns_type(acxt->parent_sd) && !sd->s_ns) {
+		WARN(1, KERN_WARNING "sysfs: ns required in '%s' for '%s'\n",
+			acxt->parent_sd->s_name, sd->s_name);
+		return -EINVAL;
+	}
+
 	sd->s_parent = sysfs_get(acxt->parent_sd);
 
 	sysfs_link_sibling(sd);
@@ -533,13 +539,17 @@ void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt)
  *	Pointer to sysfs_dirent if found, NULL if not.
  */
 struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
+				       const void *ns,
 				       const unsigned char *name)
 {
 	struct sysfs_dirent *sd;
 
-	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling)
+	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling) {
+		if (sd->s_ns != ns)
+			continue;
 		if (!strcmp(sd->s_name, name))
 			return sd;
+	}
 	return NULL;
 }
 
@@ -558,12 +568,13 @@ struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
  *	Pointer to sysfs_dirent if found, NULL if not.
  */
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name)
 {
 	struct sysfs_dirent *sd;
 
 	mutex_lock(&sysfs_mutex);
-	sd = sysfs_find_dirent(parent_sd, name);
+	sd = sysfs_find_dirent(parent_sd, ns, name);
 	sysfs_get(sd);
 	mutex_unlock(&sysfs_mutex);
 
@@ -572,7 +583,8 @@ struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
 EXPORT_SYMBOL_GPL(sysfs_get_dirent);
 
 static int create_dir(struct kobject *kobj, struct sysfs_dirent *parent_sd,
-		      const char *name, struct sysfs_dirent **p_sd)
+	enum kobj_ns_type type, const void *ns, const char *name,
+	struct sysfs_dirent **p_sd)
 {
 	umode_t mode = S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO;
 	struct sysfs_addrm_cxt acxt;
@@ -583,6 +595,9 @@ static int create_dir(struct kobject *kobj, struct sysfs_dirent *parent_sd,
 	sd = sysfs_new_dirent(name, mode, SYSFS_DIR);
 	if (!sd)
 		return -ENOMEM;
+
+	sd->s_flags |= (type << SYSFS_NS_TYPE_SHIFT);
+	sd->s_ns = ns;
 	sd->s_dir.kobj = kobj;
 
 	/* link in */
@@ -601,7 +616,25 @@ static int create_dir(struct kobject *kobj, struct sysfs_dirent *parent_sd,
 int sysfs_create_subdir(struct kobject *kobj, const char *name,
 			struct sysfs_dirent **p_sd)
 {
-	return create_dir(kobj, kobj->sd, name, p_sd);
+	return create_dir(kobj, kobj->sd,
+			  KOBJ_NS_TYPE_NONE, NULL, name, p_sd);
+}
+
+static enum kobj_ns_type sysfs_read_ns_type(struct kobject *kobj)
+{
+	const struct kobj_ns_type_operations *ops;
+	enum kobj_ns_type type;
+
+	ops = kobj_child_ns_ops(kobj);
+	if (!ops)
+		return KOBJ_NS_TYPE_NONE;
+
+	type = ops->type;
+	BUG_ON(type <= KOBJ_NS_TYPE_NONE);
+	BUG_ON(type >= KOBJ_NS_TYPES);
+	BUG_ON(!kobj_ns_type_registered(type));
+
+	return type;
 }
 
 /**
@@ -610,7 +643,9 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
  */
 int sysfs_create_dir(struct kobject * kobj)
 {
+	enum kobj_ns_type type;
 	struct sysfs_dirent *parent_sd, *sd;
+	const void *ns = NULL;
 	int error = 0;
 
 	BUG_ON(!kobj);
@@ -620,7 +655,11 @@ int sysfs_create_dir(struct kobject * kobj)
 	else
 		parent_sd = &sysfs_root;
 
-	error = create_dir(kobj, parent_sd, kobject_name(kobj), &sd);
+	if (sysfs_ns_type(parent_sd))
+		ns = kobj->ktype->namespace(kobj);
+	type = sysfs_read_ns_type(kobj);
+
+	error = create_dir(kobj, parent_sd, type, ns, kobject_name(kobj), &sd);
 	if (!error)
 		kobj->sd = sd;
 	return error;
@@ -630,13 +669,19 @@ static struct dentry * sysfs_lookup(struct inode *dir, struct dentry *dentry,
 				struct nameidata *nd)
 {
 	struct dentry *ret = NULL;
-	struct sysfs_dirent *parent_sd = dentry->d_parent->d_fsdata;
+	struct dentry *parent = dentry->d_parent;
+	struct sysfs_dirent *parent_sd = parent->d_fsdata;
 	struct sysfs_dirent *sd;
 	struct inode *inode;
+	enum kobj_ns_type type;
+	const void *ns;
 
 	mutex_lock(&sysfs_mutex);
 
-	sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);
+	type = sysfs_ns_type(parent_sd);
+	ns = sysfs_info(dir->i_sb)->ns[type];
+
+	sd = sysfs_find_dirent(parent_sd, ns, dentry->d_name.name);
 
 	/* no such entry */
 	if (!sd) {
@@ -735,7 +780,8 @@ void sysfs_remove_dir(struct kobject * kobj)
 }
 
 int sysfs_rename(struct sysfs_dirent *sd,
-	struct sysfs_dirent *new_parent_sd, const char *new_name)
+	struct sysfs_dirent *new_parent_sd, const void *new_ns,
+	const char *new_name)
 {
 	const char *dup_name = NULL;
 	int error;
@@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd,
 	mutex_lock(&sysfs_mutex);
 
 	error = 0;
-	if ((sd->s_parent == new_parent_sd) &&
+	if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
 	    (strcmp(sd->s_name, new_name) == 0))
 		goto out;	/* nothing to rename */
 
 	error = -EEXIST;
-	if (sysfs_find_dirent(new_parent_sd, new_name))
+	if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
 		goto out;
 
 	/* rename sysfs_dirent */
@@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd,
 		sd->s_parent = new_parent_sd;
 		sysfs_link_sibling(sd);
 	}
+	sd->s_ns = new_ns;
 
 	error = 0;
  out:
@@ -780,19 +827,28 @@ int sysfs_rename(struct sysfs_dirent *sd,
 
 int sysfs_rename_dir(struct kobject *kobj, const char *new_name)
 {
-	return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
+	struct sysfs_dirent *parent_sd = kobj->sd->s_parent;
+	const void *new_ns = NULL;
+
+	if (sysfs_ns_type(parent_sd))
+		new_ns = kobj->ktype->namespace(kobj);
+
+	return sysfs_rename(kobj->sd, parent_sd, new_ns, new_name);
 }
 
 int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
 {
 	struct sysfs_dirent *sd = kobj->sd;
 	struct sysfs_dirent *new_parent_sd;
+	const void *new_ns = NULL;
 
 	BUG_ON(!sd->s_parent);
+	if (sysfs_ns_type(sd->s_parent))
+		new_ns = kobj->ktype->namespace(kobj);
 	new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
 		new_parent_kobj->sd : &sysfs_root;
 
-	return sysfs_rename(sd, new_parent_sd, sd->s_name);
+	return sysfs_rename(sd, new_parent_sd, new_ns, sd->s_name);
 }
 
 /* Relationship between s_mode and the DT_xxx types */
@@ -807,32 +863,35 @@ static int sysfs_dir_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
-static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
-	ino_t ino, struct sysfs_dirent *pos)
+static struct sysfs_dirent *sysfs_dir_pos(const void *ns,
+	struct sysfs_dirent *parent_sd,	ino_t ino, struct sysfs_dirent *pos)
 {
 	if (pos) {
 		int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
 			pos->s_parent == parent_sd &&
 			ino == pos->s_ino;
 		sysfs_put(pos);
-		if (valid)
-			return pos;
+		if (!valid)
+			pos = NULL;
 	}
-	pos = NULL;
-	if ((ino > 1) && (ino < INT_MAX)) {
+	if (!pos && (ino > 1) && (ino < INT_MAX)) {
 		pos = parent_sd->s_dir.children;
 		while (pos && (ino > pos->s_ino))
 			pos = pos->s_sibling;
 	}
+	while (pos && pos->s_ns != ns)
+		pos = pos->s_sibling;
 	return pos;
 }
 
-static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
-	ino_t ino, struct sysfs_dirent *pos)
+static struct sysfs_dirent *sysfs_dir_next_pos(const void *ns,
+	struct sysfs_dirent *parent_sd,	ino_t ino, struct sysfs_dirent *pos)
 {
-	pos = sysfs_dir_pos(parent_sd, ino, pos);
+	pos = sysfs_dir_pos(ns, parent_sd, ino, pos);
 	if (pos)
 		pos = pos->s_sibling;
+	while (pos && pos->s_ns != ns)
+		pos = pos->s_sibling;
 	return pos;
 }
 
@@ -841,8 +900,13 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 	struct dentry *dentry = filp->f_path.dentry;
 	struct sysfs_dirent * parent_sd = dentry->d_fsdata;
 	struct sysfs_dirent *pos = filp->private_data;
+	enum kobj_ns_type type;
+	const void *ns;
 	ino_t ino;
 
+	type = sysfs_ns_type(parent_sd);
+	ns = sysfs_info(dentry->d_sb)->ns[type];
+
 	if (filp->f_pos == 0) {
 		ino = parent_sd->s_ino;
 		if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0)
@@ -857,9 +921,9 @@ static int sysfs_readdir(struct file * filp, void * dirent, filldir_t filldir)
 			filp->f_pos++;
 	}
 	mutex_lock(&sysfs_mutex);
-	for (pos = sysfs_dir_pos(parent_sd, filp->f_pos, pos);
+	for (pos = sysfs_dir_pos(ns, parent_sd, filp->f_pos, pos);
 	     pos;
-	     pos = sysfs_dir_next_pos(parent_sd, filp->f_pos, pos)) {
+	     pos = sysfs_dir_next_pos(ns, parent_sd, filp->f_pos, pos)) {
 		const char * name;
 		unsigned int type;
 		int len, ret;
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index e222b25..1beaa73 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -478,9 +478,12 @@ void sysfs_notify(struct kobject *k, const char *dir, const char *attr)
 	mutex_lock(&sysfs_mutex);
 
 	if (sd && dir)
-		sd = sysfs_find_dirent(sd, dir);
+		/* Only directories are tagged, so no need to pass
+		 * a tag explicitly.
+		 */
+		sd = sysfs_find_dirent(sd, NULL, dir);
 	if (sd && attr)
-		sd = sysfs_find_dirent(sd, attr);
+		sd = sysfs_find_dirent(sd, NULL, attr);
 	if (sd)
 		sysfs_notify_dirent(sd);
 
@@ -569,7 +572,7 @@ int sysfs_add_file_to_group(struct kobject *kobj,
 	int error;
 
 	if (group)
-		dir_sd = sysfs_get_dirent(kobj->sd, group);
+		dir_sd = sysfs_get_dirent(kobj->sd, NULL, group);
 	else
 		dir_sd = sysfs_get(kobj->sd);
 
@@ -599,7 +602,7 @@ int sysfs_chmod_file(struct kobject *kobj, struct attribute *attr, mode_t mode)
 	mutex_lock(&sysfs_mutex);
 
 	rc = -ENOENT;
-	sd = sysfs_find_dirent(kobj->sd, attr->name);
+	sd = sysfs_find_dirent(kobj->sd, NULL, attr->name);
 	if (!sd)
 		goto out;
 
@@ -624,7 +627,7 @@ EXPORT_SYMBOL_GPL(sysfs_chmod_file);
 
 void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr)
 {
-	sysfs_hash_and_remove(kobj->sd, attr->name);
+	sysfs_hash_and_remove(kobj->sd, NULL, attr->name);
 }
 
 void sysfs_remove_files(struct kobject * kobj, const struct attribute **ptr)
@@ -646,11 +649,11 @@ void sysfs_remove_file_from_group(struct kobject *kobj,
 	struct sysfs_dirent *dir_sd;
 
 	if (group)
-		dir_sd = sysfs_get_dirent(kobj->sd, group);
+		dir_sd = sysfs_get_dirent(kobj->sd, NULL, group);
 	else
 		dir_sd = sysfs_get(kobj->sd);
 	if (dir_sd) {
-		sysfs_hash_and_remove(dir_sd, attr->name);
+		sysfs_hash_and_remove(dir_sd, NULL, attr->name);
 		sysfs_put(dir_sd);
 	}
 }
diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index fe61194..23c1e59 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -23,7 +23,7 @@ static void remove_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
 	int i;
 
 	for (i = 0, attr = grp->attrs; *attr; i++, attr++)
-		sysfs_hash_and_remove(dir_sd, (*attr)->name);
+		sysfs_hash_and_remove(dir_sd, NULL, (*attr)->name);
 }
 
 static int create_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
@@ -39,7 +39,7 @@ static int create_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
 		 * visibility.  Do this by first removing then
 		 * re-adding (if required) the file */
 		if (update)
-			sysfs_hash_and_remove(dir_sd, (*attr)->name);
+			sysfs_hash_and_remove(dir_sd, NULL, (*attr)->name);
 		if (grp->is_visible) {
 			mode = grp->is_visible(kobj, *attr, i);
 			if (!mode)
@@ -132,7 +132,7 @@ void sysfs_remove_group(struct kobject * kobj,
 	struct sysfs_dirent *sd;
 
 	if (grp->name) {
-		sd = sysfs_get_dirent(dir_sd, grp->name);
+		sd = sysfs_get_dirent(dir_sd, NULL, grp->name);
 		if (!sd) {
 			WARN(!sd, KERN_WARNING "sysfs group %p not found for "
 				"kobject '%s'\n", grp, kobject_name(kobj));
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 082daae..90a899e 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -323,7 +323,7 @@ void sysfs_delete_inode(struct inode *inode)
 	sysfs_put(sd);
 }
 
-int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
+int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const void *ns, const char *name)
 {
 	struct sysfs_addrm_cxt acxt;
 	struct sysfs_dirent *sd;
@@ -333,7 +333,7 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
 
 	sysfs_addrm_start(&acxt, dir_sd);
 
-	sd = sysfs_find_dirent(dir_sd, name);
+	sd = sysfs_find_dirent(dir_sd, ns, name);
 	if (sd)
 		sysfs_remove_one(&acxt, sd);
 
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 6a433ac..c722471 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -34,7 +34,7 @@ static const struct super_operations sysfs_ops = {
 struct sysfs_dirent sysfs_root = {
 	.s_name		= "",
 	.s_count	= ATOMIC_INIT(1),
-	.s_flags	= SYSFS_DIR,
+	.s_flags	= SYSFS_DIR | (KOBJ_NS_TYPE_NONE << SYSFS_NS_TYPE_SHIFT),
 	.s_mode		= S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
 	.s_ino		= 1,
 };
@@ -75,7 +75,13 @@ static int sysfs_test_super(struct super_block *sb, void *data)
 {
 	struct sysfs_super_info *sb_info = sysfs_info(sb);
 	struct sysfs_super_info *info = data;
+	enum kobj_ns_type type;
 	int found = 1;
+
+	for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++) {
+		if (sb_info->ns[type] != info->ns[type])
+			found = 0;
+	}
 	return found;
 }
 
@@ -92,6 +98,7 @@ static int sysfs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
 	struct sysfs_super_info *info;
+	enum kobj_ns_type type;
 	struct super_block *sb;
 	int error;
 
@@ -99,6 +106,10 @@ static int sysfs_get_sb(struct file_system_type *fs_type,
 	info = kzalloc(sizeof(*info), GFP_KERNEL);
 	if (!info)
 		goto out;
+
+	for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++)
+		info->ns[type] = kobj_ns_current(type);
+
 	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
 	if (IS_ERR(sb) || sb->s_fs_info != info)
 		kfree(info);
@@ -137,6 +148,26 @@ static struct file_system_type sysfs_fs_type = {
 	.kill_sb	= sysfs_kill_sb,
 };
 
+void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
+{
+	struct super_block *sb;
+
+	mutex_lock(&sysfs_mutex);
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
+		struct sysfs_super_info *info = sysfs_info(sb);
+		/* Ignore superblocks that are in the process of unmounting */
+		if (sb->s_count <= S_BIAS)
+			continue;
+		/* Ignore superblocks with the wrong ns */
+		if (info->ns[type] != ns)
+			continue;
+		info->ns[type] = NULL;
+	}
+	spin_unlock(&sb_lock);
+	mutex_unlock(&sysfs_mutex);
+}
+
 int __init sysfs_init(void)
 {
 	int err = -ENOMEM;
diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 1b9a3a1..56ccdc6 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -57,6 +57,8 @@ static int sysfs_do_create_link(struct kobject *kobj, struct kobject *target,
 	if (!sd)
 		goto out_put;
 
+	if (sysfs_ns_type(parent_sd))
+		sd->s_ns = target->ktype->namespace(target);
 	sd->s_symlink.target_sd = target_sd;
 	target_sd = NULL;	/* reference is now owned by the symlink */
 
@@ -120,7 +122,7 @@ void sysfs_remove_link(struct kobject * kobj, const char * name)
 	else
 		parent_sd = kobj->sd;
 
-	sysfs_hash_and_remove(parent_sd, name);
+	sysfs_hash_and_remove(parent_sd, NULL, name);
 }
 
 /**
@@ -136,6 +138,7 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
 			const char *old, const char *new)
 {
 	struct sysfs_dirent *parent_sd, *sd = NULL;
+	const void *old_ns = NULL, *new_ns = NULL;
 	int result;
 
 	if (!kobj)
@@ -143,8 +146,11 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
 	else
 		parent_sd = kobj->sd;
 
+	if (targ->sd)
+		old_ns = targ->sd->s_ns;
+
 	result = -ENOENT;
-	sd = sysfs_get_dirent(parent_sd, old);
+	sd = sysfs_get_dirent(parent_sd, old_ns, old);
 	if (!sd)
 		goto out;
 
@@ -154,7 +160,10 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
 	if (sd->s_symlink.target_sd->s_dir.kobj != targ)
 		goto out;
 
-	result = sysfs_rename(sd, parent_sd, new);
+	if (sysfs_ns_type(parent_sd))
+		new_ns = targ->ktype->namespace(targ);
+
+	result = sysfs_rename(sd, parent_sd, new_ns, new);
 
 out:
 	sysfs_put(sd);
diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
index 030a39d..93847d5 100644
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -58,6 +58,7 @@ struct sysfs_dirent {
 	struct sysfs_dirent	*s_sibling;
 	const char		*s_name;
 
+	const void		*s_ns;
 	union {
 		struct sysfs_elem_dir		s_dir;
 		struct sysfs_elem_symlink	s_symlink;
@@ -81,14 +82,22 @@ struct sysfs_dirent {
 #define SYSFS_COPY_NAME			(SYSFS_DIR | SYSFS_KOBJ_LINK)
 #define SYSFS_ACTIVE_REF		(SYSFS_KOBJ_ATTR | SYSFS_KOBJ_BIN_ATTR)
 
-#define SYSFS_FLAG_MASK			~SYSFS_TYPE_MASK
-#define SYSFS_FLAG_REMOVED		0x0200
+#define SYSFS_NS_TYPE_MASK		0xff00
+#define SYSFS_NS_TYPE_SHIFT		8
+
+#define SYSFS_FLAG_MASK			~(SYSFS_NS_TYPE_MASK|SYSFS_TYPE_MASK)
+#define SYSFS_FLAG_REMOVED		0x020000
 
 static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
 {
 	return sd->s_flags & SYSFS_TYPE_MASK;
 }
 
+static inline enum kobj_ns_type sysfs_ns_type(struct sysfs_dirent *sd)
+{
+	return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;
+}
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 #define sysfs_dirent_init_lockdep(sd)				\
 do {								\
@@ -115,6 +124,7 @@ struct sysfs_addrm_cxt {
  * mount.c
  */
 struct sysfs_super_info {
+	const void *ns[KOBJ_NS_TYPES];
 };
 #define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
 extern struct sysfs_dirent sysfs_root;
@@ -140,8 +150,10 @@ void sysfs_remove_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd);
 void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt);
 
 struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
+				       const void *ns,
 				       const unsigned char *name);
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name);
 struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type);
 
@@ -152,7 +164,7 @@ int sysfs_create_subdir(struct kobject *kobj, const char *name,
 void sysfs_remove_subdir(struct sysfs_dirent *sd);
 
 int sysfs_rename(struct sysfs_dirent *sd,
-	struct sysfs_dirent *new_parent_sd, const char *new_name);
+	struct sysfs_dirent *new_parent_sd, const void *ns, const char *new_name);
 
 static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
 {
@@ -182,7 +194,7 @@ int sysfs_setattr(struct dentry *dentry, struct iattr *iattr);
 int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
 int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
 		size_t size, int flags);
-int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
+int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const void *ns, const char *name);
 int sysfs_inode_init(void);
 
 /*
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index f0496b3..1885d21 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -20,6 +20,7 @@
 
 struct kobject;
 struct module;
+enum kobj_ns_type;
 
 /* FIXME
  * The *owner field is no longer used.
@@ -168,10 +169,14 @@ void sysfs_remove_file_from_group(struct kobject *kobj,
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr);
 void sysfs_notify_dirent(struct sysfs_dirent *sd);
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name);
 struct sysfs_dirent *sysfs_get(struct sysfs_dirent *sd);
 void sysfs_put(struct sysfs_dirent *sd);
 void sysfs_printk_last_file(void);
+
+void sysfs_exit_ns(enum kobj_ns_type type, const void *tag);
+
 int __must_check sysfs_init(void);
 
 #else /* CONFIG_SYSFS */
@@ -301,6 +306,7 @@ static inline void sysfs_notify_dirent(struct sysfs_dirent *sd)
 }
 static inline
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name)
 {
 	return NULL;
@@ -313,6 +319,10 @@ static inline void sysfs_put(struct sysfs_dirent *sd)
 {
 }
 
+static inline void sysfs_exit_ns(enum kobj_ns_type type, const void *tag)
+{
+}
+
 static inline int __must_check sysfs_init(void)
 {
 	return 0;
diff --git a/lib/kobject.c b/lib/kobject.c
index bbb2bb4..b2c6d1f 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -950,6 +950,7 @@ const void *kobj_ns_initial(enum kobj_ns_type type)
 
 void kobj_ns_exit(enum kobj_ns_type type, const void *ns)
 {
+	sysfs_exit_ns(type, ns);
 }
 
 
-- 
1.6.5.2.143.g8cc62
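
For readers following the s_flags change in the fs/sysfs/sysfs.h hunk above,
here is a small user-space sketch of how the namespace type is packed next to
the dirent type.  The mask, shift and SYSFS_FLAG_REMOVED values are copied from
the hunk; SYSFS_TYPE_MASK and SYSFS_DIR are not visible in this diff and are
assumed to be 0x00ff and 0x0001, and the model_* names and the MODEL_NS_NET
enumerator are hypothetical, for illustration only.

```c
#include <assert.h>

/* Assumed values; these two defines are not shown in the hunk above. */
#define SYSFS_TYPE_MASK		0x00ff
#define SYSFS_DIR		0x0001

/* Copied from the patched fs/sysfs/sysfs.h. */
#define SYSFS_NS_TYPE_MASK	0xff00
#define SYSFS_NS_TYPE_SHIFT	8
#define SYSFS_FLAG_REMOVED	0x020000

/* Hypothetical stand-in for enum kobj_ns_type. */
enum model_ns_type { MODEL_NS_NONE = 0, MODEL_NS_NET = 1 };

/* Build an s_flags value from a dirent type and a namespace type. */
static unsigned int model_make_flags(unsigned int type, enum model_ns_type ns)
{
	assert((type & ~SYSFS_TYPE_MASK) == 0);
	return type | ((unsigned int)ns << SYSFS_NS_TYPE_SHIFT);
}

/* Same extraction as sysfs_type() in the hunk. */
static unsigned int model_type(unsigned int flags)
{
	return flags & SYSFS_TYPE_MASK;
}

/* Same extraction as sysfs_ns_type() in the hunk. */
static enum model_ns_type model_ns_type(unsigned int flags)
{
	return (enum model_ns_type)((flags & SYSFS_NS_TYPE_MASK) >>
				    SYSFS_NS_TYPE_SHIFT);
}
```

The old SYSFS_FLAG_REMOVED value of 0x0200 would overlap the new namespace
type byte, which appears to be why the hunk moves it to 0x020000.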


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 4/6] sysfs: Add support for tagged directories with untagged members.
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (2 preceding siblings ...)
  2010-03-30 18:31 ` [PATCH 3/6] sysfs: Implement sysfs tagged directory support Eric W. Biederman
@ 2010-03-30 18:31 ` Eric W. Biederman
  2010-04-29 20:29   ` patch sysfs-add-support-for-tagged-directories-with-untagged-members.patch added to gregkh-2.6 tree gregkh
  2010-03-30 18:31 ` [PATCH 5/6] sysfs: Implement sysfs_delete_link Eric W. Biederman
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 18:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Eric W. Biederman, Eric W. Biederman

From: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com>

I had hoped to avoid this, but the bonding driver adds a file
to /sys/class/net/ and the easiest way to handle that file is
to make it untagged and to register it only once.

So relax the rules on tagged directories, and make bonding work.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
 fs/sysfs/dir.c   |   12 +++---------
 fs/sysfs/inode.c |    2 ++
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index b0e7911..d3dd097 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -383,12 +383,6 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
 	if (sysfs_find_dirent(acxt->parent_sd, sd->s_ns, sd->s_name))
 		return -EEXIST;
 
-	if (sysfs_ns_type(acxt->parent_sd) && !sd->s_ns) {
-		WARN(1, KERN_WARNING "sysfs: ns required in '%s' for '%s'\n",
-			acxt->parent_sd->s_name, sd->s_name);
-		return -EINVAL;
-	}
-
 	sd->s_parent = sysfs_get(acxt->parent_sd);
 
 	sysfs_link_sibling(sd);
@@ -545,7 +539,7 @@ struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
 	struct sysfs_dirent *sd;
 
 	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling) {
-		if (sd->s_ns != ns)
+		if (ns && sd->s_ns && (sd->s_ns != ns))
 			continue;
 		if (!strcmp(sd->s_name, name))
 			return sd;
@@ -879,7 +873,7 @@ static struct sysfs_dirent *sysfs_dir_pos(const void *ns,
 		while (pos && (ino > pos->s_ino))
 			pos = pos->s_sibling;
 	}
-	while (pos && pos->s_ns != ns)
+	while (pos && pos->s_ns && pos->s_ns != ns)
 		pos = pos->s_sibling;
 	return pos;
 }
@@ -890,7 +884,7 @@ static struct sysfs_dirent *sysfs_dir_next_pos(const void *ns,
 	pos = sysfs_dir_pos(ns, parent_sd, ino, pos);
 	if (pos)
 		pos = pos->s_sibling;
-	while (pos && pos->s_ns != ns)
+	while (pos && pos->s_ns && pos->s_ns != ns)
 		pos = pos->s_sibling;
 	return pos;
 }
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 90a899e..6c962ca 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -334,6 +334,8 @@ int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const void *ns, const cha
 	sysfs_addrm_start(&acxt, dir_sd);
 
 	sd = sysfs_find_dirent(dir_sd, ns, name);
+	if (sd && (sd->s_ns != ns))
+		sd = NULL;
 	if (sd)
 		sysfs_remove_one(&acxt, sd);
 
-- 
1.6.5.2.143.g8cc62
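
The relaxed matching rule in the patch above can be modeled in user space
roughly as follows.  This is only a sketch: struct model_dirent, model_find()
and model_visible() are hypothetical names, and the only parts taken from the
patch are the two comparisons themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct sysfs_dirent. */
struct model_dirent {
	const char *name;
	const void *ns;			/* NULL means untagged */
	struct model_dirent *sibling;
};

/*
 * Lookup as in the patched sysfs_find_dirent(): an entry is skipped only
 * when both the caller's tag and the entry's tag are set and they differ.
 */
static struct model_dirent *model_find(struct model_dirent *children,
				       const void *ns, const char *name)
{
	struct model_dirent *sd;

	assert(name != NULL);
	for (sd = children; sd; sd = sd->sibling) {
		if (ns && sd->ns && sd->ns != ns)
			continue;
		if (!strcmp(sd->name, name))
			return sd;
	}
	return NULL;
}

/*
 * Visibility as in the patched sysfs_dir_pos(): untagged entries are listed
 * for every namespace, tagged entries only for their own.
 */
static int model_visible(const struct model_dirent *sd, const void *ns)
{
	return !(sd->ns && sd->ns != ns);
}
```

Note that a lookup with a NULL tag returns the first entry with a matching
name regardless of its tag, which is why the fs/sysfs/inode.c hunk adds an
exact-match check before removing anything.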


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 5/6] sysfs: Implement sysfs_delete_link
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (3 preceding siblings ...)
  2010-03-30 18:31 ` [PATCH 4/6] sysfs: Add support for tagged directories with untagged members Eric W. Biederman
@ 2010-03-30 18:31 ` Eric W. Biederman
  2010-04-29 20:29   ` patch sysfs-implement-sysfs_delete_link.patch added to gregkh-2.6 tree gregkh
  2010-03-30 18:31 ` [PATCH 6/6] driver core: Implement ns directory support for device classes Eric W. Biederman
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 18:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Eric W. Biederman, Benjamin Thery, Daniel Lezcano

From: Eric W. Biederman <ebiederm@xmission.com>

When removing a symlink sysfs_remove_link does not provide
enough information to figure out which tagged directory the symlink
falls in.  So I need sysfs_delete_link which is passed the target
of the symlink to delete.

sysfs_rename_link is updated to call sysfs_delete_link instead
of sysfs_remove_link as we have all of the information necessary
and the callers are interesting.

Both of these functions now have enough information to find a symlink
in a tagged directory.  The only restriction is that they must be called
before the target kobject is renamed or deleted.  If they are called
later I lose track of which tag the target kobject was marked with
and can no longer find the old symlink to remove it.

This patch was split from an earlier patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
---
 fs/sysfs/symlink.c    |   20 ++++++++++++++++++++
 include/linux/sysfs.h |    8 ++++++++
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/sysfs/symlink.c b/fs/sysfs/symlink.c
index 56ccdc6..cd11bf4 100644
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -108,6 +108,26 @@ int sysfs_create_link_nowarn(struct kobject *kobj, struct kobject *target,
 }
 
 /**
+ *	sysfs_delete_link - remove symlink in object's directory.
+ *	@kobj:	object we're acting for.
+ *	@targ:	object we're pointing to.
+ *	@name:	name of the symlink to remove.
+ *
+ *	Unlike sysfs_remove_link sysfs_delete_link has enough information
+ *	to successfully delete symlinks in tagged directories.
+ */
+void sysfs_delete_link(struct kobject *kobj, struct kobject *targ,
+			const char *name)
+{
+	const void *ns = NULL;
+	spin_lock(&sysfs_assoc_lock);
+	if (targ->sd)
+		ns = targ->sd->s_ns;
+	spin_unlock(&sysfs_assoc_lock);
+	sysfs_hash_and_remove(kobj->sd, ns, name);
+}
+
+/**
  *	sysfs_remove_link - remove symlink in object's directory.
  *	@kobj:	object we're acting for.
  *	@name:	name of the symlink to remove.
diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h
index 1885d21..976c466 100644
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -155,6 +155,9 @@ void sysfs_remove_link(struct kobject *kobj, const char *name);
 int sysfs_rename_link(struct kobject *kobj, struct kobject *target,
 			const char *old_name, const char *new_name);
 
+void sysfs_delete_link(struct kobject *dir, struct kobject *targ,
+			const char *name);
+
 int __must_check sysfs_create_group(struct kobject *kobj,
 				    const struct attribute_group *grp);
 int sysfs_update_group(struct kobject *kobj,
@@ -269,6 +272,11 @@ static inline int sysfs_rename_link(struct kobject *k, struct kobject *t,
 	return 0;
 }
 
+static inline void sysfs_delete_link(struct kobject *k, struct kobject *t,
+				     const char *name)
+{
+}
+
 static inline int sysfs_create_group(struct kobject *kobj,
 				     const struct attribute_group *grp)
 {
-- 
1.6.5.2.143.g8cc62
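
To illustrate why sysfs_remove_link() is not enough in a tagged directory,
here is a user-space sketch of the removal path as it stands after patches 4
and 5.  All model_* names are hypothetical, and the model passes the target's
tag directly instead of reading it from targ->sd->s_ns under
sysfs_assoc_lock as the real function does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct sysfs_dirent. */
struct model_dirent {
	const char *name;
	const void *ns;			/* NULL means untagged */
	int removed;
	struct model_dirent *sibling;
};

/* Relaxed lookup from patch 4, ignoring entries already removed. */
static struct model_dirent *model_find(struct model_dirent *children,
				       const void *ns, const char *name)
{
	struct model_dirent *sd;

	assert(name != NULL);
	for (sd = children; sd; sd = sd->sibling) {
		if (sd->removed)
			continue;
		if (ns && sd->ns && sd->ns != ns)
			continue;
		if (!strcmp(sd->name, name))
			return sd;
	}
	return NULL;
}

/* Removal requires an exact tag match, as in the patched inode.c. */
static int model_hash_and_remove(struct model_dirent *children,
				 const void *ns, const char *name)
{
	struct model_dirent *sd = model_find(children, ns, name);

	if (sd && sd->ns != ns)
		sd = NULL;
	if (!sd)
		return -ENOENT;
	sd->removed = 1;
	return 0;
}

/* sysfs_remove_link() has no target, so it can only pass a NULL tag. */
static int model_remove_link(struct model_dirent *dir, const char *name)
{
	return model_hash_and_remove(dir, NULL, name);
}

/* sysfs_delete_link() knows the target and therefore its tag. */
static int model_delete_link(struct model_dirent *dir, const void *target_ns,
			     const char *name)
{
	return model_hash_and_remove(dir, target_ns, name);
}
```

With two "eth0" symlinks tagged for different namespaces in the same
directory, the model's remove path finds nothing to remove, while the delete
path removes exactly the link whose tag matches the target.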


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH 6/6] driver core: Implement ns directory support for device classes.
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (4 preceding siblings ...)
  2010-03-30 18:31 ` [PATCH 5/6] sysfs: Implement sysfs_delete_link Eric W. Biederman
@ 2010-03-30 18:31 ` Eric W. Biederman
  2010-04-29 20:29   ` patch driver-core-implement-ns-directory-support-for-device-classes.patch added to gregkh-2.6 tree gregkh
  2010-03-30 18:53 ` [PATCH 0/6] tagged sysfs support Kay Sievers
                   ` (9 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 18:31 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Eric W. Biederman, Benjamin Thery

From: Eric W. Biederman <ebiederm@xmission.com>

device_del and device_rename were modified to use
sysfs_delete_link and sysfs_rename_link respectively, to ensure
that these operations work properly on devices whose classes
are in namespace directories.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
---
 drivers/base/core.c |   21 ++++++++++++---------
 1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 6b32f6e..73c352d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -787,7 +787,7 @@ out_device:
 out_busid:
 	if (dev->kobj.parent != &dev->class->p->class_subsys.kobj &&
 	    device_is_not_partition(dev))
-		sysfs_remove_link(&dev->class->p->class_subsys.kobj,
+		sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj,
 				  dev_name(dev));
 #else
 	/* link in the class directory pointing to the device */
@@ -805,7 +805,7 @@ out_busid:
 	return 0;
 
 out_busid:
-	sysfs_remove_link(&dev->class->p->class_subsys.kobj, dev_name(dev));
+	sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj, dev_name(dev));
 #endif
 
 out_subsys:
@@ -833,13 +833,13 @@ static void device_remove_class_symlinks(struct device *dev)
 
 	if (dev->kobj.parent != &dev->class->p->class_subsys.kobj &&
 	    device_is_not_partition(dev))
-		sysfs_remove_link(&dev->class->p->class_subsys.kobj,
+		sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj,
 				  dev_name(dev));
 #else
 	if (dev->parent && device_is_not_partition(dev))
 		sysfs_remove_link(&dev->kobj, "device");
 
-	sysfs_remove_link(&dev->class->p->class_subsys.kobj, dev_name(dev));
+	sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj, dev_name(dev));
 #endif
 
 	sysfs_remove_link(&dev->kobj, "subsystem");
@@ -1619,6 +1619,14 @@ int device_rename(struct device *dev, char *new_name)
 		goto out;
 	}
 
+#ifndef CONFIG_SYSFS_DEPRECATED
+	if (dev->class) {
+		error = sysfs_rename_link(&dev->class->p->class_subsys.kobj,
+			&dev->kobj, old_device_name, new_name);
+		if (error)
+			goto out;
+	}
+#endif
 	error = kobject_rename(&dev->kobj, new_name);
 	if (error)
 		goto out;
@@ -1633,11 +1641,6 @@ int device_rename(struct device *dev, char *new_name)
 						  new_class_name);
 		}
 	}
-#else
-	if (dev->class) {
-		error = sysfs_rename_link(&dev->class->p->class_subsys.kobj,
-					  &dev->kobj, old_device_name, new_name);
-	}
 #endif
 
 out:
-- 
1.6.5.2.143.g8cc62


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/6] tagged sysfs support
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (5 preceding siblings ...)
  2010-03-30 18:31 ` [PATCH 6/6] driver core: Implement ns directory support for device classes Eric W. Biederman
@ 2010-03-30 18:53 ` Kay Sievers
  2010-03-30 23:04   ` Eric W. Biederman
  2010-03-31 17:21 ` Serge E. Hallyn
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Kay Sievers @ 2010-03-30 18:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Greg KH, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	Serge Hallyn, netdev

On Tue, Mar 30, 2010 at 20:30, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> The main short coming of using multiple network namespaces today
> is that only network devices for the primary network namespaces
> can be put in the kobject layer and sysfs.
>
> This is essentially the earlier version of this patchset that was
> reviewed before, just now on top of a version of sysfs that doesn't
> need cleanup patches to support it.

Just to check if we are not in conflict with planned changes, and how
to possibly handle them:

There is the plan and ongoing work to unify classes and buses, export
them at /sys/subsystem in the same layout of the current /sys/bus/.
The decision to export buses and classes as two different things
(which they aren't) is the last major piece in the sysfs layout which
needs to be fixed.

It would mean that /sys/subsystem/net/devices/* would look like
/sys/class/net/* today. But at the /sys/subsystem/net/ directory could
be global network-subsystem-wide control files which would need to be
namespaced too. (The network subsystem does not use subsystem-global
files today, but a bunch of other classes do.)

This could be modeled into the current way of doing sysfs namespaces?
A /sys/bus/<subsystem>/ directory hierarchy would need to be
namespaced, not just a single plain directory with symlinks. Would
that work?

Thanks,
Kay

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
@ 2010-03-30 19:23   ` Eric Dumazet
  2010-03-30 23:50     ` [PATCH 7/6] sysfs: Remove double free sysfs_get_sb Eric W. Biederman
  2010-03-31  5:01   ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Serge E. Hallyn
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 83+ messages in thread
From: Eric Dumazet @ 2010-03-30 19:23 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Benjamin LaHaise, Serge Hallyn,
	netdev

On Tuesday, 30 March 2010 at 11:31 -0700, Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Add all of the necessary boilerplate to support
> multiple superblocks in sysfs.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  fs/sysfs/mount.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/sysfs/sysfs.h |    3 ++
>  2 files changed, 59 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 0cb1088..6a433ac 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -71,16 +71,70 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
>  	return 0;
>  }
>  
> +static int sysfs_test_super(struct super_block *sb, void *data)
> +{
> +	struct sysfs_super_info *sb_info = sysfs_info(sb);
> +	struct sysfs_super_info *info = data;
> +	int found = 1;
> +	return found;
> +}
> +
> +static int sysfs_set_super(struct super_block *sb, void *data)
> +{
> +	int error;
> +	error = set_anon_super(sb, data);
> +	if (!error)
> +		sb->s_fs_info = data;
> +	return error;
> +}
> +
>  static int sysfs_get_sb(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
>  {
> -	return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
> +	struct sysfs_super_info *info;
> +	struct super_block *sb;
> +	int error;
> +
> +	error = -ENOMEM;
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		goto out;
> +	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
> +	if (IS_ERR(sb) || sb->s_fs_info != info)
> +		kfree(info);
> +	if (IS_ERR(sb)) {
> +		kfree(info);

Double kfree(info)?  The check just above already frees info when
IS_ERR(sb) is true.

> +		error = PTR_ERR(sb);
> +		goto out;
> +	}
> +



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/6] tagged sysfs support
  2010-03-30 18:53 ` [PATCH 0/6] tagged sysfs support Kay Sievers
@ 2010-03-30 23:04   ` Eric W. Biederman
  2010-03-31  5:51     ` Kay Sievers
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 23:04 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg Kroah-Hartman, Greg KH, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	Serge Hallyn, netdev

Kay Sievers <kay.sievers@vrfy.org> writes:

> On Tue, Mar 30, 2010 at 20:30, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>
>> The main short coming of using multiple network namespaces today
>> is that only network devices for the primary network namespaces
>> can be put in the kobject layer and sysfs.
>>
>> This is essentially the earlier version of this patchset that was
>> reviewed before, just now on top of a version of sysfs that doesn't
>> need cleanup patches to support it.
>
> Just to check if we are not in conflict with planned changes, and how
> to possibly handle them:
>
> There is the plan and ongoing work to unify classes and buses, export
> them at /sys/subsystem in the same layout of the current /sys/bus/.
> The decision to export buses and classes as two different things
> (which they aren't) is the last major piece in the sysfs layout which
> needs to be fixed.

Interesting.  We will need symlinks, i.e.:
/sys/class -> /sys/subsystem
/sys/bus -> /sys/subsystem
to keep from breaking userspace.

> It would mean that /sys/subsystem/net/devices/* would look like
> /sys/class/net/* today. But at the /sys/subsystem/net/ directory could
> be global network-subsystem-wide control files which would need to be
> namespaced too. (The network subsystem does not use subsystem-global
> files today, but a bunch of other classes do.)
>
> This could be modeled into the current way of doing sysfs namespaces?
> A /sys/bus/<subsystem>/ directory hierarchy would need to be
> namespaced, not just a single plain directory with symlinks. Would
> that work?

I'm not entirely clear on what you are doing but it all sounds like it
will fit within what I am doing.  Right now I have /sys/class/net,
/sys/devices/virtual/net and a bunch of other net directories becoming
tagged and only showing up in the appropriately mounted sysfs.  We
track them all in the class kset and as long as we extend that capability
when the subsystem change happens in sysfs all should be well.

Today we have /sys/class/net/bonding_masters.  For now I have that as
an untagged file, but the implementation is aware of which network
namespace your current process is in.  Thinking about that a little more,
it would be better to make that file tagged so that userspace can see
different versions for the different network namespaces.  Joy.

I expect other control files will be the same.

In general it doesn't make sense to add control files for networking,
as they easily conflict with legal network device names and thus create
the possibility of breaking someone's userspace.

Eric

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH 7/6] sysfs: Remove double free sysfs_get_sb
  2010-03-30 19:23   ` Eric Dumazet
@ 2010-03-30 23:50     ` Eric W. Biederman
  0 siblings, 0 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-30 23:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Benjamin LaHaise, Serge Hallyn,
	netdev


Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
 fs/sysfs/mount.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index 13bd0ba..bbba090 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -121,7 +121,6 @@ static int sysfs_get_sb(struct file_system_type *fs_type,
 	if (IS_ERR(sb) || sb->s_fs_info != info)
 		kfree(info);
 	if (IS_ERR(sb)) {
-		kfree(info);
 		error = PTR_ERR(sb);
 		goto out;
 	}
-- 
1.6.5.2.143.g8cc62
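
The ownership rule this fix restores can be sketched in user space as
follows.  It is only a model: model_sget() is a hypothetical stand-in for
sget() that reports failure with NULL rather than an ERR_PTR value, and
model_kfree() merely counts frees so the single-free property can be checked.

```c
#include <assert.h>
#include <stdlib.h>

struct model_info { int ns_id; };
struct model_sb { struct model_info *fs_info; };

/* Counts frees so a caller can verify info is released exactly once. */
static int model_free_count;

static void model_kfree(struct model_info *info)
{
	free(info);
	model_free_count++;
}

/*
 * Hypothetical stand-in for sget(): reuse a matching superblock, otherwise
 * create one that takes ownership of info.  Returns NULL on failure.
 */
static struct model_sb *model_sget(struct model_sb *existing,
				   struct model_info *info, int fail)
{
	struct model_sb *sb;

	assert(info != NULL);
	if (fail)
		return NULL;
	if (existing && existing->fs_info->ns_id == info->ns_id)
		return existing;
	sb = malloc(sizeof(*sb));
	if (!sb)
		return NULL;
	sb->fs_info = info;
	return sb;
}

/* Mount path with one cleanup point, mirroring the code after this fix. */
static struct model_sb *model_get_sb(struct model_sb *existing, int ns_id,
				     int fail)
{
	struct model_info *info = calloc(1, sizeof(*info));
	struct model_sb *sb;

	if (!info)
		return NULL;
	info->ns_id = ns_id;
	sb = model_sget(existing, info, fail);
	/* Free info if the lookup failed or an existing sb was reused. */
	if (!sb || sb->fs_info != info)
		model_kfree(info);
	return sb;
}
```

The single combined condition covers both the error case and the
reused-superblock case, so no second free is needed on the error path.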


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-30 18:31 ` [PATCH 3/6] sysfs: Implement sysfs tagged directory support Eric W. Biederman
@ 2010-03-31  2:43   ` Serge E. Hallyn
  2010-03-31  3:38     ` Eric W. Biederman
  2010-03-31  6:49   ` Tejun Heo
  2010-04-29 20:29   ` patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree gregkh
  2 siblings, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-03-31  2:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev, Benjamin Thery

Quoting Eric W. Biederman (ebiederm@xmission.com):
>  int sysfs_rename(struct sysfs_dirent *sd,
> -	struct sysfs_dirent *new_parent_sd, const char *new_name)
> +	struct sysfs_dirent *new_parent_sd, const void *new_ns,
> +	const char *new_name)
>  {
>  	const char *dup_name = NULL;
>  	int error;
> @@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd,
>  	mutex_lock(&sysfs_mutex);
> 
>  	error = 0;
> -	if ((sd->s_parent == new_parent_sd) &&
> +	if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
>  	    (strcmp(sd->s_name, new_name) == 0))
>  		goto out;	/* nothing to rename */
> 
>  	error = -EEXIST;
> -	if (sysfs_find_dirent(new_parent_sd, new_name))
> +	if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
>  		goto out;
> 
>  	/* rename sysfs_dirent */
> @@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd,
>  		sd->s_parent = new_parent_sd;
>  		sysfs_link_sibling(sd);
>  	}
> +	sd->s_ns = new_ns;
> 
>  	error = 0;
>   out:

...

> +void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
> +{
> +	struct super_block *sb;
> +
> +	mutex_lock(&sysfs_mutex);
> +	spin_lock(&sb_lock);
> +	list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
> +		struct sysfs_super_info *info = sysfs_info(sb);
> +		/* Ignore superblocks that are in the process of unmounting */
> +		if (sb->s_count <= S_BIAS)
> +			continue;
> +		/* Ignore superblocks with the wrong ns */
> +		if (info->ns[type] != ns)
> +			continue;
> +		info->ns[type] = NULL;
> +	}
> +	spin_unlock(&sb_lock);
> +	mutex_unlock(&sysfs_mutex);
> +}
> +

..

> @@ -136,6 +138,7 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>  			const char *old, const char *new)
>  {
>  	struct sysfs_dirent *parent_sd, *sd = NULL;
> +	const void *old_ns = NULL, *new_ns = NULL;
>  	int result;
> 
>  	if (!kobj)
> @@ -143,8 +146,11 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>  	else
>  		parent_sd = kobj->sd;
> 
> +	if (targ->sd)
> +		old_ns = targ->sd->s_ns;
> +
>  	result = -ENOENT;
> -	sd = sysfs_get_dirent(parent_sd, old);
> +	sd = sysfs_get_dirent(parent_sd, old_ns, old);
>  	if (!sd)
>  		goto out;
> 
> @@ -154,7 +160,10 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>  	if (sd->s_symlink.target_sd->s_dir.kobj != targ)
>  		goto out;
> 
> -	result = sysfs_rename(sd, parent_sd, new);
> +	if (sysfs_ns_type(parent_sd))
> +		new_ns = targ->ktype->namespace(targ);
> +
> +	result = sysfs_rename(sd, parent_sd, new_ns, new);
> 
>  out:
>  	sysfs_put(sd);

This is a huge patch, and for the most part I haven't found any problems,
except potentially this one.  It looks like sysfs_rename_link() checks
old_ns and new_ns before calling sysfs_rename().  But sysfs_mutex isn't
taken until sysfs_rename().  sysfs_rename() will then proceed to do
the rename, and unconditionally set sd->s_ns = new_ns.

In the meantime, it seems as though new_ns might have exited, and
sysfs_exit_ns() unset new_ns on the new parent dir.  This means that
we'll end up with the namespace code having thought that it cleared
all new_ns's, but this file will have snuck by.  Meaning an action on
the renamed file might dereference a freed namespace.

Or am I way off base?

-serge


* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  2:43   ` Serge E. Hallyn
@ 2010-03-31  3:38     ` Eric W. Biederman
  2010-03-31  4:02       ` Serge E. Hallyn
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31  3:38 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev, Benjamin Thery

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>>  int sysfs_rename(struct sysfs_dirent *sd,
>> -	struct sysfs_dirent *new_parent_sd, const char *new_name)
>> +	struct sysfs_dirent *new_parent_sd, const void *new_ns,
>> +	const char *new_name)
>>  {
>>  	const char *dup_name = NULL;
>>  	int error;
>> @@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd,
>>  	mutex_lock(&sysfs_mutex);
>> 
>>  	error = 0;
>> -	if ((sd->s_parent == new_parent_sd) &&
>> +	if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
>>  	    (strcmp(sd->s_name, new_name) == 0))
>>  		goto out;	/* nothing to rename */
>> 
>>  	error = -EEXIST;
>> -	if (sysfs_find_dirent(new_parent_sd, new_name))
>> +	if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
>>  		goto out;
>> 
>>  	/* rename sysfs_dirent */
>> @@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd,
>>  		sd->s_parent = new_parent_sd;
>>  		sysfs_link_sibling(sd);
>>  	}
>> +	sd->s_ns = new_ns;
>> 
>>  	error = 0;
>>   out:
>
> ...
>
>> +void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
>> +{
>> +	struct super_block *sb;
>> +
>> +	mutex_lock(&sysfs_mutex);
>> +	spin_lock(&sb_lock);
>> +	list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
>> +		struct sysfs_super_info *info = sysfs_info(sb);
>> +		/* Ignore superblocks that are in the process of unmounting */
>> +		if (sb->s_count <= S_BIAS)
>> +			continue;
>> +		/* Ignore superblocks with the wrong ns */
>> +		if (info->ns[type] != ns)
>> +			continue;
>> +		info->ns[type] = NULL;
>> +	}
>> +	spin_unlock(&sb_lock);
>> +	mutex_unlock(&sysfs_mutex);
>> +}
>> +
>
> ..
>
>> @@ -136,6 +138,7 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>>  			const char *old, const char *new)
>>  {
>>  	struct sysfs_dirent *parent_sd, *sd = NULL;
>> +	const void *old_ns = NULL, *new_ns = NULL;
>>  	int result;
>> 
>>  	if (!kobj)
>> @@ -143,8 +146,11 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>>  	else
>>  		parent_sd = kobj->sd;
>> 
>> +	if (targ->sd)
>> +		old_ns = targ->sd->s_ns;
>> +
>>  	result = -ENOENT;
>> -	sd = sysfs_get_dirent(parent_sd, old);
>> +	sd = sysfs_get_dirent(parent_sd, old_ns, old);
>>  	if (!sd)
>>  		goto out;
>> 
>> @@ -154,7 +160,10 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
>>  	if (sd->s_symlink.target_sd->s_dir.kobj != targ)
>>  		goto out;
>> 
>> -	result = sysfs_rename(sd, parent_sd, new);
>> +	if (sysfs_ns_type(parent_sd))
>> +		new_ns = targ->ktype->namespace(targ);
>> +
>> +	result = sysfs_rename(sd, parent_sd, new_ns, new);
>> 
>>  out:
>>  	sysfs_put(sd);
>
> This is a huge patch, and for the most part I haven't found any problems,
> except potentially this one.  It looks like sysfs_rename_link() checks
> old_ns and new_ns before calling sysfs_rename().  But sysfs_mutex isn't
> taken until sysfs_rename().  sysfs_rename() will then proceed to do
> the rename, and unconditionally set sd->s_ns = new_ns.
>
> In the meantime, it seems as though new_ns might have exited, and
> sysfs_exit_ns() unset new_ns on the new parent dir.  This means that
> we'll end up with the namespace code having thought that it cleared
> all new_ns's, but this file will have snuck by.  Meaning an action on
> the renamed file might dereference a freed namespace.
>
> Or am I way off base?

There are a couple of reasons why this is not a concern.

The only new_ns we clear is on the super block.

sysfs itself never dereferences namespace arguments and only uses them
for comparison purposes.  They are just cookies that cause comparisons
to differ from a sysfs perspective.

The upper levels are responsible for taking care of themselves;
sysfs_mutex does not protect them.  If you compile out sysfs, the sysfs
mutex is not even present.

In the worst case if the upper levels mess up we will have a stale
token that we never dereference on a sysfs dirent, which in a pathological
case will happen to be the same as a new namespace and we will have
a spurious directory entry that we have leaked.

In practice we move all network devices (and thus sysfs files) out of
a network namespace before allowing it to exit.  The network namespace
is not listed so it is invisible to anyone wanting to poke a network
device into an exiting network namespace.  The unlisting of the
network namespace and the device_rename both happen under the
rtnl_lock which guarantees they are serialized.

Eric
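
The "cookie" point above can be illustrated with a minimal user-space
sketch.  The names demo_dirent and demo_find_dirent are hypothetical
stand-ins, not the kernel's sysfs_dirent or sysfs_find_dirent, and the
real lookup may treat NULL tags differently.  The only thing the sketch
is meant to show is that the namespace tag is compared by pointer value
and never dereferenced, so a stale tag can at worst cause a wrong match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct sysfs_dirent. */
struct demo_dirent {
	const char *name;
	const void *ns;		/* opaque tag: compared, never dereferenced */
	struct demo_dirent *next;
};

/* Return the entry whose tag and name both match, or NULL if none does. */
struct demo_dirent *demo_find_dirent(struct demo_dirent *head,
				     const void *ns, const char *name)
{
	struct demo_dirent *sd;

	for (sd = head; sd; sd = sd->next) {
		if (sd->ns != ns)	/* pointer comparison only */
			continue;
		if (strcmp(sd->name, name) == 0)
			return sd;
	}
	return NULL;
}
```

Two entries with the same name but different tags can coexist in one
list, and a lookup only ever sees the entry carrying the caller's tag.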


* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  3:38     ` Eric W. Biederman
@ 2010-03-31  4:02       ` Serge E. Hallyn
  2010-03-31  4:23         ` Eric W. Biederman
  0 siblings, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-03-31  4:02 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev, Benjamin Thery

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >>  int sysfs_rename(struct sysfs_dirent *sd,
> >> -	struct sysfs_dirent *new_parent_sd, const char *new_name)
> >> +	struct sysfs_dirent *new_parent_sd, const void *new_ns,
> >> +	const char *new_name)
> >>  {
> >>  	const char *dup_name = NULL;
> >>  	int error;
> >> @@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd,
> >>  	mutex_lock(&sysfs_mutex);
> >> 
> >>  	error = 0;
> >> -	if ((sd->s_parent == new_parent_sd) &&
> >> +	if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
> >>  	    (strcmp(sd->s_name, new_name) == 0))
> >>  		goto out;	/* nothing to rename */
> >> 
> >>  	error = -EEXIST;
> >> -	if (sysfs_find_dirent(new_parent_sd, new_name))
> >> +	if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
> >>  		goto out;
> >> 
> >>  	/* rename sysfs_dirent */
> >> @@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd,
> >>  		sd->s_parent = new_parent_sd;
> >>  		sysfs_link_sibling(sd);
> >>  	}
> >> +	sd->s_ns = new_ns;
> >> 
> >>  	error = 0;
> >>   out:
> >
> > ...
> >
> >> +void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
> >> +{
> >> +	struct super_block *sb;
> >> +
> >> +	mutex_lock(&sysfs_mutex);
> >> +	spin_lock(&sb_lock);
> >> +	list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
> >> +		struct sysfs_super_info *info = sysfs_info(sb);
> >> +		/* Ignore superblocks that are in the process of unmounting */
> >> +		if (sb->s_count <= S_BIAS)
> >> +			continue;
> >> +		/* Ignore superblocks with the wrong ns */
> >> +		if (info->ns[type] != ns)
> >> +			continue;
> >> +		info->ns[type] = NULL;
> >> +	}
> >> +	spin_unlock(&sb_lock);
> >> +	mutex_unlock(&sysfs_mutex);
> >> +}
> >> +
> >
> > ..
> >
> >> @@ -136,6 +138,7 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >>  			const char *old, const char *new)
> >>  {
> >>  	struct sysfs_dirent *parent_sd, *sd = NULL;
> >> +	const void *old_ns = NULL, *new_ns = NULL;
> >>  	int result;
> >> 
> >>  	if (!kobj)
> >> @@ -143,8 +146,11 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >>  	else
> >>  		parent_sd = kobj->sd;
> >> 
> >> +	if (targ->sd)
> >> +		old_ns = targ->sd->s_ns;
> >> +
> >>  	result = -ENOENT;
> >> -	sd = sysfs_get_dirent(parent_sd, old);
> >> +	sd = sysfs_get_dirent(parent_sd, old_ns, old);
> >>  	if (!sd)
> >>  		goto out;
> >> 
> >> @@ -154,7 +160,10 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >>  	if (sd->s_symlink.target_sd->s_dir.kobj != targ)
> >>  		goto out;
> >> 
> >> -	result = sysfs_rename(sd, parent_sd, new);
> >> +	if (sysfs_ns_type(parent_sd))
> >> +		new_ns = targ->ktype->namespace(targ);
> >> +
> >> +	result = sysfs_rename(sd, parent_sd, new_ns, new);
> >> 
> >>  out:
> >>  	sysfs_put(sd);
> >
> > This is a huge patch, and for the most part I haven't found any problems,
> > except potentially this one.  It looks like sysfs_rename_link() checks
> > old_ns and new_ns before calling sysfs_rename().  But sysfs_mutex isn't
> > taken until sysfs_rename().  sysfs_rename() will then proceed to do
> > the rename, and unconditionally set sd->s_ns = new_ns.
> >
> > In the meantime, it seems as though new_ns might have exited, and
> > sysfs_exit_ns() unset new_ns on the new parent dir.  This means that
> > we'll end up with the namespace code having thought that it cleared
> > all new_ns's, but this file will have snuck by.  Meaning an action on
> > the renamed file might dereference a freed namespace.
> >
> > Or am I way off base?
> 
> There are a couple of reasons why this is not a concern.
> 
> The only new_ns we clear is on the super block.

Oops, yeah - I failed to note that.

> sysfs itself never dereferences namespace arguments and only uses them
> for comparison purposes.  They are just cookies that cause comparisons
> to differ from a sysfs perspective.
> 
> The upper levels are responsible for taking care of themselves;
> sysfs_mutex does not protect them.  If you compile out sysfs, the sysfs
> mutex is not even present.
> 
> In the worst case if the upper levels mess up we will have a stale
> token that we never dereference on a sysfs dirent, which in a pathological
> case will happen to be the same as a new namespace and we will have
> a spurious directory entry that we have leaked.
> 
> In practice we move all network devices (and thus sysfs files) out of
> a network namespace before allowing it to exit.

Ok, that makes sense too - so any tagged sysfs file created for some object
in a ns must be deleted at netns exit.  I could imagine someone expecting
that if the ns exits, the tasks in the ns will exit, causing the sysfs
mount to be umounted and auto-deleting the files?  (which of course would
get buggered if a task in another ns was examining the mount which it got
through mounts propagation)  We'll have to make sure no one does that.  Should
it be documented somewhere, or is that obvious enough?

(I'm thinking of other namespaces in the future, not net_ns which I
understand doesn't do that)

> The network namespace
> is not listed so it is invisible to anyone wanting to poke a network
> device into an exiting network namespace.  The unlisting of the
> network namespace and the device_rename both happen under the
> rtnl_lock which guarantees they are serialized.
> 
> Eric


* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  4:02       ` Serge E. Hallyn
@ 2010-03-31  4:23         ` Eric W. Biederman
  2010-03-31  4:53           ` Serge E. Hallyn
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31  4:23 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev, Benjamin Thery

"Serge E. Hallyn" <serue@us.ibm.com> writes:

>> > This is a huge patch, and for the most part I haven't found any problems,
>> > except potentially this one.  It looks like sysfs_rename_link() checks
>> > old_ns and new_ns before calling sysfs_rename().  But sysfs_mutex isn't
>> > taken until sysfs_rename().  sysfs_rename() will then proceed to do
>> > the rename, and unconditionally set sd->s_ns = new_ns.
>> >
>> > In the meantime, it seems as though new_ns might have exited, and
>> > sysfs_exit_ns() unset new_ns on the new parent dir.  This means that
>> > we'll end up with the namespace code having thought that it cleared
>> > all new_ns's, but this file will have snuck by.  Meaning an action on
>> > the renamed file might dereference a freed namespace.
>> >
>> > Or am I way off base?
>> 
>> There are a couple of reasons why this is not a concern.
>> 
>> The only new_ns we clear is on the super block.
>
> Oops, yeah - I failed to note that.
>
>> sysfs itself never dereferences namespace arguments and only uses them
>> for comparison purposes.  They are just cookies that cause comparisons
>> to differ from a sysfs perspective.
>> 
>> The upper levels are responsible for taking care of themselves;
>> sysfs_mutex does not protect them.  If you compile out sysfs, the sysfs
>> mutex is not even present.
>> 
>> In the worst case if the upper levels mess up we will have a stale
>> token that we never dereference on a sysfs dirent, which in a pathological
>> case will happen to be the same as a new namespace and we will have
>> a spurious directory entry that we have leaked.
>> 
>> In practice we move all network devices (and thus sysfs files) out of
>> a network namespace before allowing it to exit.
>
> Ok, that makes sense too - so any tagged sysfs file created for some object
> in a ns must be deleted at netns exit.  I could imagine someone expecting
> that if the ns exits, the tasks in the ns will exit, causing the sysfs
> mount to be umounted and auto-deleting the files?  (which of course would
> get buggered if a task in another ns was examining the mount which it got
> through mounts propagation)  We'll have to make sure no one does that.  Should
> it be documented somewhere, or is that obvious enough?

In general it is simply true.  An object in a namespace either keeps
the namespace alive, or it is destroyed when the namespace exits
because the object is unreachable.

So the only possible problem I can think of is of ordering the object
destruction and calling sysfs_exit_ns.    So for the moment I am going
to vote that this is simply obvious enough not to worry about in detail.

It is also pretty obvious if you trace the code and ask how does sysfs
dirent X get destroyed.

Today there is just a wee bit of automatic file destruction at the sysfs
level.  The device layer does not take advantage of it, and in hierarchical
situations it leads to bugs.  So I think that if we document anything, it
should be that sysfs cannot safely delete anything automatically for
you.

Eric


* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  4:23         ` Eric W. Biederman
@ 2010-03-31  4:53           ` Serge E. Hallyn
  0 siblings, 0 replies; 83+ messages in thread
From: Serge E. Hallyn @ 2010-03-31  4:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev, Benjamin Thery

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
> 
> >> > This is a huge patch, and for the most part I haven't found any problems,
> >> > except potentially this one.  It looks like sysfs_rename_link() checks
> >> > old_ns and new_ns before calling sysfs_rename().  But sysfs_mutex isn't
> >> > taken until sysfs_rename().  sysfs_rename() will then proceed to do
> >> > the rename, and unconditionally set sd->s_ns = new_ns.
> >> >
> >> > In the meantime, it seems as though new_ns might have exited, and
> >> > sysfs_exit_ns() unset new_ns on the new parent dir.  This means that
> >> > we'll end up with the namespace code having thought that it cleared
> >> > all new_ns's, but this file will have snuck by.  Meaning an action on
> >> > the renamed file might dereference a freed namespace.
> >> >
> >> > Or am I way off base?
> >> 
> >> There are a couple of reasons why this is not a concern.
> >> 
> >> The only new_ns we clear is on the super block.
> >
> > Oops, yeah - I failed to note that.
> >
> >> sysfs itself never dereferences namespace arguments and only uses them
> >> for comparison purposes.  They are just cookies that cause comparisons
> >> to differ from a sysfs perspective.
> >> 
> >> The upper levels are responsible for taking care of themselves;
> >> sysfs_mutex does not protect them.  If you compile out sysfs, the sysfs
> >> mutex is not even present.
> >> 
> >> In the worst case if the upper levels mess up we will have a stale
> >> token that we never dereference on a sysfs dirent, which in a pathological
> >> case will happen to be the same as a new namespace and we will have
> >> a spurious directory entry that we have leaked.
> >> 
> >> In practice we move all network devices (and thus sysfs files) out of
> >> a network namespace before allowing it to exit.
> >
> > Ok, that makes sense too - so any tagged sysfs file created for some object
> > in a ns must be deleted at netns exit.  I could imagine someone expecting
> > that if the ns exits, the tasks in the ns will exit, causing the sysfs
> > mount to be umounted and auto-deleting the files?  (which of course would
> > get buggered if a task in another ns was examining the mount which it got
> > through mounts propagation)  We'll have to make sure no one does that.  Should
> > it be documented somewhere, or is that obvious enough?
> 
> In general it is simply true.  An object in a namespace either keeps
> the namespace alive, or it is destroyed when the namespace exits
> because the object is unreachable.

I guess you'd hope so :)

> So the only possible problem I can think of is of ordering the object
> destruction and calling sysfs_exit_ns.    So for the moment I am going
> to vote that this is simply obvious enough not to worry about in detail.
> 
> It is also pretty obvious if you trace the code and ask how does sysfs
> dirent X get destroyed.
> 
> Today there is just a wee bit of automatic file destruction at the sysfs
> level.  The device layer does not take advantage of it, and in hierarchical
> situations it leads to bugs.  So I think that if we document anything, it
> should be that sysfs cannot safely delete anything automatically for
> you.
> 
> Eric

Ok.  I'm convinced.

thanks,
-serge


* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
  2010-03-30 19:23   ` Eric Dumazet
@ 2010-03-31  5:01   ` Serge E. Hallyn
  2010-03-31  5:01     ` Serge E. Hallyn
  2010-03-31  5:41   ` Tejun Heo
  2010-04-29 20:29   ` patch sysfs-basic-support-for-multiple-super-blocks.patch added to gregkh-2.6 tree gregkh
  3 siblings, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-03-31  5:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev

Quoting Eric W. Biederman (ebiederm@xmission.com):
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Add all of the necessary bioler plate to support
> multiple superblocks in sysfs.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Serge Hallyn <serue@us.ibm.com>

> ---
>  fs/sysfs/mount.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  fs/sysfs/sysfs.h |    3 ++
>  2 files changed, 59 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
> index 0cb1088..6a433ac 100644
> --- a/fs/sysfs/mount.c
> +++ b/fs/sysfs/mount.c
> @@ -71,16 +71,70 @@ static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
>  	return 0;
>  }
> 
> +static int sysfs_test_super(struct super_block *sb, void *data)
> +{
> +	struct sysfs_super_info *sb_info = sysfs_info(sb);
> +	struct sysfs_super_info *info = data;
> +	int found = 1;
> +	return found;
> +}
> +
> +static int sysfs_set_super(struct super_block *sb, void *data)
> +{
> +	int error;
> +	error = set_anon_super(sb, data);
> +	if (!error)
> +		sb->s_fs_info = data;
> +	return error;
> +}
> +
>  static int sysfs_get_sb(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
>  {
> -	return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
> +	struct sysfs_super_info *info;
> +	struct super_block *sb;
> +	int error;
> +
> +	error = -ENOMEM;
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		goto out;
> +	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
> +	if (IS_ERR(sb) || sb->s_fs_info != info)
> +		kfree(info);
> +	if (IS_ERR(sb)) {
> +		kfree(info);
> +		error = PTR_ERR(sb);
> +		goto out;
> +	}
> +	if (!sb->s_root) {
> +		sb->s_flags = flags;
> +		error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
> +		if (error) {
> +			deactivate_locked_super(sb);
> +			goto out;
> +		}
> +		sb->s_flags |= MS_ACTIVE;
> +	}
> +
> +	simple_set_mnt(mnt, sb);
> +	error = 0;
> +out:
> +	return error;
> +}
> +
> +static void sysfs_kill_sb(struct super_block *sb)
> +{
> +	struct sysfs_super_info *info = sysfs_info(sb);
> +
> +	kill_anon_super(sb);
> +	kfree(info);
>  }
> 
>  static struct file_system_type sysfs_fs_type = {
>  	.name		= "sysfs",
>  	.get_sb		= sysfs_get_sb,
> -	.kill_sb	= kill_anon_super,
> +	.kill_sb	= sysfs_kill_sb,
>  };
> 
>  int __init sysfs_init(void)
> diff --git a/fs/sysfs/sysfs.h b/fs/sysfs/sysfs.h
> index 30f5a44..030a39d 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
>  /*
>   * mount.c
>   */
> +struct sysfs_super_info {
> +};
> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
>  extern struct sysfs_dirent sysfs_root;
>  extern struct kmem_cache *sysfs_dir_cachep;
> 
> -- 
> 1.6.5.2.143.g8cc62


* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-31  5:01   ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Serge E. Hallyn
@ 2010-03-31  5:01     ` Serge E. Hallyn
  0 siblings, 0 replies; 83+ messages in thread
From: Serge E. Hallyn @ 2010-03-31  5:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev

Quoting Serge E. Hallyn (serue@us.ibm.com):
> Quoting Eric W. Biederman (ebiederm@xmission.com):
> > From: Eric W. Biederman <ebiederm@xmission.com>
> > 
> > Add all of the necessary bioler plate to support
> > multiple superblocks in sysfs.
> > 
> > Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> 
> Acked-by: Serge Hallyn <serue@us.ibm.com>

(with the patch 7/6 of course :)


* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
  2010-03-30 19:23   ` Eric Dumazet
  2010-03-31  5:01   ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Serge E. Hallyn
@ 2010-03-31  5:41   ` Tejun Heo
  2010-03-31  5:51     ` Eric W. Biederman
  2010-04-29 20:29   ` patch sysfs-basic-support-for-multiple-super-blocks.patch added to gregkh-2.6 tree gregkh
  3 siblings, 1 reply; 83+ messages in thread
From: Tejun Heo @ 2010-03-31  5:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev

Hello, Eric.

On 03/31/2010 03:31 AM, Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Add all of the necessary bioler plate to support
                           boiler :-)

> +static int sysfs_test_super(struct super_block *sb, void *data)
> +{
> +	struct sysfs_super_info *sb_info = sysfs_info(sb);
> +	struct sysfs_super_info *info = data;
> +	int found = 1;
> +	return found;
> +}

Can you please make it return bool?

>  static int sysfs_get_sb(struct file_system_type *fs_type,
>  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
>  {
> -	return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
> +	struct sysfs_super_info *info;
> +	struct super_block *sb;
> +	int error;
> +
> +	error = -ENOMEM;
> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
> +	if (!info)
> +		goto out;
> +	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
> +	if (IS_ERR(sb) || sb->s_fs_info != info)
> +		kfree(info);
> +	if (IS_ERR(sb)) {
> +		kfree(info);
> +		error = PTR_ERR(sb);
> +		goto out;
> +	}
> +	if (!sb->s_root) {
> +		sb->s_flags = flags;
> +		error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
> +		if (error) {
> +			deactivate_locked_super(sb);
> +			goto out;
> +		}
> +		sb->s_flags |= MS_ACTIVE;
> +	}
> +
> +	simple_set_mnt(mnt, sb);
> +	error = 0;
> +out:
> +	return error;
> +}

I haven't looked at later patches but I suppose this is gonna be
filled with more meaningful stuff later.  One (possibly silly) thing
that stands out compared to get_sb_single() is missing remount
handling.  Is it intended?

> index 30f5a44..030a39d 100644
> --- a/fs/sysfs/sysfs.h
> +++ b/fs/sysfs/sysfs.h
> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
>  /*
>   * mount.c
>   */
> +struct sysfs_super_info {
> +};
> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))

Another nit picking.  It would be better to wrap SB in the macro
definition.  Also, wouldn't an inline function be better?

Thanks.

-- 
tejun


* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-31  5:41   ` Tejun Heo
@ 2010-03-31  5:51     ` Eric W. Biederman
  2010-03-31 13:47       ` Serge E. Hallyn
  2010-04-05  7:45       ` Tejun Heo
  0 siblings, 2 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31  5:51 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev

Tejun Heo <tj@kernel.org> writes:

> Hello, Eric.
>
> On 03/31/2010 03:31 AM, Eric W. Biederman wrote:
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> Add all of the necessary bioler plate to support
>                            boiler :-)
>
>> +static int sysfs_test_super(struct super_block *sb, void *data)
>> +{
>> +	struct sysfs_super_info *sb_info = sysfs_info(sb);
>> +	struct sysfs_super_info *info = data;
>> +	int found = 1;
>> +	return found;
>> +}
>
> Can you please make it return bool?

Nope.  That would mean I could not use it with sget.

>>  static int sysfs_get_sb(struct file_system_type *fs_type,
>>  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
>>  {
>> -	return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
>> +	struct sysfs_super_info *info;
>> +	struct super_block *sb;
>> +	int error;
>> +
>> +	error = -ENOMEM;
>> +	info = kzalloc(sizeof(*info), GFP_KERNEL);
>> +	if (!info)
>> +		goto out;
>> +	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
>> +	if (IS_ERR(sb) || sb->s_fs_info != info)
>> +		kfree(info);
>> +	if (IS_ERR(sb)) {
>> +		kfree(info);
>> +		error = PTR_ERR(sb);
>> +		goto out;
>> +	}
>> +	if (!sb->s_root) {
>> +		sb->s_flags = flags;
>> +		error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
>> +		if (error) {
>> +			deactivate_locked_super(sb);
>> +			goto out;
>> +		}
>> +		sb->s_flags |= MS_ACTIVE;
>> +	}
>> +
>> +	simple_set_mnt(mnt, sb);
>> +	error = 0;
>> +out:
>> +	return error;
>> +}
>
> I haven't looked at later patches but I suppose this is gonna be
> filled with more meaningful stuff later. 

Yes it will.

> One (possibly silly) thing
> that stands out compared to get_sb_single() is missing remount
> handling.  Is it intended?

There is nothing for a remount to do so I ignore it.   The only
thing that would possibly be meaningful is a read-only mount,
and nothing I know of sysfs suggests read-only mounts of sysfs
work, or make any sense.

>> index 30f5a44..030a39d 100644
>> --- a/fs/sysfs/sysfs.h
>> +++ b/fs/sysfs/sysfs.h
>> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
>>  /*
>>   * mount.c
>>   */
>> +struct sysfs_super_info {
>> +};
>> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
>
> Another nit picking.  It would be better to wrap SB in the macro
> definition.  Also, wouldn't an inline function be better?

Good spotting.  That doesn't bite today but it will certainly bite
someday if it isn't fixed.

I wonder how that has slipped through the review all of this time.

Eric
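
A minimal user-space sketch of the macro nit discussed above.  All demo_*
names are hypothetical stand-ins for the kernel types and for sysfs_info();
this is only an illustration of the two suggestions, not the code that was
eventually merged.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct sysfs_super_info and struct super_block. */
struct demo_super_info { int id; };
struct demo_super_block { void *s_fs_info; };

/*
 * Same shape as the posted macro: the argument is not parenthesized.
 * demo_info_unwrapped(sb + 1) would expand to (sb + 1->s_fs_info),
 * which does not even compile; it only works for a plain identifier.
 */
#define demo_info_unwrapped(SB) ((struct demo_super_info *)(SB->s_fs_info))

/* Wrapping the argument makes any pointer expression safe to pass. */
#define demo_info_wrapped(SB) ((struct demo_super_info *)((SB)->s_fs_info))

/* The inline-function variant also type-checks its argument. */
static inline struct demo_super_info *demo_info(struct demo_super_block *sb)
{
	return sb->s_fs_info;
}
```

The inline function has the extra benefit that passing something other
than a struct demo_super_block pointer is diagnosed by the compiler,
whereas either macro accepts anything that has an s_fs_info member.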


* Re: [PATCH 0/6] tagged sysfs support
  2010-03-30 23:04   ` Eric W. Biederman
@ 2010-03-31  5:51     ` Kay Sievers
  2010-03-31  6:25       ` Tejun Heo
                         ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Kay Sievers @ 2010-03-31  5:51 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Greg KH, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	Serge Hallyn, netdev

On Wed, Mar 31, 2010 at 01:04, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Kay Sievers <kay.sievers@vrfy.org> writes:
>> On Tue, Mar 30, 2010 at 20:30, Eric W. Biederman <ebiederm@xmission.com> wrote:
>>>
>>> The main short coming of using multiple network namespaces today
>>> is that only network devices for the primary network namespaces
>>> can be put in the kobject layer and sysfs.
>>>
>>> This is essentially the earlier version of this patchset that was
>>> reviewed before, just now on top of a version of sysfs that doesn't
>>> need cleanup patches to support it.
>>
>> Just to check if we are not in conflict with planned changes, and how
>> to possibly handle them:
>>
>> There is the plan and ongoing work to unify classes and buses, export
>> them at /sys/subsystem in the same layout of the current /sys/bus/.
>> The decision to export buses and classes as two different things
>> (which they aren't) is the last major piece in the sysfs layout which
>> needs to be fixed.
>
> Interesting.  We will need symlinks, i.e.:
> /sys/class -> /sys/subsystem
> /sys/bus -> /sys/subsystem
> to keep from breaking userspace.

Yeah, /sys/bus/, which is the only sane layout of the needlessly
different 3 versions of the same thing (bus, class, block).

/sys/bus/<subsys> can just be plain symlinks to the
/sys/subsystem/<subsys> directories.

/sys/class/<subsys> *could* be a symlink to the
/sys/subsystem/<subsys>/devices/ directory, but we really don't want
to continue to stupidly mix subsystem-wide control files with device
lists anymore. The "devices" directory needs to be a strict list of
devices, not some collection of random stuff, that it is today. :)

So we either leave all the conceptually broken class attributes behind
us, and put them at the  /sys/subsystem/<subsys>/ level only, or we
need to create the /sys/class/<subsys>/* stuff all as symlinks like we
do today. I expect, we have to create /sys/class as we do today.

Another problem to solve is that sysfs does not allow us to symlink
regular files, only directories, so we can currently not create the
class-wide attributes as symlinks to the proper file in
/sys/subsystem/.

>> It would mean that /sys/subsystem/net/devices/* would look like
>> /sys/class/net/* today. But at the /sys/subsystem/net/ directory could
>> be global network-subsystem-wide control files which would need to be
>> namespaced too. (The network subsystem does not use subsytem-global
>> files today, but a bunch of other classes do.)
>>
>> This could be modeled into the current way of doing sysfs namespaces?
>> A /sys/bus/<subsystem>/ directory hierarchy would need to be
>> namespaced, not just a single plain directory with symlinks. Would
>> that work?
>
> I'm not entirely clear on what you are doing but it all sounds like it
> will fit within what I am doing.

The goal is to unify the 3 needlessly different versions of "device
lists of the same subsystem". We have /sys/class, /sys/bus,
/sys/block, and all of them will be unified at /sys/subsystem/ leaving
the old names as compat links only. Unlike block and class, the
/sys/subsystem/<subsys> directory can be extended with custom
subdirectories and files, without mixing random files into device
lists.

With /sys/subsystem/, userspace can uniquely identify and find all
devices at /sys/subsystem/<subsys>/devices/<device-name>/ with only the
subsystem and the device name.

All devices in /sys/devices already have a symlink called "subsystem"
which will point back to the corresponding /sys/subsystem/<subsys>
directory, and the event environment already contains a variable
SUBSYSTEM with the name.
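
As a rough illustration of that lookup, here is a small userspace C
sketch.  It assumes the /sys/subsystem layout proposed in this thread
(a given kernel may not export it), and the helper names
subsys_device_path() and subsys_name_from_link() are made up for this
example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/subsystem/<subsys>/devices/<dev>" into buf.
 * Returns 0 on success, -1 if the result would not fit.
 * ASSUMPTION: /sys/subsystem is the proposed layout, not a guaranteed one.
 */
int subsys_device_path(char *buf, size_t len,
		       const char *subsys, const char *dev)
{
	int n;

	assert(buf && subsys && dev);
	n = snprintf(buf, len, "/sys/subsystem/%s/devices/%s", subsys, dev);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Given the target of a device's "subsystem" symlink (for example the
 * string returned by readlink(2)), return the subsystem name, which is
 * the last path component of the target.
 */
const char *subsys_name_from_link(const char *target)
{
	const char *slash;

	assert(target);
	slash = strrchr(target, '/');
	return slash ? slash + 1 : target;
}
```

With these, subsys_device_path(buf, sizeof(buf), "net", "eth0") fills
buf with /sys/subsystem/net/devices/eth0, and passing a link target such
as "../../../../class/net" to subsys_name_from_link() yields "net".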

That would be the first time sysfs device interfaces have some idea of
consistency. :)

> Right now I have /sys/class/net,
> /sys/devices/virtual/net and a bunch of other net directories becoming
> tagged and only showing up in the appropriately mounted sysfs.  We
> track them all in the class kset and as long as we extend that capability
> when the subsystem change happens in sysfs all should be well.

Ok, sounds good.

> Today we have /sys/class/net/bonding_master.  For now I have that as
> an untagged but the implementation is aware of which network namespace
> your current process is in.  Thinking about that a little more it
> would be better to make that file tagged so that userspace can see
> different versions for the different network namespaces.  Joy.

Yeah, that might make more sense in the end.

> I expect other control files will be the same.

Sounds like, yes.

> In general it doesn't make sense to add control files for networking.
> as they easily conflict with legal network device names and thus create
> the possibility of breaking someones userspace.

Yeah, it did not make sense in the first place to mix device lists
with global attributes. It's a real mess what people do in sysfs.

Thanks,
Kay

* Re: [PATCH 0/6] tagged sysfs support
  2010-03-31  5:51     ` Kay Sievers
@ 2010-03-31  6:25       ` Tejun Heo
  2010-03-31  6:52       ` Eric W. Biederman
  2010-04-03  0:58       ` Ben Hutchings
  2 siblings, 0 replies; 83+ messages in thread
From: Tejun Heo @ 2010-03-31  6:25 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Greg KH, linux-kernel,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	Serge Hallyn, netdev

On 03/31/2010 02:51 PM, Kay Sievers wrote:
> Another problem to solve is that sysfs does not allow us to symlink
> regular files, only directories, so we can currently not create the
> class-wide attributes as symlinks to the proper file in
> /sys/subsystem/.

Making sysfs allow symlinks to attributes shouldn't be too hard if
it's gonna help make things more logical overall.

Thanks.

-- 
tejun

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-30 18:31 ` [PATCH 3/6] sysfs: Implement sysfs tagged directory support Eric W. Biederman
  2010-03-31  2:43   ` Serge E. Hallyn
@ 2010-03-31  6:49   ` Tejun Heo
  2010-03-31  7:43     ` Eric W. Biederman
  2010-04-29 20:29   ` patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree gregkh
  2 siblings, 1 reply; 83+ messages in thread
From: Tejun Heo @ 2010-03-31  6:49 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery

Hello, Eric.

On 03/31/2010 03:31 AM, Eric W. Biederman wrote:
> What this patch does is to add an additional tag field to the
> sysfs dirent structure.  For directories that should show different
> contents depending on the context such as /sys/class/net/, and
> /sys/devices/virtual/net/ this tag field is used to specify the
> context in which those directories should be visible.  Effectively
> this is the same as creating multiple distinct directories with
> the same name but internally to sysfs the result is nicer.

This has become a long running project. :-)

The way to implement partial visibility seems much cleaner now and I
don't have any objection.  Thanks for cleaning up the whole sysfs and
implementing this properly.  Unfortunately, I still feel quite
uncomfortable about how the scope of visibility is determined and how
deep knowledge about specific namespace implementation seeps down to
kobject / sysfs layer.  It almost looks like a gross layering
violation.

Is it at all possible to implement it in a properly layered manner?
ie. sysfs providing mechanisms for selective visibility, driver model
wraps it and exports it and namespace implements namespaces on top of
those mechanisms?  I can see that there should be some interaction
between the driver model and namespaces but I can't see why that
information should be visible deep down in kobject and sysfs.

Thanks.

-- 
tejun

* Re: [PATCH 0/6] tagged sysfs support
  2010-03-31  5:51     ` Kay Sievers
  2010-03-31  6:25       ` Tejun Heo
@ 2010-03-31  6:52       ` Eric W. Biederman
  2010-04-03  0:58       ` Ben Hutchings
  2 siblings, 0 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31  6:52 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Greg Kroah-Hartman, Greg KH, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	Serge Hallyn, netdev

Kay Sievers <kay.sievers@vrfy.org> writes:

> On Wed, Mar 31, 2010 at 01:04, Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Kay Sievers <kay.sievers@vrfy.org> writes:
>
> Yeah, /sys/bus/, which is the only sane layout of the needlessly
> different 3 versions of the same thing (bus, class, block).
>
> /sys/bus/<subsys> can just be plain symlinks to the
> /sys/subsystem/<subsys> directories.
>
> /sys/class/<subsys> *could* be a symlink to the
> /sys/subsystem/<subsys>/devices/ directory, but we really don't want
> to continue to stupidly mix subsystem-wide control files with device
> lists anymore. The "devices" directory needs to be a strict list of
> devices, not some collection of random stuff, that it is today. :)
>
> So we either leave all the conceptually broken class attributes behind
> us, and put them at the  /sys/subsystem/<subsys>/ level only, or we
> need to create the /sys/class/<subsys>/* stuff all as symlinks like we
> do today. I expect, we have to create /sys/class as we do today.

Ideally we will keep new subsystem attributes from creeping into 
/sys/class/xxxx/ directories.

> Another problem to solve is that sysfs does not allow us to symlink
> regular files, only directories, so we can currently not create the
> class-wide attributes as symlinks to the proper file in
> /sys/subsystem/.

That seems to be part of the everything is a kobject interface, and
all kobjects are directories.  I don't think supporting the symlinks
will be particularly hard, although there are issues to consider with
respect to making the symlinks come and go when the attributes do.

>>
>> I'm not entirely clear on what you are doing but it all sounds like it
>> will fit within what I am doing.
>
> The goal is to unify the 3 needlessly different versions of "device
> lists of the same subsystem". We have /sys/class, /sys/bus,
> /sys/block, and all of them will be unified at /sys/subsystem/ leaving
> the old names as compat links only. Unlike block and class, the
> /sys/subsystem/<subsys> directory can be extended with custom
> subdirectories and files, without mixing random files into device
> lists.

That makes sense.  I took a quick look and /sys/block is already
a compatibility define.   So I don't expect any issues there.

At a practical level there don't appear to be too many class attributes
that will cause problems, but even a couple are enough to be a pain.

> Yeah, it did not make sense in the first place to mix device lists
> with global attributes. It's a real mess what people do in sysfs.

I was very disappointed in sysfs the first time I saw someone add writable
attributes.  But sysfs is here now.

Eric

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  6:49   ` Tejun Heo
@ 2010-03-31  7:43     ` Eric W. Biederman
  2010-03-31  8:17       ` Tejun Heo
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31  7:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery

Tejun Heo <tj@kernel.org> writes:

> Hello, Eric.
>
> On 03/31/2010 03:31 AM, Eric W. Biederman wrote:
>> What this patch does is to add an additional tag field to the
>> sysfs dirent structure.  For directories that should show different
>> contents depending on the context such as /sys/class/net/, and
>> /sys/devices/virtual/net/ this tag field is used to specify the
>> context in which those directories should be visible.  Effectively
>> this is the same as creating multiple distinct directories with
>> the same name but internally to sysfs the result is nicer.
>
> This has become a long running project. :-)
>
> The way to implement partial visibility seems much cleaner now and I
> don't have any objection.  Thanks for cleaning up the whole sysfs and
> implementing this properly.  Unfortunately, I still feel quite
> uncomfortable about how the scope of visibility is determined and how
> deep knowledge about specific namespace implementation seeps down to
> kobject / sysfs layer.  It almost looks like a gross layering
> violation.

The problem is how sysfs and the kobject layer expose things to
userspace.  Maintaining backwards compatibility to userspace in sysfs
while making changes in the rest of the kernel is hard.

Having been through the rest of the kernel and changed every other
significant interface I think I can say that without bias.  

> Is it at all possible to implement it in a properly layered manner?
> ie. sysfs providing mechanisms for selective visibility, driver model
> wraps it and exports it and namespace implements namespaces on top of
> those mechanisms?

I think that is roughly what I have.

> I can see that there should be some interaction
> between the driver model and namespaces but I can't see why that
> information should be visible deep down in kobject and sysfs.

At this point you seem to be asking for perfection instead of something
that is merely good enough.  I am open to suggestions for something
better but overall the code is as good as I can get it.  The code
does not impose any real maintenance problems.  The code does not
impose any real performance problems.  The code works.  It is time to
use this stuff, and stop keeping devices out of the kobject layer
because sysfs and the kobject layer can not cope with the reality
of the rest of the kernel.

As for the layering itself: down in sysfs there are only two bits
visible.  One is a void * pointer that, in addition to the name, is
what we use to define selective visibility.  The other is a context
that we capture when we mount sysfs.  Those bits are fundamental
things sysfs needs to do.

Capturing the namespaces of interest when mounting sysfs allows us to
display with different mounts of sysfs what a single mount of sysfs
can not show.
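
To make those two pieces concrete, here is a minimal userspace sketch
of the idea, not the sysfs code itself.  All of the names
(dirent_model, mount_model, lookup_model) are invented for the
example, and it only models a directory in which every entry is
tagged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of a tagged directory entry: a name plus an opaque tag. */
struct dirent_model {
	const char *name;
	const void *tag;	/* opaque; sysfs never looks inside it */
	const char *target;	/* what this entry resolves to */
};

/* Model of per-mount state: the tag captured when sysfs was mounted. */
struct mount_model {
	const void *tag;
};

/*
 * Find the entry called name that this mount is allowed to see.
 * Two entries may share a name as long as their tags differ; an
 * entry is visible only when its tag matches the mount's tag.
 */
const struct dirent_model *lookup_model(const struct dirent_model *dir,
					size_t n,
					const struct mount_model *mnt,
					const char *name)
{
	size_t i;

	assert(dir && mnt && name);
	for (i = 0; i < n; i++) {
		if (strcmp(dir[i].name, name) != 0)
			continue;
		if (dir[i].tag == mnt->tag)
			return &dir[i];
	}
	return NULL;
}
```

In this model a directory can hold two entries named "eth0" with
different tags, and each mount resolves "eth0" to the entry whose tag
it captured.  The lookup code only compares pointers, so it needs no
knowledge of what kind of namespace the tag stands for.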

Eric

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  7:43     ` Eric W. Biederman
@ 2010-03-31  8:17       ` Tejun Heo
  2010-03-31  8:22         ` Tejun Heo
  0 siblings, 1 reply; 83+ messages in thread
From: Tejun Heo @ 2010-03-31  8:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery

Hello, Eric.

On 03/31/2010 04:43 PM, Eric W. Biederman wrote:
>> Is it at all possible to implement it in a properly layered manner?
>> ie. sysfs providing mechanisms for selective visibility, driver model
>> wraps it and exports it and namespace implements namespaces on top of
>> those mechanisms?
> 
> I think that is roughly what I have.

Yeah, well, in a sense.  It's all a matter of degree.

> As for the layering itself: down in sysfs there are only two bits
> visible.  One is a void * pointer that, in addition to the name, is
> what we use to define selective visibility.  The other is a context
> that we capture when we mount sysfs.  Those bits are fundamental
> things sysfs needs to do.

Well, I guess all I wanna say is... is there *ANY* way to do it w/
less callbacks?  It's very difficult to follow what's going on for
what.

If you think all those callbacks are absolute necessities, can you
please at least add a boatload of comments around them explaining what
they're meant to do and how they're gonna be used?  It's probably
because I don't have any experience with namespaces but I really can't
wrap my head around it as it currently stands.

Thanks.

-- 
tejun

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  8:17       ` Tejun Heo
@ 2010-03-31  8:22         ` Tejun Heo
  2010-03-31  9:39           ` Eric W. Biederman
  0 siblings, 1 reply; 83+ messages in thread
From: Tejun Heo @ 2010-03-31  8:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery

Just wanna add a bit more.

On 03/31/2010 05:17 PM, Tejun Heo wrote:
> If you think all those callbacks are absolute necessities, can you
> please at least add a boatload of comments around them explaining what
> they're meant to do and how they're gonna be used?  It's probably
> because I don't have any experience with namespaces but I really can't
> wrap my head around it as it currently stands.

The reason why I talked about proper layering is the same reason.
It's very difficult to review your code because I have no idea how
those callbacks are meant to be used and gonna behave and that lowers
maintainability significantly in the long run.  If at all possible,
please make it implement a discrete function which is used to
implement something higher up.  If it's already done like that and I'm
just being stupid, please feel free to enlighten me.

Thanks.

-- 
tejun

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  8:22         ` Tejun Heo
@ 2010-03-31  9:39           ` Eric W. Biederman
  2010-04-05  8:17             ` Tejun Heo
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31  9:39 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery

Tejun Heo <htejun@gmail.com> writes:

> Just wanna add a bit more.
>
> On 03/31/2010 05:17 PM, Tejun Heo wrote:
>> If you think all those callbacks are absolute necessities, can you
>> please at least add a boatload of comments around them explaining what
>> they're meant to do and how they're gonna be used?  It's probably
>> because I don't have any experience with namespaces but I really can't
>> wrap my head around it as it currently stands.
>
> The reason why I talked about proper layering is the same reason.
> It's very difficult to review your code because I have no idea how
> those callbacks are meant to be used and gonna behave and that lowers
> maintainability significantly in the long run.  If at all possible,
> please make it implement a discrete function which is used to
> implement something higher up.  If it's already done like that and I'm
> just being stupid, please feel free to enlighten me.

Apologies.   There is a fine line between sending enough patches
to give context and completely overwhelming people with patches,
and of course by this time I am so accustomed to this code I am
practically blind to it.

Let me try a happy medium between overwhelming and too little
information by giving you some excerpts, and a bit of overview.

(Ugh, after having written this I will certainly agree that we
 have so many layers in the device model that they become
 obfuscating abstractions.)

Looking through my code there are 3 types of callbacks.
- Callbacks to find the namespace type of a kobject's children.
  .child_ns_type
- Callbacks to find the namespace of a kobject.
  .namespace
- Callbacks on a namespace type to find the namespace
  of a particular context.
  .current_ns
  .initial_ns  (not used in my patchset)
  .netlink_ns  (not used in my patchset)


In a world of weird explicitness I expect .child_ns_type and
.namespace could be made to go away by pushing through explicit
ns_type and namespace parameters everywhere. But that seems
like an awful lot of unnecessary code churn and bloat with
the only real advantage being that we have an abstraction
stored explicitly at each layer.

I use child_ns_type to see if a directory should be tagged
and to figure out the type of the tags on a sysfs directory.

I use current_ns to capture the namespace (of ns_type) of the
current process when sysfs is mounted so I know what to show
userspace.

I use ktype->namespace to figure out which namespace a given
kobject's name is in.

There are intermediate steps on those methods but that is
just what appears to be the necessary boilerplate to get
from a class down to a kobject.
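
As a rough userspace model of how those three callbacks fit together
(not the kernel code): every type and function name below is
invented for illustration, and the per-object namespace is a plain
field here where the real code uses the ktype->namespace() callback.

```c
#include <assert.h>
#include <stddef.h>

enum ns_type_model { NS_TYPE_NONE = 0, NS_TYPE_NET };

/* Per-namespace-type operations; only current_ns is modelled. */
struct ns_type_ops_model {
	enum ns_type_model type;
	const void *(*current_ns)(void);
};

/* Minimal stand-in for a kobject. */
struct obj_model {
	struct obj_model *parent;
	/* Non-NULL if this object's children live in a tagged directory. */
	const struct ns_type_ops_model *child_ns_type;
	/* Namespace this object's name belongs to. */
	const void *ns;
};

/* child_ns_type step: which namespace type tags obj's entry, if any? */
enum ns_type_model obj_ns_type(const struct obj_model *obj)
{
	assert(obj);
	if (!obj->parent || !obj->parent->child_ns_type)
		return NS_TYPE_NONE;
	return obj->parent->child_ns_type->type;
}

/* namespace step: tag for obj's entry, NULL if the parent is untagged. */
const void *obj_tag(const struct obj_model *obj)
{
	return obj_ns_type(obj) == NS_TYPE_NONE ? NULL : obj->ns;
}

/* current_ns step: tag a new mount captures for one namespace type. */
const void *mount_tag(const struct ns_type_ops_model *ops)
{
	assert(ops && ops->current_ns);
	return ops->current_ns();
}

/* Example namespace type backed by a single fake "current" namespace. */
static int fake_net_ns;

const void *fake_current_ns(void)
{
	return &fake_net_ns;
}

const struct ns_type_ops_model fake_net_ops = { NS_TYPE_NET, fake_current_ns };
```

An object whose parent sets child_ns_type to &fake_net_ops gets its
own namespace as its tag, and a mount made in that namespace captures
the same pointer, so the entry is visible there.  An object under an
untagged parent gets no tag at all.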

The nstype callbacks initial_ns and netlink_ns are not used in this
patchset.  Instead they play a role in the filtering of events sent to
userspace.

netlink_ns is used to find the namespace of a netlink socket
to see if it is ok to send an event over a netlink socket.

static int kobj_bcast_filter(struct sock *dest_sk, struct sk_buff *skb, void *data)
{
	struct kobject *kobj = data;
	const struct kobj_ns_type_operations *ops;

	ops = kobj_ns_ops(kobj);
	if (ops) {
		const void *sock_ns, *ns;
		ns = kobj->ktype->namespace(kobj);
		sock_ns = ops->netlink_ns(dest_sk);
		return sock_ns != ns;
	}

	return 0;
}

initial_ns is used to figure out what the initial/default
namespace is for a class of namespaces.  We only report
events via /sbin/hotplug in the initial network namespace.
At least for now.

static int kobj_usermode_filter(struct kobject *kobj)
{
	const struct kobj_ns_type_operations *ops;

	ops = kobj_ns_ops(kobj);
	if (ops) {
		const void *init_ns, *ns;
		ns = kobj->ktype->namespace(kobj);
		init_ns = ops->initial_ns();
		return ns != init_ns;
	}

	return 0;
}

This is my change that adds support for the network namespace,
the only namespace I expect to add support for in the short term.

I hope this helps,

Eric


commit fdc0adeaa8bfab9a179e1eb349cab400ddb70403
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu Jul 3 16:13:11 2008 -0600

    netns: Teach network device kobjects which namespace they are in.

    The problem.  Network devices show up in sysfs and with the network
    namespace active multiple devices with the same name can show up in
    the same directory, ouch!

    To avoid that problem and allow existing applications in network namespaces
    to see the same interface that is currently presented in sysfs, this
    patch enables the tagging directory support in sysfs.

    By using the network namespace pointers as tags to separate out the
    sysfs directory entries we ensure that we don't have conflicts
    in the directories and applications only see a limited set of
    the network devices.

    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index d9456f6..9452e39 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -138,6 +138,7 @@ extern const struct sysfs_ops kobj_sysfs_ops;

 enum kobj_ns_type {
 	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPE_NET,
 	KOBJ_NS_TYPES
 };

diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..265e33b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES

 menu "Networking options"

+config NET_NS
+	bool "Network namespace support"
+	default n
+	depends on EXPERIMENTAL && NAMESPACES
+	help
+	  Allow user space to create what appear to be multiple instances
+	  of the network stack.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 099c753..1b98e36 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -13,7 +13,9 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
+#include <linux/nsproxy.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <net/wext.h>
@@ -466,6 +468,37 @@ static struct attribute_group wireless_group = {
 };
 #endif

+static const void *net_current_ns(void)
+{
+	return current->nsproxy->net_ns;
+}
+
+static const void *net_initial_ns(void)
+{
+	return &init_net;
+}
+
+static const void *net_netlink_ns(struct sock *sk)
+{
+	return sock_net(sk);
+}
+
+static struct kobj_ns_type_operations net_ns_type_operations = {
+	.type = KOBJ_NS_TYPE_NET,
+	.current_ns = net_current_ns,
+	.netlink_ns = net_netlink_ns,
+	.initial_ns = net_initial_ns,
+};
+
+static void net_kobj_ns_exit(struct net *net)
+{
+	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
+}
+
+static struct pernet_operations sysfs_net_ops = {
+	.exit = net_kobj_ns_exit,
+};
+
 #endif /* CONFIG_SYSFS */

 #ifdef CONFIG_HOTPLUG
@@ -506,6 +539,13 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }

+static const void *net_namespace(struct device *d)
+{
+	struct net_device *dev;
+	dev = container_of(d, struct net_device, dev);
+	return dev_net(dev);
+}
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -515,6 +555,8 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+	.ns_type = &net_ns_type_operations,
+	.namespace = net_namespace,
 };

 /* Delete sysfs entries but hold kobject reference until after all
@@ -587,5 +629,9 @@ void netdev_initialize_kobject(struct net_device *net)

 int netdev_kobject_init(void)
 {
+	kobj_ns_type_register(&net_ns_type_operations);
+#ifdef CONFIG_SYSFS
+	register_pernet_subsys(&sysfs_net_ops);
+#endif
 	return class_register(&net_class);
 }

* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-31  5:51     ` Eric W. Biederman
@ 2010-03-31 13:47       ` Serge E. Hallyn
  2010-03-31 14:02         ` Eric W. Biederman
  2010-04-05  7:45       ` Tejun Heo
  1 sibling, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-03-31 13:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Tejun Heo, Greg Kroah-Hartman, Kay Sievers, linux-kernel,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Tejun Heo <tj@kernel.org> writes:
> >> index 30f5a44..030a39d 100644
> >> --- a/fs/sysfs/sysfs.h
> >> +++ b/fs/sysfs/sysfs.h
> >> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
> >>  /*
> >>   * mount.c
> >>   */
> >> +struct sysfs_super_info {
> >> +};
> >> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
> >
> > Another nit picking.  It would be better to wrap SB in the macro
> > definition.  Also, wouldn't an inline function be better?
> 
> Good spotting.  That doesn't bite today but it will certainly bite
> someday if it isn't fixed.
> 
> I wonder how that has slipped through the review all of this time.

(let me demonstrate how: )

WTH are you talking about?  Unless you mean doing (SB) inside
the definition?

I actually was going to suggest dropping the #define as it obscures
the code, but I figured it would get more complicated later.

-serge

* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-31 13:47       ` Serge E. Hallyn
@ 2010-03-31 14:02         ` Eric W. Biederman
  0 siblings, 0 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31 14:02 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Tejun Heo, Greg Kroah-Hartman, Kay Sievers, linux-kernel,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	netdev

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Tejun Heo <tj@kernel.org> writes:
>> >> index 30f5a44..030a39d 100644
>> >> --- a/fs/sysfs/sysfs.h
>> >> +++ b/fs/sysfs/sysfs.h
>> >> @@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
>> >>  /*
>> >>   * mount.c
>> >>   */
>> >> +struct sysfs_super_info {
>> >> +};
>> >> +#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
>> >
>> > Another nit picking.  It would be better to wrap SB in the macro
>> > definition.  Also, wouldn't an inline function be better?
>> 
>> Good spotting.  That doesn't bite today but it will certainly bite
>> someday if it isn't fixed.
>> 
>> I wonder how that has slipped through the review all of this time.
>
> (let me demonstrate how: )
>
> WTH are you talking about?  Unless you mean doing (SB) inside
> the definition?
>
> I actually was going to suggest dropping the #define as it obscures
> the code, but I figured it would get more complicated later.

I believe the discussed change was to make the define:
#define sysfs_info(SB) ((struct sysfs_super_info *)((SB)->s_fs_info))

As for dropping the define and using s_fs_info raw: I rather like
a lightweight type-safe wrapper.  Maybe I just think s_fs_info
is an ugly name.

In practice I never call sysfs_info() with any expression that has
a side effect, so it doesn't matter.
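
For anyone following along, here is a small userspace sketch of the
difference.  The struct and macro names are stand-ins invented for
the example (the kernel types are not available outside the kernel),
and info_inline() is just one possible shape for the inline function
Tejun suggested.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types, just enough to compile in userspace. */
struct sysfs_super_info_model { int id; };
struct super_block_model { void *s_fs_info; };

/* As posted: the argument is pasted in without parentheses. */
#define info_unsafe(SB) ((struct sysfs_super_info_model *)(SB->s_fs_info))

/* As fixed: works for any pointer-valued expression. */
#define info_safe(SB) ((struct sysfs_super_info_model *)((SB)->s_fs_info))

/* Inline-function variant: also type-checks the argument. */
static inline struct sysfs_super_info_model *
info_inline(const struct super_block_model *sb)
{
	assert(sb);
	return sb->s_fs_info;
}

/*
 * With "struct super_block_model *sbs":
 *   info_safe(sbs + 1)    expands to ((...)((sbs + 1)->s_fs_info))
 *   info_unsafe(sbs + 1)  expands to ((...)(sbs + 1->s_fs_info)),
 *                         which applies -> to the integer 1 and
 *                         does not compile.
 */
```

With a plain identifier as the argument both macros expand to the
same thing, which is why the unparenthesized form has not caused
trouble so far.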

Eric




* Re: [PATCH 0/6] tagged sysfs support
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (6 preceding siblings ...)
  2010-03-30 18:53 ` [PATCH 0/6] tagged sysfs support Kay Sievers
@ 2010-03-31 17:21 ` Serge E. Hallyn
  2010-03-31 18:09   ` Eric W. Biederman
  2010-05-05  0:35 ` [PATCH 0/6] netns support in the kobject layer Eric W. Biederman
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-03-31 17:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-kernel,
	Tejun Heo, Cornelia Huck, linux-fsdevel, Eric Dumazet,
	Benjamin LaHaise, netdev

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> The main short coming of using multiple network namespaces today
> is that only network devices for the primary network namespaces
> can be put in the kobject layer and sysfs.
> 
> This is essentially the earlier version of this patchset that was
> reviewed before, just now on top of a version of sysfs that doesn't
> need cleanup patches to support it.
> 
> I have been running these patches in some form for well over a
> year so the basics should at least be solid.  
> 
> This patchset is currently against 2.6.34-rc1.
> 
> This patchset is just the basic infrastructure a couple of more pretty
> trivial patches are needed to actually enable network namespaces to use this.
> My current plan is to send those after these patches have made it through
> review.

Thanks very much for keeping this going, Eric!  I'm going to keep
looking through the code some more, but so far I see no problems.

Acked-by: Serge Hallyn <serue@us.ibm.com>

to the full patchset.  I'm really hoping you'll also include the
patch to implement the netns support (i.e. basically commit
fdc0adeaa8bfab9a179e1eb349cab400ddb70403 that you sent inline this
morning to Tejun).

thanks,
-serge

* Re: [PATCH 0/6] tagged sysfs support
  2010-03-31 17:21 ` Serge E. Hallyn
@ 2010-03-31 18:09   ` Eric W. Biederman
  0 siblings, 0 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-03-31 18:09 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Greg Kroah-Hartman, Kay Sievers, Greg KH, linux-kernel,
	Tejun Heo, Cornelia Huck, linux-fsdevel, Eric Dumazet,
	Benjamin LaHaise, netdev

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> 
>> The main short coming of using multiple network namespaces today
>> is that only network devices for the primary network namespaces
>> can be put in the kobject layer and sysfs.
>> 
>> This is essentially the earlier version of this patchset that was
>> reviewed before, just now on top of a version of sysfs that doesn't
>> need cleanup patches to support it.
>> 
>> I have been running these patches in some form for well over a
>> year so the basics should at least be solid.  
>> 
>> This patchset is currently against 2.6.34-rc1.
>> 
>> This patchset is just the basic infrastructure a couple of more pretty
>> trivial patches are needed to actually enable network namespaces to use this.
>> My current plan is to send those after these patches have made it through
>> review.
>
> Thanks very much for keeping this going, Eric!  I'm going to keep
> looking through the code some more, but so far I see no problems.
>
> Acked-by: Serge Hallyn <serue@us.ibm.com>
>
> to the full patchset.  I'm really hoping you'll also include the
> patch to implement the netns support (i.e. basically commit
> fdc0adeaa8bfab9a179e1eb349cab400ddb70403 that you sent inline this
> morning to Tejun).

One step at a time.  My goal is to have it all sent out and reviewed
in time for 2.6.35.

Eric

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/6] tagged sysfs support
  2010-03-31  5:51     ` Kay Sievers
  2010-03-31  6:25       ` Tejun Heo
  2010-03-31  6:52       ` Eric W. Biederman
@ 2010-04-03  0:58       ` Ben Hutchings
  2010-04-03  8:35         ` Kay Sievers
  2 siblings, 1 reply; 83+ messages in thread
From: Ben Hutchings @ 2010-04-03  0:58 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Greg KH, linux-kernel,
	Tejun Heo, Cornelia Huck, linux-fsdevel, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev

[-- Attachment #1: Type: text/plain, Size: 2414 bytes --]

On Wed, 2010-03-31 at 07:51 +0200, Kay Sievers wrote:
> On Wed, Mar 31, 2010 at 01:04, Eric W. Biederman <ebiederm@xmission.com> wrote:
> > Kay Sievers <kay.sievers@vrfy.org> writes:
> >> On Tue, Mar 30, 2010 at 20:30, Eric W. Biederman <ebiederm@xmission.com> wrote:
> >>>
> >>> The main short coming of using multiple network namespaces today
> >>> is that only network devices for the primary network namespaces
> >>> can be put in the kobject layer and sysfs.
> >>>
> >>> This is essentially the earlier version of this patchset that was
> >>> reviewed before, just now on top of a version of sysfs that doesn't
> >>> need cleanup patches to support it.
> >>
> >> Just to check if we are not in conflict with planned changes, and how
> >> to possibly handle them:
> >>
> >> There is the plan and ongoing work to unify classes and buses, export
> >> them at /sys/subsystem in the same layout of the current /sys/bus/.
> >> The decision to export buses and classes as two different things
> >> (which they aren't) is the last major piece in the sysfs layout which
> >> needs to be fixed.
> >
> > Interesting.  We will need symlinks, i.e.:
> > /sys/class -> /sys/subsystem
> > /sys/bus -> /sys/subsystem
> > to keep from breaking userspace.
> 
> Yeah, /sys/bus/, which is the only sane layout of the needlessly
> different 3 versions of the same thing (bus, class, block).
[...]

block vs class/block is arguable, but as for abstracting the difference
between bus and class... why?

Each bus defines a device interface covering enumeration,
identification, power management and various aspects of their connection
to the host.  This interface is implemented by the bus driver.

Each class defines a device interface covering functionality provided to
user-space or higher level kernel components (block interface to
filesystems, net driver interface to the networking core, etc).  This
interface is implemented by multiple device-specific drivers.

So while buses and classes both define device interfaces, they are
fundamentally different types of interface.  And there are 'subsystems'
that don't have devices at all (time, RCU, perf, ...).  If you're going
to expose the set of subsystems, don't they belong in there?  But then,
what would you put in their directories?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/6] tagged sysfs support
  2010-04-03  0:58       ` Ben Hutchings
@ 2010-04-03  8:35         ` Kay Sievers
  2010-04-03 16:05           ` Ben Hutchings
  0 siblings, 1 reply; 83+ messages in thread
From: Kay Sievers @ 2010-04-03  8:35 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Greg KH, linux-kernel,
	Tejun Heo, Cornelia Huck, linux-fsdevel, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev

On Sat, Apr 3, 2010 at 02:58, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Wed, 2010-03-31 at 07:51 +0200, Kay Sievers wrote:
>> Yeah, /sys/bus/, which is the only sane layout of the needlessly
>> different 3 versions of the same thing (bus, class, block).
> [...]
>
> block vs class/block is arguable,

That was already done long ago.

> but as for abstracting the difference
> between bus and class... why?

There is absolutely no need to export two versions of the same thing.
These directories serve no other purpose than to collect all devices
of the same subsystem. There is no useful information that belongs to
the type class or bus; they are both the same. For example, "inputX"
is implemented as a class, but is much more like a bus. And "usb"
devices are more a class of devices, while the interfaces and
controllers belong to a bus.

There is really no point in making userspace needlessly complicated
just to distinguish the two.

We also already have a bunch of subsystems which moved from class to
bus because they needed to express hierarchy between the same devices.
So the goal is to have only one type of subsystem to solve these
problems.

> Each bus defines a device interface covering enumeration,
> identification, power management and various aspects of their connection
> to the host.  This interface is implemented by the bus driver.

Sure, but that does not mean that class is a useful layout, or that
class devices can not do the same.

> Each class defines a device interface covering functionality provided to
> user-space or higher level kernel components (block interface to
> filesystems, net driver interface to the networking core, etc).  This
> interface is implemented by multiple device-specific drivers.

That's absolutely wrong. Classes are just overly simple uses of the
same thing. We have many class devices which are not "interfaces", and
we have bus devices which are interfaces.

> So while buses and classes both define device interfaces, they are
> fundamentally different types of interface.

No, they are not. They are just "devices". There is no useful
difference that these two types expose. And the class layout is
fundamentally broken, and not extensible. People mix lists of devices
with custom subsystem-wide attributes, which we need to stop them from
doing. The bus layout can carry custom directories, which is why we
want that by default for all "classifications".

> And there are 'subsystems'
> that don't have devices at all (time, RCU, perf, ...).  If you're going
> to expose the set of subsystems, don't they belong in there?
> But then,

We are talking about the current users in /sys, and the difference in
the sysfs export between /sys/bus and /sys/class.

> what would you put in their directories?

We are not talking about anything not in /sys currently.

Kay

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/6] tagged sysfs support
  2010-04-03  8:35         ` Kay Sievers
@ 2010-04-03 16:05           ` Ben Hutchings
  2010-04-03 16:35               ` Kay Sievers
  0 siblings, 1 reply; 83+ messages in thread
From: Ben Hutchings @ 2010-04-03 16:05 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Greg KH, linux-kernel,
	Tejun Heo, Cornelia Huck, linux-fsdevel, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev

[-- Attachment #1: Type: text/plain, Size: 2247 bytes --]

On Sat, 2010-04-03 at 10:35 +0200, Kay Sievers wrote:
> On Sat, Apr 3, 2010 at 02:58, Ben Hutchings <ben@decadent.org.uk> wrote:
> > On Wed, 2010-03-31 at 07:51 +0200, Kay Sievers wrote:
> >> Yeah, /sys/bus/, which is the only sane layout of the needlessly
> >> different 3 versions of the same thing (bus, class, block).
> > [...]
> >
> > block vs class/block is arguable,
> 
> That's already done long ago.
> 
> > but as for abstracting the difference
> > between bus and class... why?
> 
> There is absolutely no need to needlessly export two versions of the
> same thing. These directories serve no other purpose than to collect
> all devices of the same subsystem. There is no useful information that
> belongs to the type class or bus, they are both the same. Like
> "inputX" is implemented as a class, but is much more like a bus.

Really, how do you enumerate 'input' buses?

> And "usb" are devices, which are more a class of devices, and the
> interfaces and controllers belong to a bus.

What common higher-level functionality do USB devices provide?

> There is really no point to make userspace needlessly complicated to
> distinguish the both.
> 
> We also have already a bunch of subsystems which moved from class to
> bus because they needed to express hierarchy between the same devices.
> So the goal is to have only one type of subsystem to solve these
> problems.

That's interesting.  Which were those?

[...]
> > So while buses and classes both define device interfaces, they are
> > fundamentally different types of interface.
> 
> No, they are not. They are just "devices". There is no useful
> difference these two different types expose. And the class layout is
> fundamentally broken, and not extendable. People mix lists of devices
> with custom subsystem-wide attributes, which we need to stop from
> doing this. The bus layout can carry custom directories, which is why
> we want that by default for all "classifications".
[...]

I understand that you want to clean up a mess, but how do you know
you're not going to break user-space that depends on some of this mess?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 0/6] tagged sysfs support
  2010-04-03 16:05           ` Ben Hutchings
@ 2010-04-03 16:35               ` Kay Sievers
  0 siblings, 0 replies; 83+ messages in thread
From: Kay Sievers @ 2010-04-03 16:35 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Eric W. Biederman, Greg Kroah-Hartman, Greg KH, linux-kernel,
	Tejun Heo, Cornelia Huck, linux-fsdevel, Eric Dumazet,
	Benjamin LaHaise, Serge Hallyn, netdev

On Sat, Apr 3, 2010 at 18:05, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Sat, 2010-04-03 at 10:35 +0200, Kay Sievers wrote:
>> On Sat, Apr 3, 2010 at 02:58, Ben Hutchings <ben@decadent.org.uk> wrote:
>> > On Wed, 2010-03-31 at 07:51 +0200, Kay Sievers wrote:
>> >> Yeah, /sys/bus/, which is the only sane layout of the needlessly
>> >> different 3 versions of the same thing (bus, class, block).
>> > [...]
>> >
>> > block vs class/block is arguable,
>>
>> That's already done long ago.
>>
>> > but as for abstracting the difference
>> > between bus and class... why?
>>
>> There is absolutely no need to needlessly export two versions of the
>> same thing. These directories serve no other purpose than to collect
>> all devices of the same subsystem. There is no useful information that
>> belongs to the type class or bus, they are both the same. Like
>> "inputX" is implemented as a class, but is much more like a bus.
>
> Really, how do you enumerate 'input' buses?

The current inputX devices, unlike eventX and mouseX, are like "bus devices".

>> And "usb" are devices, which are more a class of devices, and the
>> interfaces and controllers belong to a bus.
>
> What common higher-level functionality do USB devices provide?

A device file, for example, which can do anything to the device. :)

>> There is really no point to make userspace needlessly complicated to
>> distinguish the both.
>>
>> We also have already a bunch of subsystems which moved from class to
>> bus because they needed to express hierarchy between the same devices.
>> So the goal is to have only one type of subsystem to solve these
>> problems.
>
> That's interesting.  Which were those?

i2c, iio, and a few which were out-of-tree and got changed before the
merge, because we knew they would not work as class devices: they
needed to have children, or to add additional properties at the
subsystem directory level. pci, for example, has a "slots" directory
in the pci subsystem directory. Such things are not possible with the
too-simple class layout.

> [...]
>> > So while buses and classes both define device interfaces, they are
>> > fundamentally different types of interface.
>>
>> No, they are not. They are just "devices". There is no useful
>> difference these two different types expose. And the class layout is
>> fundamentally broken, and not extendable. People mix lists of devices
>> with custom subsystem-wide attributes, which we need to stop from
>> doing this. The bus layout can carry custom directories, which is why
>> we want that by default for all "classifications".
> [...]
>
> I understand that you want to clean up a mess, but how do you know
> you're not going to break user-space that depends on some of this mess?

Just as /sys/block does, /sys/class and /sys/bus will stay as
symlinks and not go away.

Kay

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 1/6] sysfs: Basic support for multiple super blocks
  2010-03-31  5:51     ` Eric W. Biederman
  2010-03-31 13:47       ` Serge E. Hallyn
@ 2010-04-05  7:45       ` Tejun Heo
  1 sibling, 0 replies; 83+ messages in thread
From: Tejun Heo @ 2010-04-05  7:45 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev

Hello, Eric.

On 03/31/2010 02:51 PM, Eric W. Biederman wrote:
>> I haven't looked at later patches but I suppose this is gonna be
>> filled with more meaningful stuff later. 
> 
> Yes it will.
> 
>> One (possibly silly) thing
>> that stands out compared to get_sb_single() is missing remount
>> handling.  Is it intended?
> 
> There is nothing for a remount to do so I ignore it.   The only
> thing that would possibly be meaningful is a read-only mount,
> and nothing I know of sysfs suggests read-only mounts of sysfs
> work, or make any sense.

I see.  Wouldn't it be better to make that design choice evident by
stating the choice in a comment or at least in the patch
description?  As it currently stands, you're burying a clear
functional change in a seemingly innocent patch which contains zero
lines of comments and two lines of description.  The same pattern
holds for this whole patchset.  Where are the comments and
descriptions about the design and implementation?  :-(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
  2010-03-31  9:39           ` Eric W. Biederman
@ 2010-04-05  8:17             ` Tejun Heo
  0 siblings, 0 replies; 83+ messages in thread
From: Tejun Heo @ 2010-04-05  8:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Cornelia Huck,
	linux-fsdevel, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, Benjamin Thery

Hello, Eric.

On 03/31/2010 06:39 PM, Eric W. Biederman wrote:
> Let me try a happy medium between overwhelming and too little
> information by giving you some excerpts, and a bit of overview.
> 
> (Ugh, after writing this I certainly will agree that we
>  have so many layers in the device model that they become
>  obfuscating abstractions).

Yeah, exactly, and this patchset is pushing it further with no
documentation and indirections to high heavens.  As someone who
doesn't have much experience with namespaces, I can't make much sense
of this patchset and it obfuscates the whole kobject thing more and
that's a bad direction to be heading toward.

> Looking through my code there are 3 types of callbacks.
> - Callbacks to find the namespace type of a kobject's children.
>   .child_ns_type

Can you please also explain the relationships among kobjects, ns_types
and NSes?

> - Callbacks to find the namespace of a kobject.
>   .namespace
> - Callbacks on a namespace type to find the namespace
>   of a particular context.
>   .current_ns
>   .initial_ns  (not used in my patchset)
>   .netlink_ns  (not used in my patchset)
> 
> In a world of weird explicitness I expect .child_ns_type and
> .namespace could be made to go away by pushing through explicit
> ns_type, and namespace parameters everywhere. But that seems
> like an awful lot of unnecessary code churn and bloat with
> the only real advantage being that we have an abstraction
> stored explicitly at each layer.

* How much churn would it be?  I would be willing to trade quite a bit
  if the following can go away.  The sheer amount of indirection there
  scares me a lot.

  struct kobj_type {
  ...
	const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
  ...
  };

* Is it necessary to teach kobject layer the concept of namespaces?
  Wouldn't it be possible to let kobject and sysfs deal with tags and
  make namespaces use them?
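
To make the chain of indirection concrete, here is a rough userspace
mock of how a lookup seems to resolve, as far as I can tell from the
patch.  Only kobj_ns_ops(), kobj_child_ns_ops(), .child_ns_type,
.namespace and .current_ns are names from the patchset; every mock_*
name, the mock_tag field and the int used for the type are
hypothetical and exist only so the sketch compiles outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

struct kobject;

struct kobj_ns_type_operations {
	int type;				/* stand-in for enum kobj_ns_type */
	const void *(*current_ns)(void);
};

struct kobj_type {
	const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
	const void *(*namespace)(struct kobject *kobj);
};

struct kobject {
	struct kobject *parent;
	const struct kobj_type *ktype;
	const void *mock_tag;			/* hypothetical: tag kept in the object */
};

/* Same shape as in the patch: the parent's ktype names the ns type. */
static const struct kobj_ns_type_operations *kobj_child_ns_ops(struct kobject *parent)
{
	const struct kobj_ns_type_operations *ops = NULL;

	if (parent && parent->ktype->child_ns_type)
		ops = parent->ktype->child_ns_type(parent);

	return ops;
}

static const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj)
{
	return kobj_child_ns_ops(kobj->parent);
}

/* Two fake namespaces and a fake "current" context (all hypothetical). */
static int ns_a, ns_b;
static const void *mock_current = &ns_a;

static const void *mock_current_ns(void)
{
	return mock_current;
}

static const struct kobj_ns_type_operations mock_net_ops = {
	.type		= 1,
	.current_ns	= mock_current_ns,
};

static const struct kobj_ns_type_operations *mock_dir_child_ns_type(struct kobject *kobj)
{
	(void)kobj;
	return &mock_net_ops;
}

static const void *mock_dev_namespace(struct kobject *kobj)
{
	return kobj->mock_tag;
}

static const struct kobj_type mock_dir_ktype = {
	.child_ns_type	= mock_dir_child_ns_type,
};

static const struct kobj_type mock_dev_ktype = {
	.namespace	= mock_dev_namespace,
};

/* Returns 1 if kobj would be visible from the current context. */
static int mock_visible(struct kobject *kobj)
{
	const struct kobj_ns_type_operations *ops = kobj_ns_ops(kobj);

	if (!ops)
		return 1;	/* parent is untagged: visible to everyone */

	return kobj->ktype->namespace(kobj) == ops->current_ns();
}
```

With a directory using mock_dir_ktype and two children tagged &ns_a
and &ns_b, mock_visible() accepts only the child whose tag matches
mock_current.  That is three hops (parent, ktype, callback) before a
tag is even compared, which is the part that worries me.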

> static int kobj_bcast_filter(struct sock *dest_sk, struct sk_buff *skb, void *data)
> {
> 	struct kobject *kobj = data;
> 	const struct kobj_ns_type_operations *ops;
> 
> 	ops = kobj_ns_ops(kobj);
> 	if (ops) {
> 		const void *sock_ns, *ns;
> 		ns = kobj->ktype->namespace(kobj);
> 		sock_ns = ops->netlink_ns(dest_sk);
> 		return sock_ns != ns;
> 	}
> 
> 	return 0;
> }
> 
> initial_ns is used to figure out what the initial/default
> namespace is for a class of namespaces.  We only report
> /sbin/hotplug events in the initial network namespace.
> At least for now.
> 
> static int kobj_usermode_filter(struct kobject *kobj)
> {
> 	const struct kobj_ns_type_operations *ops;
> 
> 	ops = kobj_ns_ops(kobj);
> 	if (ops) {
> 		const void *init_ns, *ns;
> 		ns = kobj->ktype->namespace(kobj);
> 		init_ns = ops->initial_ns();
> 		return ns != init_ns;
> 	}
> 
> 	return 0;
> }

I can understand you would need two different ways of establishing the
accessor depending on the mode of access (file IO or netlink) but can
initial_ns ever be dynamic?  Can't it just be void *initial_ns instead
of a callback?
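
A minimal sketch of the two alternatives, assuming the initial
namespace is a statically allocated object whose address is known at
link time (that is my assumption, not something the patch states).
All names here are hypothetical and the structs are cut down to the
one field in question; this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: as in the patchset, the initial namespace is a callback. */
struct ns_type_ops_cb {
	const void *(*initial_ns)(void);
};

/* Variant 2: the initial namespace is a plain pointer. */
struct ns_type_ops_ptr {
	const void *initial_ns;
};

/* Stand-ins for the initial and one other network namespace. */
static int init_net_mock;
static int other_net_mock;

static const void *mock_initial_ns(void)
{
	return &init_net_mock;
}

static const struct ns_type_ops_cb ops_cb = {
	.initial_ns = mock_initial_ns,
};

static const struct ns_type_ops_ptr ops_ptr = {
	.initial_ns = &init_net_mock,
};

/* Both filters return nonzero when the usermode event should be dropped. */
static int usermode_filter_cb(const void *ns)
{
	return ns != ops_cb.initial_ns();
}

static int usermode_filter_ptr(const void *ns)
{
	return ns != ops_ptr.initial_ns;
}
```

Both filters give the same answer for every namespace as long as the
initial one never changes; the pointer variant just drops one
indirect call.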

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 83+ messages in thread

* patch driver-core-implement-ns-directory-support-for-device-classes.patch added to gregkh-2.6 tree
  2010-03-30 18:31 ` [PATCH 6/6] driver core: Implement ns directory support for device classes Eric W. Biederman
@ 2010-04-29 20:29   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-04-29 20:29 UTC (permalink / raw)
  To: ebiederm, bcrl, benjamin.thery, cornelia.huck, eric.dumazet,
	gregkh, kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: driver core: Implement ns directory support for device classes.

to my gregkh-2.6 tree.  Its filename is

    driver-core-implement-ns-directory-support-for-device-classes.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu Apr 29 12:47:44 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue, 30 Mar 2010 11:31:29 -0700
Subject: driver core: Implement ns directory support for device classes.
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, linux-fsdevel@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, "Eric W. Biederman" <ebiederm@xmission.com>, Benjamin Thery <benjamin.thery@bull.net>
Message-ID: <1269973889-25260-6-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

device_del and device_rename were modified to use
sysfs_delete_link and sysfs_rename_link respectively to ensure
when these operations happen on devices whose classes
are in namespace directories they work properly.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/base/core.c |   21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -786,7 +786,7 @@ out_device:
 out_busid:
 	if (dev->kobj.parent != &dev->class->p->class_subsys.kobj &&
 	    device_is_not_partition(dev))
-		sysfs_remove_link(&dev->class->p->class_subsys.kobj,
+		sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj,
 				  dev_name(dev));
 #else
 	/* link in the class directory pointing to the device */
@@ -804,7 +804,7 @@ out_busid:
 	return 0;
 
 out_busid:
-	sysfs_remove_link(&dev->class->p->class_subsys.kobj, dev_name(dev));
+	sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj, dev_name(dev));
 #endif
 
 out_subsys:
@@ -832,13 +832,13 @@ static void device_remove_class_symlinks
 
 	if (dev->kobj.parent != &dev->class->p->class_subsys.kobj &&
 	    device_is_not_partition(dev))
-		sysfs_remove_link(&dev->class->p->class_subsys.kobj,
+		sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj,
 				  dev_name(dev));
 #else
 	if (dev->parent && device_is_not_partition(dev))
 		sysfs_remove_link(&dev->kobj, "device");
 
-	sysfs_remove_link(&dev->class->p->class_subsys.kobj, dev_name(dev));
+	sysfs_delete_link(&dev->class->p->class_subsys.kobj, &dev->kobj, dev_name(dev));
 #endif
 
 	sysfs_remove_link(&dev->kobj, "subsystem");
@@ -1624,6 +1624,14 @@ int device_rename(struct device *dev, ch
 		goto out;
 	}
 
+#ifndef CONFIG_SYSFS_DEPRECATED
+	if (dev->class) {
+		error = sysfs_rename_link(&dev->class->p->class_subsys.kobj,
+			&dev->kobj, old_device_name, new_name);
+		if (error)
+			goto out;
+	}
+#endif
 	error = kobject_rename(&dev->kobj, new_name);
 	if (error)
 		goto out;
@@ -1638,11 +1646,6 @@ int device_rename(struct device *dev, ch
 						  new_class_name);
 		}
 	}
-#else
-	if (dev->class) {
-		error = sysfs_rename_link(&dev->class->p->class_subsys.kobj,
-					  &dev->kobj, old_device_name, new_name);
-	}
 #endif
 
 out:


^ permalink raw reply	[flat|nested] 83+ messages in thread

* patch kobj-add-basic-infrastructure-for-dealing-with-namespaces.patch added to gregkh-2.6 tree
  2010-03-30 18:31 ` [PATCH 2/6] kobj: Add basic infrastructure for dealing with namespaces Eric W. Biederman
@ 2010-04-29 20:29   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-04-29 20:29 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, eric.dumazet, gregkh, kay.sievers,
	netdev, serue, tj


This is a note to let you know that I've just added the patch titled

    Subject: kobj: Add basic infrastructure for dealing with namespaces.

to my gregkh-2.6 tree.  Its filename is

    kobj-add-basic-infrastructure-for-dealing-with-namespaces.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu Apr 29 12:44:32 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue, 30 Mar 2010 11:31:25 -0700
Subject: kobj: Add basic infrastructure for dealing with namespaces.
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, linux-fsdevel@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1269973889-25260-2-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

Move complete knowledge of namespaces into the kobject layer
so we can use that information when reporting kobjects to
userspace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/base/class.c    |    9 ++++
 drivers/base/core.c     |   77 +++++++++++++++++++++++++++++------
 include/linux/device.h  |    3 +
 include/linux/kobject.h |   26 ++++++++++++
 lib/kobject.c           |  103 ++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 204 insertions(+), 14 deletions(-)

--- a/drivers/base/class.c
+++ b/drivers/base/class.c
@@ -63,6 +63,14 @@ static void class_release(struct kobject
 	kfree(cp);
 }
 
+static const struct kobj_ns_type_operations *class_child_ns_type(struct kobject *kobj)
+{
+	struct class_private *cp = to_class(kobj);
+	struct class *class = cp->class;
+
+	return class->ns_type;
+}
+
 static const struct sysfs_ops class_sysfs_ops = {
 	.show	= class_attr_show,
 	.store	= class_attr_store,
@@ -71,6 +79,7 @@ static const struct sysfs_ops class_sysf
 static struct kobj_type class_ktype = {
 	.sysfs_ops	= &class_sysfs_ops,
 	.release	= class_release,
+	.child_ns_type	= class_child_ns_type,
 };
 
 /* Hotplug events for classes go to the class class_subsys */
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -131,9 +131,21 @@ static void device_release(struct kobjec
 	kfree(p);
 }
 
+static const void *device_namespace(struct kobject *kobj)
+{
+	struct device *dev = to_dev(kobj);
+	const void *ns = NULL;
+
+	if (dev->class && dev->class->ns_type)
+		ns = dev->class->namespace(dev);
+
+	return ns;
+}
+
 static struct kobj_type device_ktype = {
 	.release	= device_release,
 	.sysfs_ops	= &dev_sysfs_ops,
+	.namespace	= device_namespace,
 };
 
 
@@ -595,11 +607,59 @@ static struct kobject *virtual_device_pa
 	return virtual_dir;
 }
 
-static struct kobject *get_device_parent(struct device *dev,
-					 struct device *parent)
+struct class_dir {
+	struct kobject kobj;
+	struct class *class;
+};
+
+#define to_class_dir(obj) container_of(obj, struct class_dir, kobj)
+
+static void class_dir_release(struct kobject *kobj)
+{
+	struct class_dir *dir = to_class_dir(kobj);
+	kfree(dir);
+}
+
+static const
+struct kobj_ns_type_operations *class_dir_child_ns_type(struct kobject *kobj)
+{
+	struct class_dir *dir = to_class_dir(kobj);
+	return dir->class->ns_type;
+}
+
+static struct kobj_type class_dir_ktype = {
+	.release	= class_dir_release,
+	.sysfs_ops	= &kobj_sysfs_ops,
+	.child_ns_type	= class_dir_child_ns_type
+};
+
+static struct kobject *
+class_dir_create_and_add(struct class *class, struct kobject *parent_kobj)
 {
+	struct class_dir *dir;
 	int retval;
 
+	dir = kzalloc(sizeof(*dir), GFP_KERNEL);
+	if (!dir)
+		return NULL;
+
+	dir->class = class;
+	kobject_init(&dir->kobj, &class_dir_ktype);
+
+	dir->kobj.kset = &class->p->class_dirs;
+
+	retval = kobject_add(&dir->kobj, parent_kobj, "%s", class->name);
+	if (retval < 0) {
+		kobject_put(&dir->kobj);
+		return NULL;
+	}
+	return &dir->kobj;
+}
+
+
+static struct kobject *get_device_parent(struct device *dev,
+					 struct device *parent)
+{
 	if (dev->class) {
 		static DEFINE_MUTEX(gdp_mutex);
 		struct kobject *kobj = NULL;
@@ -634,18 +694,7 @@ static struct kobject *get_device_parent
 		}
 
 		/* or create a new class-directory at the parent device */
-		k = kobject_create();
-		if (!k) {
-			mutex_unlock(&gdp_mutex);
-			return NULL;
-		}
-		k->kset = &dev->class->p->class_dirs;
-		retval = kobject_add(k, parent_kobj, "%s", dev->class->name);
-		if (retval < 0) {
-			mutex_unlock(&gdp_mutex);
-			kobject_put(k);
-			return NULL;
-		}
+		k = class_dir_create_and_add(dev->class, parent_kobj);
 		/* do not emit an uevent for this simple "glue" directory */
 		mutex_unlock(&gdp_mutex);
 		return k;
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -202,6 +202,9 @@ struct class {
 	int (*suspend)(struct device *dev, pm_message_t state);
 	int (*resume)(struct device *dev);
 
+	const struct kobj_ns_type_operations *ns_type;
+	const void *(*namespace)(struct device *dev);
+
 	const struct dev_pm_ops *pm;
 
 	struct class_private *p;
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -108,6 +108,8 @@ struct kobj_type {
 	void (*release)(struct kobject *kobj);
 	const struct sysfs_ops *sysfs_ops;
 	struct attribute **default_attrs;
+	const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj);
+	const void *(*namespace)(struct kobject *kobj);
 };
 
 struct kobj_uevent_env {
@@ -134,6 +136,30 @@ struct kobj_attribute {
 
 extern const struct sysfs_ops kobj_sysfs_ops;
 
+enum kobj_ns_type {
+	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPES
+};
+
+struct sock;
+struct kobj_ns_type_operations {
+	enum kobj_ns_type type;
+	const void *(*current_ns)(void);
+	const void *(*netlink_ns)(struct sock *sk);
+	const void *(*initial_ns)(void);
+};
+
+int kobj_ns_type_register(const struct kobj_ns_type_operations *ops);
+int kobj_ns_type_registered(enum kobj_ns_type type);
+const struct kobj_ns_type_operations *kobj_child_ns_ops(struct kobject *parent);
+const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj);
+
+const void *kobj_ns_current(enum kobj_ns_type type);
+const void *kobj_ns_netlink(enum kobj_ns_type type, struct sock *sk);
+const void *kobj_ns_initial(enum kobj_ns_type type);
+void kobj_ns_exit(enum kobj_ns_type type, const void *ns);
+
+
 /**
  * struct kset - a set of kobjects of a specific type, belonging to a specific subsystem.
  *
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -850,6 +850,109 @@ struct kset *kset_create_and_add(const c
 }
 EXPORT_SYMBOL_GPL(kset_create_and_add);
 
+
+static DEFINE_SPINLOCK(kobj_ns_type_lock);
+static const struct kobj_ns_type_operations *kobj_ns_ops_tbl[KOBJ_NS_TYPES];
+
+int kobj_ns_type_register(const struct kobj_ns_type_operations *ops)
+{
+	enum kobj_ns_type type = ops->type;
+	int error;
+
+	spin_lock(&kobj_ns_type_lock);
+
+	error = -EINVAL;
+	if (type >= KOBJ_NS_TYPES)
+		goto out;
+
+	error = -EINVAL;
+	if (type <= KOBJ_NS_TYPE_NONE)
+		goto out;
+
+	error = -EBUSY;
+	if (kobj_ns_ops_tbl[type])
+		goto out;
+
+	error = 0;
+	kobj_ns_ops_tbl[type] = ops;
+
+out:
+	spin_unlock(&kobj_ns_type_lock);
+	return error;
+}
+
+int kobj_ns_type_registered(enum kobj_ns_type type)
+{
+	int registered = 0;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES))
+		registered = kobj_ns_ops_tbl[type] != NULL;
+	spin_unlock(&kobj_ns_type_lock);
+
+	return registered;
+}
+
+const struct kobj_ns_type_operations *kobj_child_ns_ops(struct kobject *parent)
+{
+	const struct kobj_ns_type_operations *ops = NULL;
+
+	if (parent && parent->ktype->child_ns_type)
+		ops = parent->ktype->child_ns_type(parent);
+
+	return ops;
+}
+
+const struct kobj_ns_type_operations *kobj_ns_ops(struct kobject *kobj)
+{
+	return kobj_child_ns_ops(kobj->parent);
+}
+
+
+const void *kobj_ns_current(enum kobj_ns_type type)
+{
+	const void *ns = NULL;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
+	    kobj_ns_ops_tbl[type])
+		ns = kobj_ns_ops_tbl[type]->current_ns();
+	spin_unlock(&kobj_ns_type_lock);
+
+	return ns;
+}
+
+const void *kobj_ns_netlink(enum kobj_ns_type type, struct sock *sk)
+{
+	const void *ns = NULL;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
+	    kobj_ns_ops_tbl[type])
+		ns = kobj_ns_ops_tbl[type]->netlink_ns(sk);
+	spin_unlock(&kobj_ns_type_lock);
+
+	return ns;
+}
+
+const void *kobj_ns_initial(enum kobj_ns_type type)
+{
+	const void *ns = NULL;
+
+	spin_lock(&kobj_ns_type_lock);
+	if ((type > KOBJ_NS_TYPE_NONE) && (type < KOBJ_NS_TYPES) &&
+	    kobj_ns_ops_tbl[type])
+		ns = kobj_ns_ops_tbl[type]->initial_ns();
+	spin_unlock(&kobj_ns_type_lock);
+
+	return ns;
+}
+
+void kobj_ns_exit(enum kobj_ns_type type, const void *ns)
+{
+}
+
+
 EXPORT_SYMBOL(kobject_get);
 EXPORT_SYMBOL(kobject_put);
 EXPORT_SYMBOL(kobject_del);



* patch sysfs-add-support-for-tagged-directories-with-untagged-members.patch added to gregkh-2.6 tree
  2010-03-30 18:31 ` [PATCH 4/6] sysfs: Add support for tagged directories with untagged members Eric W. Biederman
@ 2010-04-29 20:29   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-04-29 20:29 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, ebiederm, ebiederm, eric.dumazet,
	gregkh, kay.sievers


This is a note to let you know that I've just added the patch titled

    Subject: sysfs: Add support for tagged directories with untagged members.

to my gregkh-2.6 tree.  Its filename is

    sysfs-add-support-for-tagged-directories-with-untagged-members.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu Apr 29 12:47:00 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue, 30 Mar 2010 11:31:27 -0700
Subject: sysfs: Add support for tagged directories with untagged members.
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, linux-fsdevel@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, "Eric W. Biederman" <ebiederm@maxwell.aristanetworks.com>, "Eric W. Biederman" <ebiederm@aristanetworks.com>
Message-ID: <1269973889-25260-4-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@maxwell.aristanetworks.com>

I had hoped to avoid this, but the bonding driver adds a file
to /sys/class/net/, and the easiest way to handle that file is
to make it untagged and to register it only once.

So relax the rules on tagged directories, and make bonding work.
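
As a rough userspace illustration (not part of the patch), the relaxed
matching rule can be modelled as below.  The struct and function names are
hypothetical stand-ins for sysfs_dirent and sysfs_find_dirent; only the
comparison itself is taken from the diff that follows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sysfs_dirent; not the kernel type. */
struct entry {
	const char *name;
	const void *ns;			/* NULL means untagged */
	struct entry *sibling;
};

/* Skip an entry only when both tags are set and they differ. */
static struct entry *find_entry(struct entry *children, const void *ns,
				const char *name)
{
	struct entry *e;

	for (e = children; e; e = e->sibling) {
		if (ns && e->ns && e->ns != ns)
			continue;
		if (!strcmp(e->name, name))
			return e;
	}
	return NULL;
}

/* Self-check: one untagged file next to two tagged "eth0" entries. */
static void relaxed_lookup_example(void)
{
	int ns_a, ns_b;			/* their addresses serve as tags */
	struct entry eth0_b = { "eth0", &ns_b, NULL };
	struct entry eth0_a = { "eth0", &ns_a, &eth0_b };
	struct entry masters = { "bonding_masters", NULL, &eth0_a };

	/* The untagged file is found from either namespace. */
	assert(find_entry(&masters, &ns_a, "bonding_masters") == &masters);
	assert(find_entry(&masters, &ns_b, "bonding_masters") == &masters);
	/* Tagged entries are still only found under their own tag. */
	assert(find_entry(&masters, &ns_a, "eth0") == &eth0_a);
	assert(find_entry(&masters, &ns_b, "eth0") == &eth0_b);
}
```

Note that in this model a lookup with a NULL tag matches tagged entries
too, which is presumably why sysfs_hash_and_remove gains an exact tag
comparison in this patch.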

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/sysfs/dir.c   |   12 +++---------
 fs/sysfs/inode.c |    2 ++
 2 files changed, 5 insertions(+), 9 deletions(-)

--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -383,12 +383,6 @@ int __sysfs_add_one(struct sysfs_addrm_c
 	if (sysfs_find_dirent(acxt->parent_sd, sd->s_ns, sd->s_name))
 		return -EEXIST;
 
-	if (sysfs_ns_type(acxt->parent_sd) && !sd->s_ns) {
-		WARN(1, KERN_WARNING "sysfs: ns required in '%s' for '%s'\n",
-			acxt->parent_sd->s_name, sd->s_name);
-		return -EINVAL;
-	}
-
 	sd->s_parent = sysfs_get(acxt->parent_sd);
 
 	sysfs_link_sibling(sd);
@@ -545,7 +539,7 @@ struct sysfs_dirent *sysfs_find_dirent(s
 	struct sysfs_dirent *sd;
 
 	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling) {
-		if (sd->s_ns != ns)
+		if (ns && sd->s_ns && (sd->s_ns != ns))
 			continue;
 		if (!strcmp(sd->s_name, name))
 			return sd;
@@ -879,7 +873,7 @@ static struct sysfs_dirent *sysfs_dir_po
 		while (pos && (ino > pos->s_ino))
 			pos = pos->s_sibling;
 	}
-	while (pos && pos->s_ns != ns)
+	while (pos && pos->s_ns && pos->s_ns != ns)
 		pos = pos->s_sibling;
 	return pos;
 }
@@ -890,7 +884,7 @@ static struct sysfs_dirent *sysfs_dir_ne
 	pos = sysfs_dir_pos(ns, parent_sd, ino, pos);
 	if (pos)
 		pos = pos->s_sibling;
-	while (pos && pos->s_ns != ns)
+	while (pos && pos->s_ns && pos->s_ns != ns)
 		pos = pos->s_sibling;
 	return pos;
 }
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -335,6 +335,8 @@ int sysfs_hash_and_remove(struct sysfs_d
 	sysfs_addrm_start(&acxt, dir_sd);
 
 	sd = sysfs_find_dirent(dir_sd, ns, name);
+	if (sd && (sd->s_ns != ns))
+		sd = NULL;
 	if (sd)
 		sysfs_remove_one(&acxt, sd);
 



* patch sysfs-basic-support-for-multiple-super-blocks.patch added to gregkh-2.6 tree
  2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
                     ` (2 preceding siblings ...)
  2010-03-31  5:41   ` Tejun Heo
@ 2010-04-29 20:29   ` gregkh
  3 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-04-29 20:29 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, eric.dumazet, gregkh, kay.sievers,
	netdev, serue, tj


This is a note to let you know that I've just added the patch titled

    Subject: sysfs: Basic support for multiple super blocks

to my gregkh-2.6 tree.  Its filename is

    sysfs-basic-support-for-multiple-super-blocks.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu Apr 29 12:41:57 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue, 30 Mar 2010 11:31:24 -0700
Subject: sysfs: Basic support for multiple super blocks
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, linux-fsdevel@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1269973889-25260-1-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

Add all of the necessary boilerplate to support
multiple superblocks in sysfs.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/sysfs/mount.c |   58 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/sysfs/sysfs.h |    3 ++
 2 files changed, 59 insertions(+), 2 deletions(-)

--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -72,16 +72,70 @@ static int sysfs_fill_super(struct super
 	return 0;
 }
 
+static int sysfs_test_super(struct super_block *sb, void *data)
+{
+	struct sysfs_super_info *sb_info = sysfs_info(sb);
+	struct sysfs_super_info *info = data;
+	int found = 1;
+	return found;
+}
+
+static int sysfs_set_super(struct super_block *sb, void *data)
+{
+	int error;
+	error = set_anon_super(sb, data);
+	if (!error)
+		sb->s_fs_info = data;
+	return error;
+}
+
 static int sysfs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
-	return get_sb_single(fs_type, flags, data, sysfs_fill_super, mnt);
+	struct sysfs_super_info *info;
+	struct super_block *sb;
+	int error;
+
+	error = -ENOMEM;
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		goto out;
+	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
+	if (IS_ERR(sb) || sb->s_fs_info != info)
+		kfree(info);
+	if (IS_ERR(sb)) {
+		/* info was already freed by the check above */
+		error = PTR_ERR(sb);
+		goto out;
+	}
+	if (!sb->s_root) {
+		sb->s_flags = flags;
+		error = sysfs_fill_super(sb, data, flags & MS_SILENT ? 1 : 0);
+		if (error) {
+			deactivate_locked_super(sb);
+			goto out;
+		}
+		sb->s_flags |= MS_ACTIVE;
+	}
+
+	simple_set_mnt(mnt, sb);
+	error = 0;
+out:
+	return error;
+}
+
+static void sysfs_kill_sb(struct super_block *sb)
+{
+	struct sysfs_super_info *info = sysfs_info(sb);
+
+	kill_anon_super(sb);
+	kfree(info);
 }
 
 static struct file_system_type sysfs_fs_type = {
 	.name		= "sysfs",
 	.get_sb		= sysfs_get_sb,
-	.kill_sb	= kill_anon_super,
+	.kill_sb	= sysfs_kill_sb,
 };
 
 int __init sysfs_init(void)
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -114,6 +114,9 @@ struct sysfs_addrm_cxt {
 /*
  * mount.c
  */
+struct sysfs_super_info {
+};
+#define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
 extern struct sysfs_dirent sysfs_root;
 extern struct kmem_cache *sysfs_dir_cachep;
 



* patch sysfs-implement-sysfs_delete_link.patch added to gregkh-2.6 tree
  2010-03-30 18:31 ` [PATCH 5/6] sysfs: Implement sysfs_delete_link Eric W. Biederman
@ 2010-04-29 20:29   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-04-29 20:29 UTC (permalink / raw)
  To: ebiederm, bcrl, benjamin.thery, cornelia.huck, dlezcano,
	eric.dumazet, gregkh, kay.sievers, netdev


This is a note to let you know that I've just added the patch titled

    Subject: sysfs: Implement sysfs_delete_link

to my gregkh-2.6 tree.  Its filename is

    sysfs-implement-sysfs_delete_link.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu Apr 29 12:47:27 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue, 30 Mar 2010 11:31:28 -0700
Subject: sysfs: Implement sysfs_delete_link
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, linux-fsdevel@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, "Eric W. Biederman" <ebiederm@xmission.com>, Benjamin Thery <benjamin.thery@bull.net>, Daniel Lezcano <dlezcano@fr.ibm.com>
Message-ID: <1269973889-25260-5-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

When removing a symlink, sysfs_remove_link does not provide
enough information to figure out which tagged directory the symlink
falls in.  So I need sysfs_delete_link which is passed the target
of the symlink to delete.

sysfs_rename_link is updated to call sysfs_delete_link instead
of sysfs_remove_link as we have all of the information necessary
and the callers are interesting.

Both of these functions now have enough information to find a symlink
in a tagged directory.  The only restriction is that they must be called
before the target kobject is renamed or deleted.  If they are called
later I lose track of which tag the target kobject was marked with
and can no longer find the old symlink to remove it.
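
The ordering restriction can be illustrated with a small userspace model
(a sketch only; the types and delete_link_model are hypothetical, and the
real sysfs_delete_link returns void).  It assumes the symlink carries the
same tag as its target and that removal requires an exact tag match.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical models; the real code works on kobjects and sysfs_dirents. */
struct target_model {
	const void *ns;		/* tag of the target's directory entry */
	int has_sd;		/* 0 once the target has been deleted */
};

struct link_model {
	const char *name;
	const void *ns;		/* assumed to equal the target's tag */
	int present;
};

/* The tag is read from the target, if the target still has one. */
static int delete_link_model(struct link_model *link,
			     const struct target_model *targ,
			     const char *name)
{
	const void *ns = targ->has_sd ? targ->ns : NULL;

	if (!link->present || strcmp(link->name, name) || link->ns != ns)
		return -ENOENT;
	link->present = 0;
	return 0;
}

static void delete_order_example(void)
{
	int net_ns;		/* its address serves as the tag */
	struct target_model dev = { &net_ns, 1 };
	struct link_model a = { "device", &net_ns, 1 };
	struct link_model b = { "device", &net_ns, 1 };

	/* Correct order: remove the link while the target is still live. */
	assert(delete_link_model(&a, &dev, "device") == 0);

	/* Wrong order: the target is gone, so its tag is unknown. */
	dev.has_sd = 0;
	assert(delete_link_model(&b, &dev, "device") == -ENOENT);
	assert(b.present == 1);
}
```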

This patch was split from an earlier patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 fs/sysfs/symlink.c    |   20 ++++++++++++++++++++
 include/linux/sysfs.h |    8 ++++++++
 2 files changed, 28 insertions(+)

--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -109,6 +109,26 @@ int sysfs_create_link_nowarn(struct kobj
 }
 
 /**
+ *	sysfs_delete_link - remove symlink in object's directory.
+ *	@kobj:	object we're acting for.
+ *	@targ:	object we're pointing to.
+ *	@name:	name of the symlink to remove.
+ *
+ *	Unlike sysfs_remove_link sysfs_delete_link has enough information
+ *	to successfully delete symlinks in tagged directories.
+ */
+void sysfs_delete_link(struct kobject *kobj, struct kobject *targ,
+			const char *name)
+{
+	const void *ns = NULL;
+	spin_lock(&sysfs_assoc_lock);
+	if (targ->sd)
+		ns = targ->sd->s_ns;
+	spin_unlock(&sysfs_assoc_lock);
+	sysfs_hash_and_remove(kobj->sd, ns, name);
+}
+
+/**
  *	sysfs_remove_link - remove symlink in object's directory.
  *	@kobj:	object we're acting for.
  *	@name:	name of the symlink to remove.
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -155,6 +155,9 @@ void sysfs_remove_link(struct kobject *k
 int sysfs_rename_link(struct kobject *kobj, struct kobject *target,
 			const char *old_name, const char *new_name);
 
+void sysfs_delete_link(struct kobject *dir, struct kobject *targ,
+			const char *name);
+
 int __must_check sysfs_create_group(struct kobject *kobj,
 				    const struct attribute_group *grp);
 int sysfs_update_group(struct kobject *kobj,
@@ -269,6 +272,11 @@ static inline int sysfs_rename_link(stru
 	return 0;
 }
 
+static inline void sysfs_delete_link(struct kobject *k, struct kobject *t,
+				     const char *name)
+{
+}
+
 static inline int sysfs_create_group(struct kobject *kobj,
 				     const struct attribute_group *grp)
 {



* patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-03-30 18:31 ` [PATCH 3/6] sysfs: Implement sysfs tagged directory support Eric W. Biederman
  2010-03-31  2:43   ` Serge E. Hallyn
  2010-03-31  6:49   ` Tejun Heo
@ 2010-04-29 20:29   ` gregkh
  2010-04-30  4:18     ` Tejun Heo
  2 siblings, 1 reply; 83+ messages in thread
From: gregkh @ 2010-04-29 20:29 UTC (permalink / raw)
  To: ebiederm, bcrl, benjamin.thery, cornelia.huck, eric.dumazet,
	gregkh, kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: sysfs: Implement sysfs tagged directory support.

to my gregkh-2.6 tree.  Its filename is

    sysfs-implement-sysfs-tagged-directory-support.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu Apr 29 12:46:06 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue, 30 Mar 2010 11:31:26 -0700
Subject: sysfs: Implement sysfs tagged directory support.
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, linux-fsdevel@vger.kernel.org, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, "Eric W. Biederman" <ebiederm@xmission.com>, Benjamin Thery <benjamin.thery@bull.net>
Message-ID: <1269973889-25260-3-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

The problem.  When implementing a network namespace I need to be able
to have multiple network devices with the same name.  Currently this
is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
potentially a few other directories of the form /sys/ ... /net/*.

What this patch does is to add an additional tag field to the
sysfs dirent structure.  For directories that should show different
contents depending on the context such as /sys/class/net/, and
/sys/devices/virtual/net/ this tag field is used to specify the
context in which those directories should be visible.  Effectively
this is the same as creating multiple distinct directories with
the same name but internally to sysfs the result is nicer.

I am calling the concept of a single directory that looks like multiple
directories all at the same path in the filesystem tagged directories.
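
A minimal userspace sketch of the idea, assuming a singly linked list of
children as in sysfs at the time; dirent_model and list_dir are
hypothetical names, and the filter is the strict one this patch adds to
the readdir helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct sysfs_dirent. */
struct dirent_model {
	const char *name;
	const void *ns;			/* tag: which context sees this entry */
	struct dirent_model *sibling;
};

/* Advance to the first entry, starting at pos, whose tag equals ns. */
static struct dirent_model *skip_to_visible(struct dirent_model *pos,
					    const void *ns)
{
	while (pos && pos->ns != ns)
		pos = pos->sibling;
	return pos;
}

/* Collect the names visible under tag ns; returns how many were stored. */
static int list_dir(struct dirent_model *children, const void *ns,
		    const char **names, int max)
{
	struct dirent_model *pos;
	int n = 0;

	for (pos = skip_to_visible(children, ns); pos && n < max;
	     pos = skip_to_visible(pos->sibling, ns))
		names[n++] = pos->name;
	return n;
}

static void tagged_dir_example(void)
{
	int ns_a, ns_b;			/* their addresses serve as tags */
	struct dirent_model d3 = { "eth1", &ns_b, NULL };
	struct dirent_model d2 = { "eth0", &ns_b, &d3 };
	struct dirent_model d1 = { "eth0", &ns_a, &d2 };
	const char *names[4];

	/* One list of children, two different views of it. */
	assert(list_dir(&d1, &ns_a, names, 4) == 1);
	assert(!strcmp(names[0], "eth0"));
	assert(list_dir(&d1, &ns_b, names, 4) == 2);
	assert(!strcmp(names[1], "eth1"));
}
```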

For the networking namespace the set of directories whose contents I need
to filter with tags can depend on the presence or absence of hotplug
hardware or which modules are currently loaded.  Which means I need
a simple race free way to setup those directories as tagged.

To achieve a race free design all tagged directories are created
and managed by sysfs itself.

Users of this interface:
- define a type in the sysfs_tag_type enumeration.
- call sysfs_register_ns_types with the type and its operations
- call sysfs_exit_ns when an individual tag is no longer valid

- Implement mount_ns() which returns the ns of the calling process
  so we can attach it to a sysfs superblock.
- Implement ktype.namespace() which returns the ns of a sysfs kobject.

Everything else is left up to sysfs and the driver layer.

For the network namespace mount_ns and namespace() are essentially
one line functions, and look to remain that.
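
The registration step can be sketched in userspace roughly as follows.
The logic mirrors kobj_ns_type_register and kobj_ns_current from the
kobject hunks earlier in this thread, but every name here is a
hypothetical stand-in, NS_MODEL_NET is an assumed example type, and the
spinlock is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical counterpart of enum kobj_ns_type with one real type. */
enum ns_model_type {
	NS_MODEL_NONE = 0,
	NS_MODEL_NET,			/* assumed example type */
	NS_MODEL_TYPES
};

struct ns_model_ops {
	enum ns_model_type type;
	const void *(*current_ns)(void);	/* ns of the calling context */
};

static const struct ns_model_ops *ns_model_tbl[NS_MODEL_TYPES];

/* One slot per type; registering a type twice is refused. */
static int ns_model_register(const struct ns_model_ops *ops)
{
	enum ns_model_type type = ops->type;

	if (type <= NS_MODEL_NONE || type >= NS_MODEL_TYPES)
		return -EINVAL;
	if (ns_model_tbl[type])
		return -EBUSY;
	ns_model_tbl[type] = ops;
	return 0;
}

static const void *ns_model_current(enum ns_model_type type)
{
	if (type > NS_MODEL_NONE && type < NS_MODEL_TYPES &&
	    ns_model_tbl[type])
		return ns_model_tbl[type]->current_ns();
	return NULL;
}

/* The per-namespace callback really is a one line function. */
static int fake_net_ns;
static const void *fake_current_net_ns(void)
{
	return &fake_net_ns;
}

static const struct ns_model_ops fake_net_ops = {
	.type		= NS_MODEL_NET,
	.current_ns	= fake_current_net_ns,
};

static void ns_registration_example(void)
{
	struct ns_model_ops bad = { NS_MODEL_NONE, fake_current_net_ns };

	assert(ns_model_register(&bad) == -EINVAL);
	assert(ns_model_register(&fake_net_ops) == 0);
	assert(ns_model_register(&fake_net_ops) == -EBUSY);
	assert(ns_model_current(NS_MODEL_NET) == &fake_net_ns);
}
```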

Tags are currently represented as const void * pointers, as that is
generic, provides enough information for equality comparisons,
and is trivial to create for current users, as it is just the
existing namespace pointer.

The work needed in sysfs is more extensive.  At each directory
or symlink creation I need to check if the directory it is being
created in is a tagged directory and if so generate the appropriate
tag to place on the sysfs_dirent.  Likewise at each symlink or
directory removal I need to check if the sysfs directory it is
being removed from is a tagged directory and if so figure out
which tag goes along with the name I am deleting.

Currently only directories which hold kobjects, and
symlinks are supported.  There is not enough information
in the current file attribute interfaces to give us anything
to discriminate on which makes it useless, and there are
no potential users which makes it an uninteresting problem
to solve.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 drivers/gpio/gpiolib.c |    2 
 drivers/md/bitmap.c    |    4 -
 drivers/md/md.c        |    6 +-
 fs/sysfs/bin.c         |    2 
 fs/sysfs/dir.c         |  112 ++++++++++++++++++++++++++++++++++++++-----------
 fs/sysfs/file.c        |   17 ++++---
 fs/sysfs/group.c       |    6 +-
 fs/sysfs/inode.c       |    4 -
 fs/sysfs/mount.c       |   33 ++++++++++++++
 fs/sysfs/symlink.c     |   15 +++++-
 fs/sysfs/sysfs.h       |   20 +++++++-
 include/linux/sysfs.h  |   10 ++++
 lib/kobject.c          |    1 
 13 files changed, 181 insertions(+), 51 deletions(-)

--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -399,7 +399,7 @@ static int gpio_setup_irq(struct gpio_de
 			goto free_id;
 		}
 
-		pdesc->value_sd = sysfs_get_dirent(dev->kobj.sd, "value");
+		pdesc->value_sd = sysfs_get_dirent(dev->kobj.sd, NULL, "value");
 		if (!pdesc->value_sd) {
 			ret = -ENODEV;
 			goto free_id;
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1678,9 +1678,9 @@ int bitmap_create(mddev_t *mddev)
 
 	bitmap->mddev = mddev;
 
-	bm = sysfs_get_dirent(mddev->kobj.sd, "bitmap");
+	bm = sysfs_get_dirent(mddev->kobj.sd, NULL, "bitmap");
 	if (bm) {
-		bitmap->sysfs_can_clear = sysfs_get_dirent(bm, "can_clear");
+		bitmap->sysfs_can_clear = sysfs_get_dirent(bm, NULL, "can_clear");
 		sysfs_put(bm);
 	} else
 		bitmap->sysfs_can_clear = NULL;
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1766,7 +1766,7 @@ static int bind_rdev_to_array(mdk_rdev_t
 		kobject_del(&rdev->kobj);
 		goto fail;
 	}
-	rdev->sysfs_state = sysfs_get_dirent(rdev->kobj.sd, "state");
+	rdev->sysfs_state = sysfs_get_dirent(rdev->kobj.sd, NULL, "state");
 
 	list_add_rcu(&rdev->same_set, &mddev->disks);
 	bd_claim_by_disk(rdev->bdev, rdev->bdev->bd_holder, mddev->gendisk);
@@ -4183,7 +4183,7 @@ static int md_alloc(dev_t dev, char *nam
 	mutex_unlock(&disks_mutex);
 	if (!error) {
 		kobject_uevent(&mddev->kobj, KOBJ_ADD);
-		mddev->sysfs_state = sysfs_get_dirent(mddev->kobj.sd, "array_state");
+		mddev->sysfs_state = sysfs_get_dirent(mddev->kobj.sd, NULL, "array_state");
 	}
 	mddev_put(mddev);
 	return error;
@@ -4392,7 +4392,7 @@ static int do_md_run(mddev_t * mddev)
 			printk(KERN_WARNING
 			       "md: cannot register extra attributes for %s\n",
 			       mdname(mddev));
-		mddev->sysfs_action = sysfs_get_dirent(mddev->kobj.sd, "sync_action");
+		mddev->sysfs_action = sysfs_get_dirent(mddev->kobj.sd, NULL, "sync_action");
 	} else if (mddev->ro == 2) /* auto-readonly not meaningful */
 		mddev->ro = 0;
 
--- a/fs/sysfs/bin.c
+++ b/fs/sysfs/bin.c
@@ -501,7 +501,7 @@ int sysfs_create_bin_file(struct kobject
 void sysfs_remove_bin_file(struct kobject *kobj,
 			   const struct bin_attribute *attr)
 {
-	sysfs_hash_and_remove(kobj->sd, attr->attr.name);
+	sysfs_hash_and_remove(kobj->sd, NULL, attr->attr.name);
 }
 
 EXPORT_SYMBOL_GPL(sysfs_create_bin_file);
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -380,9 +380,15 @@ int __sysfs_add_one(struct sysfs_addrm_c
 {
 	struct sysfs_inode_attrs *ps_iattr;
 
-	if (sysfs_find_dirent(acxt->parent_sd, sd->s_name))
+	if (sysfs_find_dirent(acxt->parent_sd, sd->s_ns, sd->s_name))
 		return -EEXIST;
 
+	if (sysfs_ns_type(acxt->parent_sd) && !sd->s_ns) {
+		WARN(1, KERN_WARNING "sysfs: ns required in '%s' for '%s'\n",
+			acxt->parent_sd->s_name, sd->s_name);
+		return -EINVAL;
+	}
+
 	sd->s_parent = sysfs_get(acxt->parent_sd);
 
 	sysfs_link_sibling(sd);
@@ -533,13 +539,17 @@ void sysfs_addrm_finish(struct sysfs_add
  *	Pointer to sysfs_dirent if found, NULL if not.
  */
 struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
+				       const void *ns,
 				       const unsigned char *name)
 {
 	struct sysfs_dirent *sd;
 
-	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling)
+	for (sd = parent_sd->s_dir.children; sd; sd = sd->s_sibling) {
+		if (sd->s_ns != ns)
+			continue;
 		if (!strcmp(sd->s_name, name))
 			return sd;
+	}
 	return NULL;
 }
 
@@ -558,12 +568,13 @@ struct sysfs_dirent *sysfs_find_dirent(s
  *	Pointer to sysfs_dirent if found, NULL if not.
  */
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name)
 {
 	struct sysfs_dirent *sd;
 
 	mutex_lock(&sysfs_mutex);
-	sd = sysfs_find_dirent(parent_sd, name);
+	sd = sysfs_find_dirent(parent_sd, ns, name);
 	sysfs_get(sd);
 	mutex_unlock(&sysfs_mutex);
 
@@ -572,7 +583,8 @@ struct sysfs_dirent *sysfs_get_dirent(st
 EXPORT_SYMBOL_GPL(sysfs_get_dirent);
 
 static int create_dir(struct kobject *kobj, struct sysfs_dirent *parent_sd,
-		      const char *name, struct sysfs_dirent **p_sd)
+	enum kobj_ns_type type, const void *ns, const char *name,
+	struct sysfs_dirent **p_sd)
 {
 	umode_t mode = S_IFDIR| S_IRWXU | S_IRUGO | S_IXUGO;
 	struct sysfs_addrm_cxt acxt;
@@ -583,6 +595,9 @@ static int create_dir(struct kobject *ko
 	sd = sysfs_new_dirent(name, mode, SYSFS_DIR);
 	if (!sd)
 		return -ENOMEM;
+
+	sd->s_flags |= (type << SYSFS_NS_TYPE_SHIFT);
+	sd->s_ns = ns;
 	sd->s_dir.kobj = kobj;
 
 	/* link in */
@@ -601,7 +616,25 @@ static int create_dir(struct kobject *ko
 int sysfs_create_subdir(struct kobject *kobj, const char *name,
 			struct sysfs_dirent **p_sd)
 {
-	return create_dir(kobj, kobj->sd, name, p_sd);
+	return create_dir(kobj, kobj->sd,
+			  KOBJ_NS_TYPE_NONE, NULL, name, p_sd);
+}
+
+static enum kobj_ns_type sysfs_read_ns_type(struct kobject *kobj)
+{
+	const struct kobj_ns_type_operations *ops;
+	enum kobj_ns_type type;
+
+	ops = kobj_child_ns_ops(kobj);
+	if (!ops)
+		return KOBJ_NS_TYPE_NONE;
+
+	type = ops->type;
+	BUG_ON(type <= KOBJ_NS_TYPE_NONE);
+	BUG_ON(type >= KOBJ_NS_TYPES);
+	BUG_ON(!kobj_ns_type_registered(type));
+
+	return type;
 }
 
 /**
@@ -610,7 +643,9 @@ int sysfs_create_subdir(struct kobject *
  */
 int sysfs_create_dir(struct kobject * kobj)
 {
+	enum kobj_ns_type type;
 	struct sysfs_dirent *parent_sd, *sd;
+	const void *ns = NULL;
 	int error = 0;
 
 	BUG_ON(!kobj);
@@ -620,7 +655,11 @@ int sysfs_create_dir(struct kobject * ko
 	else
 		parent_sd = &sysfs_root;
 
-	error = create_dir(kobj, parent_sd, kobject_name(kobj), &sd);
+	if (sysfs_ns_type(parent_sd))
+		ns = kobj->ktype->namespace(kobj);
+	type = sysfs_read_ns_type(kobj);
+
+	error = create_dir(kobj, parent_sd, type, ns, kobject_name(kobj), &sd);
 	if (!error)
 		kobj->sd = sd;
 	return error;
@@ -630,13 +669,19 @@ static struct dentry * sysfs_lookup(stru
 				struct nameidata *nd)
 {
 	struct dentry *ret = NULL;
-	struct sysfs_dirent *parent_sd = dentry->d_parent->d_fsdata;
+	struct dentry *parent = dentry->d_parent;
+	struct sysfs_dirent *parent_sd = parent->d_fsdata;
 	struct sysfs_dirent *sd;
 	struct inode *inode;
+	enum kobj_ns_type type;
+	const void *ns;
 
 	mutex_lock(&sysfs_mutex);
 
-	sd = sysfs_find_dirent(parent_sd, dentry->d_name.name);
+	type = sysfs_ns_type(parent_sd);
+	ns = sysfs_info(dir->i_sb)->ns[type];
+
+	sd = sysfs_find_dirent(parent_sd, ns, dentry->d_name.name);
 
 	/* no such entry */
 	if (!sd) {
@@ -735,7 +780,8 @@ void sysfs_remove_dir(struct kobject * k
 }
 
 int sysfs_rename(struct sysfs_dirent *sd,
-	struct sysfs_dirent *new_parent_sd, const char *new_name)
+	struct sysfs_dirent *new_parent_sd, const void *new_ns,
+	const char *new_name)
 {
 	const char *dup_name = NULL;
 	int error;
@@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd
 	mutex_lock(&sysfs_mutex);
 
 	error = 0;
-	if ((sd->s_parent == new_parent_sd) &&
+	if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
 	    (strcmp(sd->s_name, new_name) == 0))
 		goto out;	/* nothing to rename */
 
 	error = -EEXIST;
-	if (sysfs_find_dirent(new_parent_sd, new_name))
+	if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
 		goto out;
 
 	/* rename sysfs_dirent */
@@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd
 		sd->s_parent = new_parent_sd;
 		sysfs_link_sibling(sd);
 	}
+	sd->s_ns = new_ns;
 
 	error = 0;
  out:
@@ -780,19 +827,28 @@ int sysfs_rename(struct sysfs_dirent *sd
 
 int sysfs_rename_dir(struct kobject *kobj, const char *new_name)
 {
-	return sysfs_rename(kobj->sd, kobj->sd->s_parent, new_name);
+	struct sysfs_dirent *parent_sd = kobj->sd->s_parent;
+	const void *new_ns = NULL;
+
+	if (sysfs_ns_type(parent_sd))
+		new_ns = kobj->ktype->namespace(kobj);
+
+	return sysfs_rename(kobj->sd, parent_sd, new_ns, new_name);
 }
 
 int sysfs_move_dir(struct kobject *kobj, struct kobject *new_parent_kobj)
 {
 	struct sysfs_dirent *sd = kobj->sd;
 	struct sysfs_dirent *new_parent_sd;
+	const void *new_ns = NULL;
 
 	BUG_ON(!sd->s_parent);
+	if (sysfs_ns_type(sd->s_parent))
+		new_ns = kobj->ktype->namespace(kobj);
 	new_parent_sd = new_parent_kobj && new_parent_kobj->sd ?
 		new_parent_kobj->sd : &sysfs_root;
 
-	return sysfs_rename(sd, new_parent_sd, sd->s_name);
+	return sysfs_rename(sd, new_parent_sd, new_ns, sd->s_name);
 }
 
 /* Relationship between s_mode and the DT_xxx types */
@@ -807,32 +863,35 @@ static int sysfs_dir_release(struct inod
 	return 0;
 }
 
-static struct sysfs_dirent *sysfs_dir_pos(struct sysfs_dirent *parent_sd,
-	ino_t ino, struct sysfs_dirent *pos)
+static struct sysfs_dirent *sysfs_dir_pos(const void *ns,
+	struct sysfs_dirent *parent_sd,	ino_t ino, struct sysfs_dirent *pos)
 {
 	if (pos) {
 		int valid = !(pos->s_flags & SYSFS_FLAG_REMOVED) &&
 			pos->s_parent == parent_sd &&
 			ino == pos->s_ino;
 		sysfs_put(pos);
-		if (valid)
-			return pos;
+		if (!valid)
+			pos = NULL;
 	}
-	pos = NULL;
-	if ((ino > 1) && (ino < INT_MAX)) {
+	if (!pos && (ino > 1) && (ino < INT_MAX)) {
 		pos = parent_sd->s_dir.children;
 		while (pos && (ino > pos->s_ino))
 			pos = pos->s_sibling;
 	}
+	while (pos && pos->s_ns != ns)
+		pos = pos->s_sibling;
 	return pos;
 }
 
-static struct sysfs_dirent *sysfs_dir_next_pos(struct sysfs_dirent *parent_sd,
-	ino_t ino, struct sysfs_dirent *pos)
+static struct sysfs_dirent *sysfs_dir_next_pos(const void *ns,
+	struct sysfs_dirent *parent_sd,	ino_t ino, struct sysfs_dirent *pos)
 {
-	pos = sysfs_dir_pos(parent_sd, ino, pos);
+	pos = sysfs_dir_pos(ns, parent_sd, ino, pos);
 	if (pos)
 		pos = pos->s_sibling;
+	while (pos && pos->s_ns != ns)
+		pos = pos->s_sibling;
 	return pos;
 }
 
@@ -841,8 +900,13 @@ static int sysfs_readdir(struct file * f
 	struct dentry *dentry = filp->f_path.dentry;
 	struct sysfs_dirent * parent_sd = dentry->d_fsdata;
 	struct sysfs_dirent *pos = filp->private_data;
+	enum kobj_ns_type type;
+	const void *ns;
 	ino_t ino;
 
+	type = sysfs_ns_type(parent_sd);
+	ns = sysfs_info(dentry->d_sb)->ns[type];
+
 	if (filp->f_pos == 0) {
 		ino = parent_sd->s_ino;
 		if (filldir(dirent, ".", 1, filp->f_pos, ino, DT_DIR) == 0)
@@ -857,9 +921,9 @@ static int sysfs_readdir(struct file * f
 			filp->f_pos++;
 	}
 	mutex_lock(&sysfs_mutex);
-	for (pos = sysfs_dir_pos(parent_sd, filp->f_pos, pos);
+	for (pos = sysfs_dir_pos(ns, parent_sd, filp->f_pos, pos);
 	     pos;
-	     pos = sysfs_dir_next_pos(parent_sd, filp->f_pos, pos)) {
+	     pos = sysfs_dir_next_pos(ns, parent_sd, filp->f_pos, pos)) {
 		const char * name;
 		unsigned int type;
 		int len, ret;
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -478,9 +478,12 @@ void sysfs_notify(struct kobject *k, con
 	mutex_lock(&sysfs_mutex);
 
 	if (sd && dir)
-		sd = sysfs_find_dirent(sd, dir);
+		/* Only directories are tagged, so no need to pass
+		 * a tag explicitly.
+		 */
+		sd = sysfs_find_dirent(sd, NULL, dir);
 	if (sd && attr)
-		sd = sysfs_find_dirent(sd, attr);
+		sd = sysfs_find_dirent(sd, NULL, attr);
 	if (sd)
 		sysfs_notify_dirent(sd);
 
@@ -569,7 +572,7 @@ int sysfs_add_file_to_group(struct kobje
 	int error;
 
 	if (group)
-		dir_sd = sysfs_get_dirent(kobj->sd, group);
+		dir_sd = sysfs_get_dirent(kobj->sd, NULL, group);
 	else
 		dir_sd = sysfs_get(kobj->sd);
 
@@ -599,7 +602,7 @@ int sysfs_chmod_file(struct kobject *kob
 	mutex_lock(&sysfs_mutex);
 
 	rc = -ENOENT;
-	sd = sysfs_find_dirent(kobj->sd, attr->name);
+	sd = sysfs_find_dirent(kobj->sd, NULL, attr->name);
 	if (!sd)
 		goto out;
 
@@ -624,7 +627,7 @@ EXPORT_SYMBOL_GPL(sysfs_chmod_file);
 
 void sysfs_remove_file(struct kobject * kobj, const struct attribute * attr)
 {
-	sysfs_hash_and_remove(kobj->sd, attr->name);
+	sysfs_hash_and_remove(kobj->sd, NULL, attr->name);
 }
 
 void sysfs_remove_files(struct kobject * kobj, const struct attribute **ptr)
@@ -646,11 +649,11 @@ void sysfs_remove_file_from_group(struct
 	struct sysfs_dirent *dir_sd;
 
 	if (group)
-		dir_sd = sysfs_get_dirent(kobj->sd, group);
+		dir_sd = sysfs_get_dirent(kobj->sd, NULL, group);
 	else
 		dir_sd = sysfs_get(kobj->sd);
 	if (dir_sd) {
-		sysfs_hash_and_remove(dir_sd, attr->name);
+		sysfs_hash_and_remove(dir_sd, NULL, attr->name);
 		sysfs_put(dir_sd);
 	}
 }
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -23,7 +23,7 @@ static void remove_files(struct sysfs_di
 	int i;
 
 	for (i = 0, attr = grp->attrs; *attr; i++, attr++)
-		sysfs_hash_and_remove(dir_sd, (*attr)->name);
+		sysfs_hash_and_remove(dir_sd, NULL, (*attr)->name);
 }
 
 static int create_files(struct sysfs_dirent *dir_sd, struct kobject *kobj,
@@ -39,7 +39,7 @@ static int create_files(struct sysfs_dir
 		 * visibility.  Do this by first removing then
 		 * re-adding (if required) the file */
 		if (update)
-			sysfs_hash_and_remove(dir_sd, (*attr)->name);
+			sysfs_hash_and_remove(dir_sd, NULL, (*attr)->name);
 		if (grp->is_visible) {
 			mode = grp->is_visible(kobj, *attr, i);
 			if (!mode)
@@ -132,7 +132,7 @@ void sysfs_remove_group(struct kobject *
 	struct sysfs_dirent *sd;
 
 	if (grp->name) {
-		sd = sysfs_get_dirent(dir_sd, grp->name);
+		sd = sysfs_get_dirent(dir_sd, NULL, grp->name);
 		if (!sd) {
 			WARN(!sd, KERN_WARNING "sysfs group %p not found for "
 				"kobject '%s'\n", grp, kobject_name(kobj));
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -324,7 +324,7 @@ void sysfs_delete_inode(struct inode *in
 	sysfs_put(sd);
 }
 
-int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name)
+int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const void *ns, const char *name)
 {
 	struct sysfs_addrm_cxt acxt;
 	struct sysfs_dirent *sd;
@@ -334,7 +334,7 @@ int sysfs_hash_and_remove(struct sysfs_d
 
 	sysfs_addrm_start(&acxt, dir_sd);
 
-	sd = sysfs_find_dirent(dir_sd, name);
+	sd = sysfs_find_dirent(dir_sd, ns, name);
 	if (sd)
 		sysfs_remove_one(&acxt, sd);
 
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -35,7 +35,7 @@ static const struct super_operations sys
 struct sysfs_dirent sysfs_root = {
 	.s_name		= "",
 	.s_count	= ATOMIC_INIT(1),
-	.s_flags	= SYSFS_DIR,
+	.s_flags	= SYSFS_DIR | (KOBJ_NS_TYPE_NONE << SYSFS_NS_TYPE_SHIFT),
 	.s_mode		= S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
 	.s_ino		= 1,
 };
@@ -76,7 +76,13 @@ static int sysfs_test_super(struct super
 {
 	struct sysfs_super_info *sb_info = sysfs_info(sb);
 	struct sysfs_super_info *info = data;
+	enum kobj_ns_type type;
 	int found = 1;
+
+	for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++) {
+		if (sb_info->ns[type] != info->ns[type])
+			found = 0;
+	}
 	return found;
 }
 
@@ -93,6 +99,7 @@ static int sysfs_get_sb(struct file_syst
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
 {
 	struct sysfs_super_info *info;
+	enum kobj_ns_type type;
 	struct super_block *sb;
 	int error;
 
@@ -100,6 +107,10 @@ static int sysfs_get_sb(struct file_syst
 	info = kzalloc(sizeof(*info), GFP_KERNEL);
 	if (!info)
 		goto out;
+
+	for (type = KOBJ_NS_TYPE_NONE; type < KOBJ_NS_TYPES; type++)
+		info->ns[type] = kobj_ns_current(type);
+
 	sb = sget(fs_type, sysfs_test_super, sysfs_set_super, info);
 	if (IS_ERR(sb) || sb->s_fs_info != info)
 		kfree(info);
@@ -137,6 +148,26 @@ static struct file_system_type sysfs_fs_
 	.kill_sb	= sysfs_kill_sb,
 };
 
+void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
+{
+	struct super_block *sb;
+
+	mutex_lock(&sysfs_mutex);
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
+		struct sysfs_super_info *info = sysfs_info(sb);
+		/* Ignore superblocks that are in the process of unmounting */
+		if (sb->s_count <= S_BIAS)
+			continue;
+		/* Ignore superblocks with the wrong ns */
+		if (info->ns[type] != ns)
+			continue;
+		info->ns[type] = NULL;
+	}
+	spin_unlock(&sb_lock);
+	mutex_unlock(&sysfs_mutex);
+}
+
 int __init sysfs_init(void)
 {
 	int err = -ENOMEM;
--- a/fs/sysfs/symlink.c
+++ b/fs/sysfs/symlink.c
@@ -58,6 +58,8 @@ static int sysfs_do_create_link(struct k
 	if (!sd)
 		goto out_put;
 
+	if (sysfs_ns_type(parent_sd))
+		sd->s_ns = target->ktype->namespace(target);
 	sd->s_symlink.target_sd = target_sd;
 	target_sd = NULL;	/* reference is now owned by the symlink */
 
@@ -121,7 +123,7 @@ void sysfs_remove_link(struct kobject *
 	else
 		parent_sd = kobj->sd;
 
-	sysfs_hash_and_remove(parent_sd, name);
+	sysfs_hash_and_remove(parent_sd, NULL, name);
 }
 
 /**
@@ -137,6 +139,7 @@ int sysfs_rename_link(struct kobject *ko
 			const char *old, const char *new)
 {
 	struct sysfs_dirent *parent_sd, *sd = NULL;
+	const void *old_ns = NULL, *new_ns = NULL;
 	int result;
 
 	if (!kobj)
@@ -144,8 +147,11 @@ int sysfs_rename_link(struct kobject *ko
 	else
 		parent_sd = kobj->sd;
 
+	if (targ->sd)
+		old_ns = targ->sd->s_ns;
+
 	result = -ENOENT;
-	sd = sysfs_get_dirent(parent_sd, old);
+	sd = sysfs_get_dirent(parent_sd, old_ns, old);
 	if (!sd)
 		goto out;
 
@@ -155,7 +161,10 @@ int sysfs_rename_link(struct kobject *ko
 	if (sd->s_symlink.target_sd->s_dir.kobj != targ)
 		goto out;
 
-	result = sysfs_rename(sd, parent_sd, new);
+	if (sysfs_ns_type(parent_sd))
+		new_ns = targ->ktype->namespace(targ);
+
+	result = sysfs_rename(sd, parent_sd, new_ns, new);
 
 out:
 	sysfs_put(sd);
--- a/fs/sysfs/sysfs.h
+++ b/fs/sysfs/sysfs.h
@@ -58,6 +58,7 @@ struct sysfs_dirent {
 	struct sysfs_dirent	*s_sibling;
 	const char		*s_name;
 
+	const void		*s_ns;
 	union {
 		struct sysfs_elem_dir		s_dir;
 		struct sysfs_elem_symlink	s_symlink;
@@ -81,14 +82,22 @@ struct sysfs_dirent {
 #define SYSFS_COPY_NAME			(SYSFS_DIR | SYSFS_KOBJ_LINK)
 #define SYSFS_ACTIVE_REF		(SYSFS_KOBJ_ATTR | SYSFS_KOBJ_BIN_ATTR)
 
-#define SYSFS_FLAG_MASK			~SYSFS_TYPE_MASK
-#define SYSFS_FLAG_REMOVED		0x0200
+#define SYSFS_NS_TYPE_MASK		0xff00
+#define SYSFS_NS_TYPE_SHIFT		8
+
+#define SYSFS_FLAG_MASK			~(SYSFS_NS_TYPE_MASK|SYSFS_TYPE_MASK)
+#define SYSFS_FLAG_REMOVED		0x020000
 
 static inline unsigned int sysfs_type(struct sysfs_dirent *sd)
 {
 	return sd->s_flags & SYSFS_TYPE_MASK;
 }
 
+static inline enum kobj_ns_type sysfs_ns_type(struct sysfs_dirent *sd)
+{
+	return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;
+}
+
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 #define sysfs_dirent_init_lockdep(sd)				\
 do {								\
@@ -115,6 +124,7 @@ struct sysfs_addrm_cxt {
  * mount.c
  */
 struct sysfs_super_info {
+	const void *ns[KOBJ_NS_TYPES];
 };
 #define sysfs_info(SB) ((struct sysfs_super_info *)(SB->s_fs_info))
 extern struct sysfs_dirent sysfs_root;
@@ -140,8 +150,10 @@ void sysfs_remove_one(struct sysfs_addrm
 void sysfs_addrm_finish(struct sysfs_addrm_cxt *acxt);
 
 struct sysfs_dirent *sysfs_find_dirent(struct sysfs_dirent *parent_sd,
+				       const void *ns,
 				       const unsigned char *name);
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name);
 struct sysfs_dirent *sysfs_new_dirent(const char *name, umode_t mode, int type);
 
@@ -152,7 +164,7 @@ int sysfs_create_subdir(struct kobject *
 void sysfs_remove_subdir(struct sysfs_dirent *sd);
 
 int sysfs_rename(struct sysfs_dirent *sd,
-	struct sysfs_dirent *new_parent_sd, const char *new_name);
+	struct sysfs_dirent *new_parent_sd, const void *ns, const char *new_name);
 
 static inline struct sysfs_dirent *__sysfs_get(struct sysfs_dirent *sd)
 {
@@ -182,7 +194,7 @@ int sysfs_setattr(struct dentry *dentry,
 int sysfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat);
 int sysfs_setxattr(struct dentry *dentry, const char *name, const void *value,
 		size_t size, int flags);
-int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const char *name);
+int sysfs_hash_and_remove(struct sysfs_dirent *dir_sd, const void *ns, const char *name);
 int sysfs_inode_init(void);
 
 /*
--- a/include/linux/sysfs.h
+++ b/include/linux/sysfs.h
@@ -20,6 +20,7 @@
 
 struct kobject;
 struct module;
+enum kobj_ns_type;
 
 /* FIXME
  * The *owner field is no longer used.
@@ -168,10 +169,14 @@ void sysfs_remove_file_from_group(struct
 void sysfs_notify(struct kobject *kobj, const char *dir, const char *attr);
 void sysfs_notify_dirent(struct sysfs_dirent *sd);
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name);
 struct sysfs_dirent *sysfs_get(struct sysfs_dirent *sd);
 void sysfs_put(struct sysfs_dirent *sd);
 void sysfs_printk_last_file(void);
+
+void sysfs_exit_ns(enum kobj_ns_type type, const void *tag);
+
 int __must_check sysfs_init(void);
 
 #else /* CONFIG_SYSFS */
@@ -301,6 +306,7 @@ static inline void sysfs_notify_dirent(s
 }
 static inline
 struct sysfs_dirent *sysfs_get_dirent(struct sysfs_dirent *parent_sd,
+				      const void *ns,
 				      const unsigned char *name)
 {
 	return NULL;
@@ -313,6 +319,10 @@ static inline void sysfs_put(struct sysf
 {
 }
 
+static inline void sysfs_exit_ns(enum kobj_ns_type type, const void *tag)
+{
+}
+
 static inline int __must_check sysfs_init(void)
 {
 	return 0;
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -950,6 +950,7 @@ const void *kobj_ns_initial(enum kobj_ns
 
 void kobj_ns_exit(enum kobj_ns_type type, const void *ns)
 {
+	sysfs_exit_ns(type, ns);
 }
 
 
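A note on the flag layout in the fs/sysfs/sysfs.h hunk above: the namespace
type is stored in bits 8-15 of s_flags, which is why SYSFS_FLAG_REMOVED moves
from 0x0200 to 0x020000.  The user-space sketch below illustrates that packing.
The namespace mask, shift and REMOVED values are copied from the patch; the
type mask (0x00ff) and the directory type (0x0001) are assumed from the
sysfs.h of that era, and every MODEL_*/model_* name, including the
MODEL_NS_TYPE_NET value, is hypothetical.

```c
#include <assert.h>

/* Assumed from the sysfs.h of the time (not shown in this patch). */
#define MODEL_TYPE_MASK      0x00ffu
#define MODEL_SYSFS_DIR      0x0001u

/* Values copied from the patch above. */
#define MODEL_NS_TYPE_MASK   0xff00u
#define MODEL_NS_TYPE_SHIFT  8
#define MODEL_FLAG_MASK      (~(MODEL_NS_TYPE_MASK | MODEL_TYPE_MASK))
#define MODEL_FLAG_REMOVED   0x020000u

/* Hypothetical namespace type, for illustration only. */
#define MODEL_NS_TYPE_NET    1u

/* Stand-in for the s_flags member of struct sysfs_dirent. */
struct model_dirent {
	unsigned int s_flags;
};

/* Pack a dirent type and a namespace type into one flags word. */
static unsigned int model_make_flags(unsigned int type, unsigned int ns_type)
{
	assert((type & ~MODEL_TYPE_MASK) == 0);
	assert(ns_type <= 0xffu);
	return type | (ns_type << MODEL_NS_TYPE_SHIFT);
}

/* Same shape as sysfs_type(): the low byte is the dirent type. */
static unsigned int model_type(const struct model_dirent *sd)
{
	return sd->s_flags & MODEL_TYPE_MASK;
}

/* Same shape as sysfs_ns_type(): bits 8-15 hold the namespace type. */
static unsigned int model_ns_type(const struct model_dirent *sd)
{
	return (sd->s_flags & MODEL_NS_TYPE_MASK) >> MODEL_NS_TYPE_SHIFT;
}
```

Setting MODEL_FLAG_REMOVED on such a word leaves both the type and the
namespace type readable, because the three fields occupy disjoint bits.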


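The mount-side logic in the fs/sysfs/mount.c hunks above boils down to two
rules: a superblock is reused only if every namespace tag matches, and a dying
namespace is cleared from every superblock that still refers to it.  A minimal
user-space sketch of those two rules follows.  It is an illustration under
assumptions, not the kernel code: the model_* names are hypothetical, a plain
array stands in for the fs_supers list, and the locking and the s_count check
from sysfs_exit_ns() are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum kobj_ns_type. */
enum model_ns_type { MODEL_NS_NONE, MODEL_NS_NET, MODEL_NS_TYPES };

/* Stand-in for struct sysfs_super_info: one tag per namespace type. */
struct model_super_info {
	const void *ns[MODEL_NS_TYPES];
};

/* Like sysfs_test_super(): match only if all tags are identical. */
static int model_test_super(const struct model_super_info *sb_info,
			    const struct model_super_info *info)
{
	int type;

	for (type = MODEL_NS_NONE; type < MODEL_NS_TYPES; type++) {
		if (sb_info->ns[type] != info->ns[type])
			return 0;
	}
	return 1;
}

/* Like sysfs_exit_ns(): forget a namespace that is going away. */
static void model_exit_ns(struct model_super_info *sbs, size_t count,
			  enum model_ns_type type, const void *ns)
{
	size_t i;

	assert(type < MODEL_NS_TYPES);
	for (i = 0; i < count; i++) {
		if (sbs[i].ns[type] == ns)
			sbs[i].ns[type] = NULL;
	}
}
```

After model_exit_ns() runs, a lookup with the old tag no longer matches the
cleared superblock, so entries tagged with the dead namespace stop being
reachable through it.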

* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-29 20:29   ` patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree gregkh
@ 2010-04-30  4:18     ` Tejun Heo
  2010-04-30  4:45       ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: Tejun Heo @ 2010-04-30  4:18 UTC (permalink / raw)
  To: gregkh
  Cc: ebiederm, bcrl, benjamin.thery, cornelia.huck, eric.dumazet,
	kay.sievers, netdev, serue

On 04/29/2010 10:29 PM, gregkh@suse.de wrote:
> 
> This is a note to let you know that I've just added the patch titled
> 
>     Subject: sysfs: Implement sysfs tagged directory support.
> 
> to my gregkh-2.6 tree.  Its filename is
> 
>     sysfs-implement-sysfs-tagged-directory-support.patch
> 
> This tree can be found at 
>     http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/

I wish at least more comments were added before it goes mainline.  I
don't really understand the current form.

Thanks.

-- 
tejun


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30  4:18     ` Tejun Heo
@ 2010-04-30  4:45       ` Greg KH
  2010-04-30  5:24         ` Eric W. Biederman
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2010-04-30  4:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: ebiederm, bcrl, benjamin.thery, cornelia.huck, eric.dumazet,
	kay.sievers, netdev, serue

On Fri, Apr 30, 2010 at 06:18:53AM +0200, Tejun Heo wrote:
> On 04/29/2010 10:29 PM, gregkh@suse.de wrote:
> > 
> > This is a note to let you know that I've just added the patch titled
> > 
> >     Subject: sysfs: Implement sysfs tagged directory support.
> > 
> > to my gregkh-2.6 tree.  Its filename is
> > 
> >     sysfs-implement-sysfs-tagged-directory-support.patch
> > 
> > This tree can be found at 
> >     http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/
> 
> I wish at least more comments are added before it goes mainline.  I
> don't really understand the current form.

Ok, that's fine with me, I'll pull it back out.

thanks,

greg k-h


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30  4:45       ` Greg KH
@ 2010-04-30  5:24         ` Eric W. Biederman
  2010-04-30  5:37           ` Tejun Heo
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-04-30  5:24 UTC (permalink / raw)
  To: Greg KH
  Cc: Tejun Heo, bcrl, benjamin.thery, cornelia.huck, eric.dumazet,
	kay.sievers, netdev, serue

Greg KH <gregkh@suse.de> writes:

> On Fri, Apr 30, 2010 at 06:18:53AM +0200, Tejun Heo wrote:
>> On 04/29/2010 10:29 PM, gregkh@suse.de wrote:
>> > 
>> > This is a note to let you know that I've just added the patch titled
>> > 
>> >     Subject: sysfs: Implement sysfs tagged directory support.
>> > 
>> > to my gregkh-2.6 tree.  Its filename is
>> > 
>> >     sysfs-implement-sysfs-tagged-directory-support.patch
>> > 
>> > This tree can be found at 
>> >     http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/
>> 
>> I wish at least more comments are added before it goes mainline.  I
>> don't really understand the current form.
>
> Ok, that's fine with me, I'll pull it back out.

?????

Tejun you have offered nothing constructive to the review, except looking
and saying you don't understand what is going on.

I have a tree posted with all of my code.  I have given snippets of the
pieces yet to be merged, and still your reaction is that you don't
understand, and that I should break it down for you into itty-bitty little
pieces so that you don't need to think about it to understand it.

Tejun I think for the code to make any sense to you I would need to rip
out and/or rewrite the kobject layer, and possibly the device
model code as well.

Tejun I'm sorry you can't understand the code, and I'm sorry the code
may be over-general.  In part that is because making the code
over-general is what you asked for when reviewing it the first time.


Greg, I have not gotten any constructive feedback.  Not a request to
fix or comment a specific thing.  Not a comment that says something
is a bug and just wrong.  The closest I have gotten is a request to
make the code even more complicated and intrusive, and harder to keep
correct, by adding an ns member to kobjects.  That comes to 3 copies of
the same state for the same objects, which ultimately is more difficult
to keep in sync.

I am more than happy to improve the code, but at this point I really
think the code needs to be merged so people are forced to deal with
it, instead of saying "I don't understand the code" in the review and
blocking the merge.  I don't think the code will improve any more by
being out of tree.

Eric


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30  5:24         ` Eric W. Biederman
@ 2010-04-30  5:37           ` Tejun Heo
  2010-04-30  6:12             ` Tejun Heo
  2010-04-30 14:29             ` Serge E. Hallyn
  0 siblings, 2 replies; 83+ messages in thread
From: Tejun Heo @ 2010-04-30  5:37 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, bcrl, benjamin.thery, cornelia.huck, eric.dumazet,
	kay.sievers, netdev, serue

Hello,

On 04/30/2010 07:24 AM, Eric W. Biederman wrote:
>>> I wish at least more comments are added before it goes mainline.  I
>>> don't really understand the current form.
>>
>> Ok, that's fine with me, I'll pull it back out.
> 
> ?????
> 
> Tejun you have offered nothing constructive to the review, except looking
> and saying you don't understand what is going on.

Eric, no need to get too touchy and you're right in part in saying all
I'm saying is basically "I don't understand it" which is the same
reason why I'm not nacking it and explicitly stated that I would be
okay with the series going in if Greg/Kay would be okay with it.
Again, about the same thing with the above comment, I was *wishing*
for more comments *before it goes mainline*.

> Tejun I think for the code to make any sense to you I would need to rip
> out out and/or rewrite the kobject layer, and possible the device
> model code as well.

And yes, in the long run, please do that.

> Tejun I'm sorry you can't understand the code, and I'm sorry the code
> may be over-general.  In part that is because making the code
> over-general is what you asked for when reviewing it the first time.

Please give me some credit.  I mean that the code is difficult to
follow and justify when I say I don't understand it.  Yeah, I tried to
understand it and I think I understand how it *works* in its current
form but I just don't think the design is justified or logical.  You
say it's infeasible to do it in a straightforward manner in a reasonable
amount of time and that's why I neither acked nor nacked the series and
deferred the decision to the subsystem maintainer.

But, at the very least, please add some comments.  Try to explain what
each callback is supposed to do and why it's there.  Not everyone
lives in your head.

Thanks.

-- 
tejun


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30  5:37           ` Tejun Heo
@ 2010-04-30  6:12             ` Tejun Heo
  2010-04-30 14:29             ` Serge E. Hallyn
  1 sibling, 0 replies; 83+ messages in thread
From: Tejun Heo @ 2010-04-30  6:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, bcrl, benjamin.thery, cornelia.huck, eric.dumazet,
	kay.sievers, netdev, serue

Hello,

On 04/30/2010 07:37 AM, Tejun Heo wrote:
>> Tejun I think for the code to make any sense to you I would need to rip
>> out out and/or rewrite the kobject layer, and possible the device
>> model code as well.
> 
> And yes, in the long run, please do that.

Let me add a little bit here just in case.

IIRC, the initial sysfs tag support wasn't too different from the
interface side but the implementation inside sysfs was very hacky, so
I complained on both accounts.  You did a wonderful job of
restructuring sysfs so that it basically behaves as a distributed file
system and exposing different subsets is not too hacky and doesn't
violate layering.  Your work there was admirable and much better than
what I had in mind.

I don't think the hooking part from NSes would be as major an overhaul
as you're suggesting above.  I haven't tried it so I can't tell with
certainty (again so no nack) but I just can't imagine it being that
difficult or major after all the things you've done in the area, and I
was hoping that you would repost something more digestible soonish.

I'm sorry that I couldn't be more specific but I'm almost sure you'll
be able to come up with something better than what I can think of.
So, yeah, please,

* Add comments to the current implementation.  If for nothing else,
  for cases where you get sick of it for some time and other people
  have to work on it before you feel like coming back.

* While doing that, please add "FIXME:" or "TODO:" comments describing
  what would be a better direction in the long run.

Thanks.

-- 
tejun


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30  5:37           ` Tejun Heo
  2010-04-30  6:12             ` Tejun Heo
@ 2010-04-30 14:29             ` Serge E. Hallyn
  2010-04-30 15:22               ` Tejun Heo
  1 sibling, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-04-30 14:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Eric W. Biederman, Greg KH, bcrl, benjamin.thery, cornelia.huck,
	eric.dumazet, kay.sievers, netdev

Quoting Tejun Heo (tj@kernel.org):
> Hello,
> 
> On 04/30/2010 07:24 AM, Eric W. Biederman wrote:
> >>> I wish at least more comments are added before it goes mainline.  I
> >>> don't really understand the current form.
> >>
> >> Ok, that's fine with me, I'll pull it back out.
> > 
> > ?????
> > 
> > Tejun you have offered nothing constructive to the review, except looking
> > and saying you don't understand what is going on.
> 
> Eric, no need to get too touchy and you're right in part in saying all
> I'm saying is basically "I don't understand it" which is the same
> reason why I'm not nacking it and explicitly stated that I would be
> okay with the series going in if Greg/Kay would be okay with it.
> Again, about the same thing with the above comment, I was *wishing*
> for more comments *before it goes mainline*.

I'm not sure if you mean "more in-line comments" or more discussion.  If
you mean the latter, then I think the patch intro was deceptive as it has
gotten more acks than that - I acked the whole set, and I think it didn't
get more discussion than it did because it got much discussion in previous
versions.

This subset looks a bit mysterious because it offers the support for
tagged /sys/class/net, but that implementation, which clarifies why some
of this is done, comes in the later patches in Eric's set.  Can you please
jump to his tree, take a look at 
http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/ebiederm/linux-2.6.32-rc5-sysfs-enhancements.git;a=commit;h=e7468796a9756b28e0ab38eb021025bbd3712823
and let us know if that does not clarify?

Hmm, but looking back over the previous thread (Mar 31) I guess you
mean more in-line comments around the callbacks, presumably things
like class_dir_child_ns_type() and struct kobj_ns_type_operations
members?

It sounds like what you'd really like is to have any explicit mention
to namespaces pulled out of drivers/base (layering as you keep saying)?
But will there be a use for this outside of namespaces?  Does trying to
anticipate that fall into the category of over-abstraction?

-serge


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30 14:29             ` Serge E. Hallyn
@ 2010-04-30 15:22               ` Tejun Heo
  2010-04-30 15:43                 ` Serge E. Hallyn
  0 siblings, 1 reply; 83+ messages in thread
From: Tejun Heo @ 2010-04-30 15:22 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, Greg KH, bcrl, benjamin.thery, cornelia.huck,
	eric.dumazet, kay.sievers, netdev

Hello,

On 04/30/2010 04:29 PM, Serge E. Hallyn wrote:
> Hmm, but looking back over the previous thread (Mar 31) I guess you
> mean more in-line comments around the callbacks, presumably things
> like class_dir_child_ns_type() and struct kobj_ns_type_operations
> members?

In-line.  What they are, how they're supposed to be used, which calling
context is expected, what can be returned and so on.

> It sounds like what you'd really like is to have any explicit
> mention to namespaces pulled out of drivers/base (layering as you
> keep saying)?  But will there be a use for this outside of
> namespaces?  Does trying to anticipate that fall into the category
> of over-abstraction?

I wouldn't mind a limited amount of layering exceptions as long as
they're clearly documented.  What I'm primarily worried about is not
the possibility of other users but more the obfuscation of the whole
sysfs-kobject-driver model thing which is already overly abstracted
and obfuscated (at least it seems to me that way).

NS needs tagged support in the driver model which in itself is fine
and I also understand that from someone who's primarily working on NS,
adding a bit on top of the whole thing wouldn't seem like much of a
problem.  To me it seems like worsening a problem which is already
pretty bad.  I hope you could understand my POV too.

Thanks.

-- 
tejun


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30 15:22               ` Tejun Heo
@ 2010-04-30 15:43                 ` Serge E. Hallyn
  2010-04-30 15:58                   ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-04-30 15:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Eric W. Biederman, Greg KH, bcrl, benjamin.thery, cornelia.huck,
	eric.dumazet, kay.sievers, netdev

Quoting Tejun Heo (tj@kernel.org):
> Hello,
> 
> On 04/30/2010 04:29 PM, Serge E. Hallyn wrote:
> > Hmm, but looking back over the previous thread (Mar 31) I guess you
> > mean more in-line comments around the callbacks, presumably things
> > like class_dir_child_ns_type() and struct kobj_ns_type_operations
> > members?
> 
> In-line.  What they're, how they're supposed to be used, which calling
> context is expected, what can be returned and so on.
> 
> > It sounds like what you'd really like is to have any explicit
> > mention to namespaces pulled out of drivers/base (layering as you
> > keep saying)?  But will there be a use for this outside of
> > namespaces?  Does trying to anticipate that fall into the category
> > of over-abstraction?
> 
> I wouldn't mind limited amount of layering exceptions as long as
> they're clearly documented.  What I'm primarily worried about is not
> the possibility of other users but more the obfuscation of the whole
> sysfs-kobject-driver model thing which is already overly abstracted
> and obfuscated (at least it seems to me that way).
> 
> NS needs tagged support in the driver model which in itself is fine
> and I also understand that from someone who's primarily working on NS,
> adding a bit on top of the whole thing wouldn't seem like much of a
> problem.  To me it seems like worsening a problem which is already
> pretty bad.  I hope you could understand my POV too.

I do.  I can take a stab Monday at pushing a cloned version of Eric's
tree with comments added, if Eric doesn't have time.  (Or a patch on
top of Greg's tree)

thanks,
-serge


* Re: patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree
  2010-04-30 15:43                 ` Serge E. Hallyn
@ 2010-04-30 15:58                   ` Greg KH
  0 siblings, 0 replies; 83+ messages in thread
From: Greg KH @ 2010-04-30 15:58 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Tejun Heo, Eric W. Biederman, bcrl, benjamin.thery,
	cornelia.huck, eric.dumazet, kay.sievers, netdev

On Fri, Apr 30, 2010 at 10:43:21AM -0500, Serge E. Hallyn wrote:
> Quoting Tejun Heo (tj@kernel.org):
> > Hello,
> > 
> > On 04/30/2010 04:29 PM, Serge E. Hallyn wrote:
> > > Hmm, but looking back over the previous thread (Mar 31) I guess you
> > > mean more in-line comments around the callbacks, presumably things
> > > like class_dir_child_ns_type() and struct kobj_ns_type_operations
> > > members?
> > 
> > In-line.  What they're, how they're supposed to be used, which calling
> > context is expected, what can be returned and so on.
> > 
> > > It sounds like what you'd really like is to have any explicit
> > > mention to namespaces pulled out of drivers/base (layering as you
> > > keep saying)?  But will there be a use for this outside of
> > > namespaces?  Does trying to anticipate that fall into the category
> > > of over-abstraction?
> > 
> > I wouldn't mind limited amount of layering exceptions as long as
> > they're clearly documented.  What I'm primarily worried about is not
> > the possibility of other users but more the obfuscation of the whole
> > sysfs-kobject-driver model thing which is already overly abstracted
> > and obfuscated (at least it seems to me that way).
> > 
> > NS needs tagged support in the driver model which in itself is fine
> > and I also understand that from someone who's primarily working on NS,
> > adding a bit on top of the whole thing wouldn't seem like much of a
> > problem.  To me it seems like worsening a problem which is already
> > pretty bad.  I hope you could understand my POV too.
> 
> I do.  I can take a stab monday at pushing a cloned version of Eric's
> tree with comments added, if Eric doesn't have time.  (Or a patch on
> top of Greg's tree)

On top of Greg's tree please :)

thanks,

greg k-h


* [PATCH 0/6] netns support in the kobject layer
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (7 preceding siblings ...)
  2010-03-31 17:21 ` Serge E. Hallyn
@ 2010-05-05  0:35 ` Eric W. Biederman
  2010-05-06 20:04   ` Greg KH
  2010-05-05  0:36 ` [PATCH 1/6] kobject: Send hotplug events in all network namespaces Eric W. Biederman
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05  0:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, Greg KH, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller


With the tagged sysfs support finally merged into Greg's tree,
it is time for the last little bits of work to get the kobject
layer and network namespaces to play together properly.

These patches are roughly evenly divided between network layer work
and sysfs layer work.  Last time this conundrum came up I believe
we decided that the easiest way to handle this was for Greg to carry
all of the patches.  David, Greg, does that still make sense?

This patchset adds:
- kobject layer support for sending events in all network namespaces
- netlink support for filtering broadcast packets based on attributes
  of the destination socket.
- Enabling the network namespace support for sysfs and the kobject layer.
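
The second item above, filtering broadcasts on attributes of the destination
socket, can be pictured with a small user-space model.  This is a sketch under
stated assumptions, not the netlink implementation: model_sock,
model_filter_fn, model_broadcast and model_ns_filter are hypothetical names,
and the convention that a nonzero filter result means "skip this socket" is
an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical destination socket: a namespace tag plus a delivery count. */
struct model_sock {
	const void *ns;
	int delivered;
};

/* Hypothetical filter callback: return nonzero to skip the destination. */
typedef int (*model_filter_fn)(const struct model_sock *dst, const void *data);

/*
 * Deliver one message to every socket the filter lets through.
 * A NULL filter means an unfiltered broadcast.  Returns the number
 * of sockets that received the message.
 */
static int model_broadcast(struct model_sock *socks, size_t count,
			   model_filter_fn filter, const void *data)
{
	int sent = 0;
	size_t i;

	assert(socks != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (filter && filter(&socks[i], data))
			continue;
		socks[i].delivered++;
		sent++;
	}
	return sent;
}

/* Skip every destination whose namespace differs from the one in data. */
static int model_ns_filter(const struct model_sock *dst, const void *data)
{
	return dst->ns != data;
}
```

Keeping the policy in a callback means the broadcast loop itself does not
need to know anything about namespaces; the caller decides which attribute
of the destination matters.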

 include/linux/kobject.h  |    1 +
 include/linux/netlink.h  |    4 ++
 lib/kobject_uevent.c     |  108 +++++++++++++++++++++++++++++++++++++++++-----
 net/Kconfig              |    8 +++
 net/core/dev.c           |   28 ++----------
 net/core/net-sysfs.c     |   62 ++++++++++++++++++++------
 net/core/net-sysfs.h     |    1 -
 net/netlink/af_netlink.c |   21 ++++++++-
 8 files changed, 181 insertions(+), 52 deletions(-)


* [PATCH 1/6] kobject: Send hotplug events in all network namespaces
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (8 preceding siblings ...)
  2010-05-05  0:35 ` [PATCH 0/6] netns support in the kobject layer Eric W. Biederman
@ 2010-05-05  0:36 ` Eric W. Biederman
  2010-05-20 18:10   ` patch kobject-send-hotplug-events-in-all-network-namespaces.patch added to gregkh-2.6 tree gregkh
  2010-05-05  0:36 ` [PATCH 2/6] netns: Teach network device kobjects which namespace they are in Eric W. Biederman
                   ` (5 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Open a copy of the uevent kernel socket in each network
namespace so we can send uevents in all network namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/kobject_uevent.c |   68 ++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index c9d3a3e..3f5f17b 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -23,13 +23,19 @@
 #include <linux/skbuff.h>
 #include <linux/netlink.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 
 
 u64 uevent_seqnum;
 char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
 static DEFINE_SPINLOCK(sequence_lock);
-#if defined(CONFIG_NET)
-static struct sock *uevent_sock;
+#ifdef CONFIG_NET
+struct uevent_sock {
+	struct list_head list;
+	struct sock *sk;
+};
+static LIST_HEAD(uevent_sock_list);
+static DEFINE_MUTEX(uevent_sock_mutex);
 #endif
 
 /* the strings here must match the enum in include/linux/kobject.h */
@@ -99,6 +105,9 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	u64 seq;
 	int i = 0;
 	int retval = 0;
+#ifdef CONFIG_NET
+	struct uevent_sock *ue_sk;
+#endif
 
 	pr_debug("kobject: '%s' (%p): %s\n",
 		 kobject_name(kobj), kobj, __func__);
@@ -210,7 +219,9 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 
 #if defined(CONFIG_NET)
 	/* send netlink message */
-	if (uevent_sock) {
+	mutex_lock(&uevent_sock_mutex);
+	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		struct sock *uevent_sock = ue_sk->sk;
 		struct sk_buff *skb;
 		size_t len;
 
@@ -240,6 +251,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 		} else
 			retval = -ENOMEM;
 	}
+	mutex_unlock(&uevent_sock_mutex);
 #endif
 
 	/* call uevent_helper, usually only enabled during early boot */
@@ -319,18 +331,58 @@ int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...)
 EXPORT_SYMBOL_GPL(add_uevent_var);
 
 #if defined(CONFIG_NET)
-static int __init kobject_uevent_init(void)
+static int uevent_net_init(struct net *net)
 {
-	uevent_sock = netlink_kernel_create(&init_net, NETLINK_KOBJECT_UEVENT,
-					    1, NULL, NULL, THIS_MODULE);
-	if (!uevent_sock) {
+	struct uevent_sock *ue_sk;
+
+	ue_sk = kzalloc(sizeof(*ue_sk), GFP_KERNEL);
+	if (!ue_sk)
+		return -ENOMEM;
+
+	ue_sk->sk = netlink_kernel_create(net, NETLINK_KOBJECT_UEVENT,
+					  1, NULL, NULL, THIS_MODULE);
+	if (!ue_sk->sk) {
 		printk(KERN_ERR
 		       "kobject_uevent: unable to create netlink socket!\n");
 		return -ENODEV;
 	}
-	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
+	mutex_lock(&uevent_sock_mutex);
+	list_add_tail(&ue_sk->list, &uevent_sock_list);
+	mutex_unlock(&uevent_sock_mutex);
 	return 0;
 }
 
+static void uevent_net_exit(struct net *net)
+{
+	struct uevent_sock *ue_sk;
+
+	mutex_lock(&uevent_sock_mutex);
+	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		if (sock_net(ue_sk->sk) == net)
+			goto found;
+	}
+	mutex_unlock(&uevent_sock_mutex);
+	return;
+
+found:
+	list_del(&ue_sk->list);
+	mutex_unlock(&uevent_sock_mutex);
+
+	netlink_kernel_release(ue_sk->sk);
+	kfree(ue_sk);
+}
+
+static struct pernet_operations uevent_net_ops = {
+	.init	= uevent_net_init,
+	.exit	= uevent_net_exit,
+};
+
+static int __init kobject_uevent_init(void)
+{
+	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
+	return register_pernet_subsys(&uevent_net_ops);
+}
+
+
 postcore_initcall(kobject_uevent_init);
 #endif
-- 
1.6.5.2.143.g8cc62
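
A minimal userspace sketch of the per-namespace socket list this patch introduces, assuming hypothetical stand-ins for struct net and struct sock and a pthread mutex in place of the kernel mutex. It is not the kernel code. One difference is deliberate: the sketch frees the list entry when socket creation fails, whereas the patch as posted returns -ENODEV without freeing ue_sk.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's struct net and struct sock. */
struct net { int id; };
struct sock { struct net *net; int delivered; };

struct uevent_sock {
	struct uevent_sock *next;
	struct sock *sk;
};

static struct uevent_sock *uevent_sock_list;
static pthread_mutex_t uevent_sock_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Per-namespace init: create a socket for @net and append it to the list. */
static int uevent_net_init(struct net *net)
{
	struct uevent_sock *ue_sk, **pp;

	assert(net != NULL);
	ue_sk = calloc(1, sizeof(*ue_sk));
	if (!ue_sk)
		return -1;
	ue_sk->sk = calloc(1, sizeof(*ue_sk->sk));
	if (!ue_sk->sk) {
		free(ue_sk);	/* sketch choice: release the entry on failure */
		return -1;
	}
	ue_sk->sk->net = net;

	pthread_mutex_lock(&uevent_sock_mutex);
	for (pp = &uevent_sock_list; *pp; pp = &(*pp)->next)
		;		/* walk to the tail, like list_add_tail() */
	*pp = ue_sk;
	pthread_mutex_unlock(&uevent_sock_mutex);
	return 0;
}

/* Per-namespace exit: unlink the entry under the lock, free it afterwards. */
static void uevent_net_exit(struct net *net)
{
	struct uevent_sock *ue_sk = NULL, **pp;

	pthread_mutex_lock(&uevent_sock_mutex);
	for (pp = &uevent_sock_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->sk->net == net) {
			ue_sk = *pp;
			*pp = ue_sk->next;
			break;
		}
	}
	pthread_mutex_unlock(&uevent_sock_mutex);

	if (ue_sk) {
		free(ue_sk->sk);
		free(ue_sk);
	}
}

/* Send one event on every registered socket; returns how many were used. */
static int uevent_send_all(void)
{
	struct uevent_sock *ue_sk;
	int n = 0;

	pthread_mutex_lock(&uevent_sock_mutex);
	for (ue_sk = uevent_sock_list; ue_sk; ue_sk = ue_sk->next) {
		ue_sk->sk->delivered++;
		n++;
	}
	pthread_mutex_unlock(&uevent_sock_mutex);
	return n;
}
```

Calling uevent_net_exit() for a namespace that has no entry is a no-op, matching the early return in the patch.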



* [PATCH 2/6] netns: Teach network device kobjects which namespace they are in.
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (9 preceding siblings ...)
  2010-05-05  0:36 ` [PATCH 1/6] kobject: Send hotplug events in all network namespaces Eric W. Biederman
@ 2010-05-05  0:36 ` Eric W. Biederman
  2010-05-05 15:17   ` Serge E. Hallyn
  2010-05-20 18:10   ` patch netns-teach-network-device-kobjects-which-namespace-they-are-in.patch added to gregkh-2.6 tree gregkh
  2010-05-05  0:36 ` [PATCH 3/6] netlink: Implment netlink_broadcast_filtered Eric W. Biederman
                   ` (4 subsequent siblings)
  15 siblings, 2 replies; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

The problem: network devices show up in sysfs, and with network
namespaces active multiple devices with the same name can show up in
the same directory, ouch!

To avoid that problem and allow existing applications in network namespaces
to see the same interface that is currently presented in sysfs, this
patch enables the tagging directory support in sysfs.

By using the network namespace pointers as tags to separate out the
sysfs directory entries we ensure that we don't have conflicts
in the directories and applications only see a limited set of
the network devices.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/kobject.h |    1 +
 net/Kconfig             |    8 ++++++++
 net/core/net-sysfs.c    |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+), 0 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index b60d2df..cf343a8 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -142,6 +142,7 @@ extern const struct sysfs_ops kobj_sysfs_ops;
  */
 enum kobj_ns_type {
 	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPE_NET,
 	KOBJ_NS_TYPES
 };
 
diff --git a/net/Kconfig b/net/Kconfig
index 041c35e..265e33b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES
 
 menu "Networking options"
 
+config NET_NS
+	bool "Network namespace support"
+	default n
+	depends on EXPERIMENTAL && NAMESPACES
+	help
+	  Allow user space to create what appear to be multiple instances
+	  of the network stack.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 099c753..1b98e36 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -13,7 +13,9 @@
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
+#include <linux/nsproxy.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <net/wext.h>
@@ -466,6 +468,37 @@ static struct attribute_group wireless_group = {
 };
 #endif
 
+static const void *net_current_ns(void)
+{
+	return current->nsproxy->net_ns;
+}
+
+static const void *net_initial_ns(void)
+{
+	return &init_net;
+}
+
+static const void *net_netlink_ns(struct sock *sk)
+{
+	return sock_net(sk);
+}
+
+static struct kobj_ns_type_operations net_ns_type_operations = {
+	.type = KOBJ_NS_TYPE_NET,
+	.current_ns = net_current_ns,
+	.netlink_ns = net_netlink_ns,
+	.initial_ns = net_initial_ns,
+};
+
+static void net_kobj_ns_exit(struct net *net)
+{
+	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
+}
+
+static struct pernet_operations sysfs_net_ops = {
+	.exit = net_kobj_ns_exit,
+};
+
 #endif /* CONFIG_SYSFS */
 
 #ifdef CONFIG_HOTPLUG
@@ -506,6 +539,13 @@ static void netdev_release(struct device *d)
 	kfree((char *)dev - dev->padded);
 }
 
+static const void *net_namespace(struct device *d)
+{
+	struct net_device *dev;
+	dev = container_of(d, struct net_device, dev);
+	return dev_net(dev);
+}
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -515,6 +555,8 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+	.ns_type = &net_ns_type_operations,
+	.namespace = net_namespace,
 };
 
 /* Delete sysfs entries but hold kobject reference until after all
@@ -587,5 +629,9 @@ void netdev_initialize_kobject(struct net_device *net)
 
 int netdev_kobject_init(void)
 {
+	kobj_ns_type_register(&net_ns_type_operations);
+#ifdef CONFIG_SYSFS
+	register_pernet_subsys(&sysfs_net_ops);
+#endif
 	return class_register(&net_class);
 }
-- 
1.6.5.2.143.g8cc62
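
A minimal userspace sketch of the tagging idea in the commit message: directory entries are keyed by (name, namespace pointer), so the same name can exist once per namespace and a lookup only sees entries carrying the caller's tag. The types and function names here are hypothetical and do not correspond to the sysfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 16

struct tagged_dirent {
	const char *name;
	const void *ns;		/* namespace tag, compared by pointer */
};

struct tagged_dir {
	struct tagged_dirent ents[MAX_ENTRIES];
	size_t count;
};

/* Find @name among the entries tagged with @ns; other tags are invisible. */
static const struct tagged_dirent *
tagged_lookup(const struct tagged_dir *dir, const char *name, const void *ns)
{
	size_t i;

	for (i = 0; i < dir->count; i++)
		if (dir->ents[i].ns == ns &&
		    strcmp(dir->ents[i].name, name) == 0)
			return &dir->ents[i];
	return NULL;
}

/* Add an entry; a name conflicts only with entries in the same namespace. */
static int tagged_add(struct tagged_dir *dir, const char *name, const void *ns)
{
	assert(name != NULL);
	if (dir->count == MAX_ENTRIES || tagged_lookup(dir, name, ns))
		return -1;
	dir->ents[dir->count].name = name;
	dir->ents[dir->count].ns = ns;
	dir->count++;
	return 0;
}

/* Count the entries a task in namespace @ns would see. */
static size_t tagged_visible(const struct tagged_dir *dir, const void *ns)
{
	size_t i, n = 0;

	for (i = 0; i < dir->count; i++)
		if (dir->ents[i].ns == ns)
			n++;
	return n;
}
```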



* [PATCH 3/6] netlink: Implment netlink_broadcast_filtered
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (10 preceding siblings ...)
  2010-05-05  0:36 ` [PATCH 2/6] netns: Teach network device kobjects which namespace they are in Eric W. Biederman
@ 2010-05-05  0:36 ` Eric W. Biederman
  2010-05-20 18:10   ` patch netlink-implment-netlink_broadcast_filtered.patch added to gregkh-2.6 tree gregkh
  2010-05-05  0:36 ` [PATCH 4/6] kobj: Send hotplug events in the proper namespace Eric W. Biederman
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to.  For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.

For data belonging to other namespaces netlink_broadcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/netlink.h  |    4 ++++
 net/netlink/af_netlink.c |   21 +++++++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index fde27c0..4f7bf4b 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -188,6 +188,10 @@ extern int netlink_has_listeners(struct sock *sk, unsigned int group);
 extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock);
 extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 pid,
 			     __u32 group, gfp_t allocation);
+extern int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb,
+	__u32 pid, __u32 group, gfp_t allocation,
+	int (*filter)(struct sock *dsk, struct sk_buff *skb, void *data),
+	void *filter_data);
 extern void netlink_set_err(struct sock *ssk, __u32 pid, __u32 group, int code);
 extern int netlink_register_notifier(struct notifier_block *nb);
 extern int netlink_unregister_notifier(struct notifier_block *nb);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 320d042..4f16d68 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -975,6 +975,8 @@ struct netlink_broadcast_data {
 	int delivered;
 	gfp_t allocation;
 	struct sk_buff *skb, *skb2;
+	int (*tx_filter)(struct sock *dsk, struct sk_buff *skb, void *data);
+	void *tx_data;
 };
 
 static inline int do_one_broadcast(struct sock *sk,
@@ -1017,6 +1019,9 @@ static inline int do_one_broadcast(struct sock *sk,
 		p->failure = 1;
 		if (nlk->flags & NETLINK_BROADCAST_SEND_ERROR)
 			p->delivery_failure = 1;
+	} else if (p->tx_filter && p->tx_filter(sk, p->skb2, p->tx_data)) {
+		kfree_skb(p->skb2);
+		p->skb2 = NULL;
 	} else if (sk_filter(sk, p->skb2)) {
 		kfree_skb(p->skb2);
 		p->skb2 = NULL;
@@ -1035,8 +1040,10 @@ out:
 	return 0;
 }
 
-int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
-		      u32 group, gfp_t allocation)
+int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 pid,
+	u32 group, gfp_t allocation,
+	int (*filter)(struct sock *dsk, struct sk_buff *skb, void *data),
+	void *filter_data)
 {
 	struct net *net = sock_net(ssk);
 	struct netlink_broadcast_data info;
@@ -1056,6 +1063,8 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
 	info.allocation = allocation;
 	info.skb = skb;
 	info.skb2 = NULL;
+	info.tx_filter = filter;
+	info.tx_data = filter_data;
 
 	/* While we sleep in clone, do not allow to change socket list */
 
@@ -1080,6 +1089,14 @@ int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
 	}
 	return -ESRCH;
 }
+EXPORT_SYMBOL(netlink_broadcast_filtered);
+
+int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
+		      u32 group, gfp_t allocation)
+{
+	return netlink_broadcast_filtered(ssk, skb, pid, group, allocation,
+		NULL, NULL);
+}
 EXPORT_SYMBOL(netlink_broadcast);
 
 struct netlink_set_err_data {
-- 
1.6.5.2.143.g8cc62
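
A minimal userspace sketch of the filter contract added here: the callback is consulted once per destination, and a nonzero return means "do not deliver to this socket". The listener type and the ns_mismatch_filter() helper are hypothetical. Unlike the kernel function, the sketch simply returns the number of deliveries rather than an errno-style result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical listener: a namespace id plus a delivery counter. */
struct listener { int ns_id; int received; };

typedef int (*tx_filter_fn)(struct listener *dst, const char *msg, void *data);

/* Deliver @msg to every listener the filter does not reject. */
static int broadcast_filtered(struct listener *ls, size_t n, const char *msg,
			      tx_filter_fn filter, void *filter_data)
{
	size_t i;
	int delivered = 0;

	assert(ls != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (filter && filter(&ls[i], msg, filter_data))
			continue;	/* nonzero: skip this destination */
		ls[i].received++;
		delivered++;
	}
	return delivered;
}

/* The unfiltered variant is the filtered one with no callback. */
static int broadcast(struct listener *ls, size_t n, const char *msg)
{
	return broadcast_filtered(ls, n, msg, NULL, NULL);
}

/* Example filter: reject listeners outside the namespace id in @data. */
static int ns_mismatch_filter(struct listener *dst, const char *msg, void *data)
{
	const int *want = data;

	(void)msg;
	return dst->ns_id != *want;
}
```

Keeping the plain broadcast as a thin wrapper mirrors how the patch reimplements netlink_broadcast() on top of the new function.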



* [PATCH 4/6] kobj: Send hotplug events in the proper namespace.
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (11 preceding siblings ...)
  2010-05-05  0:36 ` [PATCH 3/6] netlink: Implment netlink_broadcast_filtered Eric W. Biederman
@ 2010-05-05  0:36 ` Eric W. Biederman
  2010-05-20 18:10   ` patch kobj-send-hotplug-events-in-the-proper-namespace.patch added to gregkh-2.6 tree gregkh
  2010-05-05  0:36 ` [PATCH 5/6] hotplug: netns aware uevent_helper Eric W. Biederman
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

Utilize netlink_broadcast_filtered to allow sending hotplug events
in the proper namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/kobject_uevent.c |   22 ++++++++++++++++++++--
 1 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 3f5f17b..9057ec1 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -82,6 +82,22 @@ out:
 	return ret;
 }
 
+static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
+{
+	struct kobject *kobj = data;
+	const struct kobj_ns_type_operations *ops;
+
+	ops = kobj_ns_ops(kobj);
+	if (ops) {
+		const void *sock_ns, *ns;
+		ns = kobj->ktype->namespace(kobj);
+		sock_ns = ops->netlink_ns(dsk);
+		return sock_ns != ns;
+	}
+
+	return 0;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -243,8 +259,10 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 			}
 
 			NETLINK_CB(skb).dst_group = 1;
-			retval = netlink_broadcast(uevent_sock, skb, 0, 1,
-						   GFP_KERNEL);
+			retval = netlink_broadcast_filtered(uevent_sock, skb,
+							    0, 1, GFP_KERNEL,
+							    kobj_bcast_filter,
+							    kobj);
 			/* ENOBUFS should be handled in userspace */
 			if (retval == -ENOBUFS)
 				retval = 0;
-- 
1.6.5.2.143.g8cc62
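
A minimal userspace sketch of the decision kobj_bcast_filter makes, assuming simplified hypothetical types: an object with no namespace operations is delivered to every socket, while a tagged object is delivered only to sockets whose namespace matches its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical models of a destination socket and a kobject. */
struct sock_model { const void *ns; };

struct ns_ops_model {
	const void *(*netlink_ns)(const struct sock_model *sk);
};

struct kobj_model {
	const struct ns_ops_model *ops;	/* NULL: object is not namespaced */
	const void *ns;			/* namespace tag of the object */
};

static const void *demo_netlink_ns(const struct sock_model *sk)
{
	return sk->ns;
}

static const struct ns_ops_model demo_net_ops = {
	.netlink_ns = demo_netlink_ns,
};

/* Returns nonzero when the event must NOT be sent to @dsk. */
static int kobj_bcast_filter_demo(const struct sock_model *dsk,
				  const struct kobj_model *kobj)
{
	assert(dsk != NULL && kobj != NULL);
	if (kobj->ops)
		return kobj->ops->netlink_ns(dsk) != kobj->ns;
	return 0;
}
```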



* [PATCH 5/6] hotplug: netns aware uevent_helper
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (12 preceding siblings ...)
  2010-05-05  0:36 ` [PATCH 4/6] kobj: Send hotplug events in the proper namespace Eric W. Biederman
@ 2010-05-05  0:36 ` Eric W. Biederman
  2010-05-20 18:10   ` patch hotplug-netns-aware-uevent_helper.patch added to gregkh-2.6 tree gregkh
  2010-05-05  0:36 ` [PATCH 6/6] net: Expose all network devices in a namespaces in sysfs Eric W. Biederman
  2010-05-20 17:47 ` [PATCH 0/6] tagged sysfs support Greg KH
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

It only makes sense for uevent_helper to get events
in the initial namespaces.  Its invocation is not
per namespace and it is not clear how we could make
its invocation namespace aware.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 lib/kobject_uevent.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 9057ec1..1b3dbab 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -18,6 +18,7 @@
 #include <linux/string.h>
 #include <linux/kobject.h>
 #include <linux/module.h>
+#include <linux/user_namespace.h>
 
 #include <linux/socket.h>
 #include <linux/skbuff.h>
@@ -98,6 +99,21 @@ static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
 	return 0;
 }
 
+static int kobj_usermode_filter(struct kobject *kobj)
+{
+	const struct kobj_ns_type_operations *ops;
+
+	ops = kobj_ns_ops(kobj);
+	if (ops) {
+		const void *init_ns, *ns;
+		ns = kobj->ktype->namespace(kobj);
+		init_ns = ops->initial_ns();
+		return ns != init_ns;
+	}
+
+	return 0;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -273,7 +289,7 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 #endif
 
 	/* call uevent_helper, usually only enabled during early boot */
-	if (uevent_helper[0]) {
+	if (uevent_helper[0] && !kobj_usermode_filter(kobj)) {
 		char *argv [3];
 
 		argv [0] = uevent_helper;
-- 
1.6.5.2.143.g8cc62
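
A minimal userspace sketch of the helper gating described in this patch, with hypothetical types: the helper runs only when a helper path is configured and the object is either untagged or belongs to the initial namespace.

```c
#include <assert.h>
#include <stddef.h>

static int init_ns_tag;		/* hypothetical stand-in for &init_net */

struct ns_ops_model {
	const void *(*initial_ns)(void);
};

struct kobj_model {
	const struct ns_ops_model *ops;	/* NULL: object is not namespaced */
	const void *ns;
};

static const void *demo_initial_ns(void)
{
	return &init_ns_tag;
}

static const struct ns_ops_model demo_net_ops = {
	.initial_ns = demo_initial_ns,
};

/* Returns nonzero when the usermode helper must NOT be invoked. */
static int kobj_usermode_filter_demo(const struct kobj_model *kobj)
{
	assert(kobj != NULL);
	if (kobj->ops)
		return kobj->ns != kobj->ops->initial_ns();
	return 0;
}

/* Mirrors the condition guarding the uevent_helper call. */
static int should_run_helper(const char *helper, const struct kobj_model *kobj)
{
	return helper[0] && !kobj_usermode_filter_demo(kobj);
}
```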



* [PATCH 6/6] net: Expose all network devices in a namespaces in sysfs
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (13 preceding siblings ...)
  2010-05-05  0:36 ` [PATCH 5/6] hotplug: netns aware uevent_helper Eric W. Biederman
@ 2010-05-05  0:36 ` Eric W. Biederman
  2010-05-20 18:10   ` patch net-expose-all-network-devices-in-a-namespaces-in-sysfs.patch added to gregkh-2.6 tree gregkh
  2010-05-20 17:47 ` [PATCH 0/6] tagged sysfs support Greg KH
  15 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05  0:36 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kay Sievers, linux-kernel, Tejun Heo, Cornelia Huck,
	Eric Dumazet, Benjamin LaHaise, Serge Hallyn, netdev,
	David Miller, Eric W. Biederman

From: Eric W. Biederman <ebiederm@xmission.com>

This reverts commit aaf8cdc34ddba08122f02217d9d684e2f9f5d575.

Drivers like the ipw2100 call device_create_group when they
are initialized and device_remove_group when they are shutdown.
Moving them between namespaces deletes their sysfs groups early.

In particular the following call chain results.
netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
With sysfs_remove_dir recursively deleting all of its subdirectories,
and nothing adding them back.

Ouch!

Therefore we need to call something that ultimately calls sysfs_mv_dir,
as that sysfs function can move sysfs directories between namespaces
without deleting their subdirectories or their contents.  This allows
us to avoid placing extra boilerplate into every driver that does
something interesting with sysfs.

Currently the function that provides that capability is device_rename.
That is, the code works without nasty side effects as originally written.

So remove the misguided fix for moving devices between namespaces.  The
bug in the kobject layer that inspired it has now been recognized and
fixed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/dev.c       |   28 +++++-----------------------
 net/core/net-sysfs.c |   16 +---------------
 net/core/net-sysfs.h |    1 -
 3 files changed, 6 insertions(+), 39 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bcc490c..fa54819 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -983,15 +983,10 @@ int dev_change_name(struct net_device *dev, const char *newname)
 		return err;
 
 rollback:
-	/* For now only devices in the initial network namespace
-	 * are in sysfs.
-	 */
-	if (net_eq(net, &init_net)) {
-		ret = device_rename(&dev->dev, dev->name);
-		if (ret) {
-			memcpy(dev->name, oldname, IFNAMSIZ);
-			return ret;
-		}
+	ret = device_rename(&dev->dev, dev->name);
+	if (ret) {
+		memcpy(dev->name, oldname, IFNAMSIZ);
+		return ret;
 	}
 
 	write_lock_bh(&dev_base_lock);
@@ -5106,8 +5101,6 @@ int register_netdevice(struct net_device *dev)
 	if (dev->features & NETIF_F_SG)
 		dev->features |= NETIF_F_GSO;
 
-	netdev_initialize_kobject(dev);
-
 	ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev);
 	ret = notifier_to_errno(ret);
 	if (ret)
@@ -5628,15 +5621,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	if (dev->features & NETIF_F_NETNS_LOCAL)
 		goto out;
 
-#ifdef CONFIG_SYSFS
-	/* Don't allow real devices to be moved when sysfs
-	 * is enabled.
-	 */
-	err = -EINVAL;
-	if (dev->dev.parent)
-		goto out;
-#endif
-
 	/* Ensure the device has been registrered */
 	err = -EINVAL;
 	if (dev->reg_state != NETREG_REGISTERED)
@@ -5687,8 +5671,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	dev_unicast_flush(dev);
 	dev_addr_discard(dev);
 
-	netdev_unregister_kobject(dev);
-
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
 
@@ -5701,7 +5683,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	}
 
 	/* Fixup kobjects */
-	err = netdev_register_kobject(dev);
+	err = device_rename(&dev->dev, dev->name);
 	WARN_ON(err);
 
 	/* Add the device back in the hashes */
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1b98e36..0727c57 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -507,9 +507,6 @@ static int netdev_uevent(struct device *d, struct kobj_uevent_env *env)
 	struct net_device *dev = to_net_dev(d);
 	int retval;
 
-	if (!net_eq(dev_net(dev), &init_net))
-		return 0;
-
 	/* pass interface to uevent. */
 	retval = add_uevent_var(env, "INTERFACE=%s", dev->name);
 	if (retval)
@@ -568,9 +565,6 @@ void netdev_unregister_kobject(struct net_device * net)
 
 	kobject_get(&dev->kobj);
 
-	if (!net_eq(dev_net(net), &init_net))
-		return;
-
 	device_del(dev);
 }
 
@@ -580,6 +574,7 @@ int netdev_register_kobject(struct net_device *net)
 	struct device *dev = &(net->dev);
 	const struct attribute_group **groups = net->sysfs_groups;
 
+	device_initialize(dev);
 	dev->class = &net_class;
 	dev->platform_data = net;
 	dev->groups = groups;
@@ -602,9 +597,6 @@ int netdev_register_kobject(struct net_device *net)
 #endif
 #endif /* CONFIG_SYSFS */
 
-	if (!net_eq(dev_net(net), &init_net))
-		return 0;
-
 	return device_add(dev);
 }
 
@@ -621,12 +613,6 @@ void netdev_class_remove_file(struct class_attribute *class_attr)
 EXPORT_SYMBOL(netdev_class_create_file);
 EXPORT_SYMBOL(netdev_class_remove_file);
 
-void netdev_initialize_kobject(struct net_device *net)
-{
-	struct device *device = &(net->dev);
-	device_initialize(device);
-}
-
 int netdev_kobject_init(void)
 {
 	kobj_ns_type_register(&net_ns_type_operations);
diff --git a/net/core/net-sysfs.h b/net/core/net-sysfs.h
index 14e7524..805555e 100644
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -4,5 +4,4 @@
 int netdev_kobject_init(void);
 int netdev_register_kobject(struct net_device *);
 void netdev_unregister_kobject(struct net_device *);
-void netdev_initialize_kobject(struct net_device *);
 #endif
-- 
1.6.5.2.143.g8cc62
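
A toy userspace model of the argument in the commit message, using a hypothetical dev_dir structure: moving a device by deleting and re-creating its directory loses anything a driver added, because only the core re-creates its own attributes, while a rename-style move just retags the existing directory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a device directory in sysfs. */
struct dev_dir {
	const char *name;
	const void *ns;		/* namespace tag */
	int core_attrs;		/* attributes the core creates on register */
	int driver_groups;	/* groups a driver added after register */
};

/* Delete then re-create: nothing adds the driver's groups back. */
static void move_by_reregister(struct dev_dir *d, const void *new_ns)
{
	assert(d != NULL);
	d->core_attrs = 0;	/* recursive removal of the directory */
	d->driver_groups = 0;
	d->ns = new_ns;
	d->core_attrs = 1;	/* register restores core attributes only */
}

/* Rename-style move: the directory and its contents stay intact. */
static void move_by_rename(struct dev_dir *d, const void *new_ns,
			   const char *new_name)
{
	assert(d != NULL);
	d->ns = new_ns;
	d->name = new_name;
}
```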



* Re: [PATCH 2/6] netns: Teach network device kobjects which namespace they are in.
  2010-05-05  0:36 ` [PATCH 2/6] netns: Teach network device kobjects which namespace they are in Eric W. Biederman
@ 2010-05-05 15:17   ` Serge E. Hallyn
  2010-05-05 19:56     ` Eric W. Biederman
  2010-05-20 18:10   ` patch netns-teach-network-device-kobjects-which-namespace-they-are-in.patch added to gregkh-2.6 tree gregkh
  1 sibling, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-05-05 15:17 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, Eric Dumazet, Benjamin LaHaise, netdev,
	David Miller

Quoting Eric W. Biederman (ebiederm@xmission.com):
> diff --git a/net/Kconfig b/net/Kconfig
> index 041c35e..265e33b 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES
> 
>  menu "Networking options"
> 
> +config NET_NS
> +	bool "Network namespace support"
> +	default n
> +	depends on EXPERIMENTAL && NAMESPACES
> +	help
> +	  Allow user space to create what appear to be multiple instances
> +	  of the network stack.
> +

Hi Eric,

I'm confused - NET_NS is defined in init/Kconfig right now.  Is the tree
you're working from very different from mine, or is this the unfortunate
result of the patches sitting so long?

>  source "net/packet/Kconfig"
>  source "net/unix/Kconfig"
>  source "net/xfrm/Kconfig"
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 099c753..1b98e36 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -13,7 +13,9 @@
>  #include <linux/kernel.h>
>  #include <linux/netdevice.h>
>  #include <linux/if_arp.h>
> +#include <linux/nsproxy.h>
>  #include <net/sock.h>
> +#include <net/net_namespace.h>
>  #include <linux/rtnetlink.h>
>  #include <linux/wireless.h>
>  #include <net/wext.h>
> @@ -466,6 +468,37 @@ static struct attribute_group wireless_group = {
>  };
>  #endif
> 
> +static const void *net_current_ns(void)
> +{
> +	return current->nsproxy->net_ns;
> +}
> +
> +static const void *net_initial_ns(void)
> +{
> +	return &init_net;
> +}
> +
> +static const void *net_netlink_ns(struct sock *sk)
> +{
> +	return sock_net(sk);
> +}
> +
> +static struct kobj_ns_type_operations net_ns_type_operations = {
> +	.type = KOBJ_NS_TYPE_NET,
> +	.current_ns = net_current_ns,
> +	.netlink_ns = net_netlink_ns,
> +	.initial_ns = net_initial_ns,
> +};
> +
> +static void net_kobj_ns_exit(struct net *net)
> +{
> +	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
> +}
> +
> +static struct pernet_operations sysfs_net_ops = {
> +	.exit = net_kobj_ns_exit,
> +};
> +
>  #endif /* CONFIG_SYSFS */

...

>  int netdev_kobject_init(void)
>  {
> +	kobj_ns_type_register(&net_ns_type_operations);
> +#ifdef CONFIG_SYSFS
> +	register_pernet_subsys(&sysfs_net_ops);
> +#endif
>  	return class_register(&net_class);

I think the kobj_ns_type_register() needs to be under
ifdef CONFIG_SYSFS as well, bc net_ns_type_operations is defined
under ifdef CONFIG_SYSFS.

-serge


* Re: [PATCH 2/6] netns: Teach network device kobjects which namespace they are in.
  2010-05-05 15:17   ` Serge E. Hallyn
@ 2010-05-05 19:56     ` Eric W. Biederman
  2010-05-05 22:01       ` Serge E. Hallyn
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-05 19:56 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, Eric Dumazet, Benjamin LaHaise, netdev,
	David Miller

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> diff --git a/net/Kconfig b/net/Kconfig
>> index 041c35e..265e33b 100644
>> --- a/net/Kconfig
>> +++ b/net/Kconfig
>> @@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES
>> 
>>  menu "Networking options"
>> 
>> +config NET_NS
>> +	bool "Network namespace support"
>> +	default n
>> +	depends on EXPERIMENTAL && NAMESPACES
>> +	help
>> +	  Allow user space to create what appear to be multiple instances
>> +	  of the network stack.
>> +
>
> Hi Eric,
>
> I'm confused - NET_NS is defined in init/Kconfig right now.  Is the tree
> you're working from very different from mine, or is this the unfortunate
> result of the patches sitting so long?

Old patches, nothing that complains when you make a mistake like this,
and apparently I have a blind spot in my personal code review.

At one point it was not possible to enable the network namespace until
the sysfs stuff was enabled, but things have been going on long enough
that we worked around that restriction.

>>  source "net/packet/Kconfig"
>>  source "net/unix/Kconfig"
>>  source "net/xfrm/Kconfig"
>> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
>> index 099c753..1b98e36 100644
>> --- a/net/core/net-sysfs.c
>> +++ b/net/core/net-sysfs.c
>> @@ -13,7 +13,9 @@
>>  #include <linux/kernel.h>
>>  #include <linux/netdevice.h>
>>  #include <linux/if_arp.h>
>> +#include <linux/nsproxy.h>
>>  #include <net/sock.h>
>> +#include <net/net_namespace.h>
>>  #include <linux/rtnetlink.h>
>>  #include <linux/wireless.h>
>>  #include <net/wext.h>
>> @@ -466,6 +468,37 @@ static struct attribute_group wireless_group = {
>>  };
>>  #endif
>> 
>> +static const void *net_current_ns(void)
>> +{
>> +	return current->nsproxy->net_ns;
>> +}
>> +
>> +static const void *net_initial_ns(void)
>> +{
>> +	return &init_net;
>> +}
>> +
>> +static const void *net_netlink_ns(struct sock *sk)
>> +{
>> +	return sock_net(sk);
>> +}
>> +
>> +static struct kobj_ns_type_operations net_ns_type_operations = {
>> +	.type = KOBJ_NS_TYPE_NET,
>> +	.current_ns = net_current_ns,
>> +	.netlink_ns = net_netlink_ns,
>> +	.initial_ns = net_initial_ns,
>> +};
>> +
>> +static void net_kobj_ns_exit(struct net *net)
>> +{
>> +	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
>> +}
>> +
>> +static struct pernet_operations sysfs_net_ops = {
>> +	.exit = net_kobj_ns_exit,
>> +};
>> +
>>  #endif /* CONFIG_SYSFS */
>
> ...
>
>>  int netdev_kobject_init(void)
>>  {
>> +	kobj_ns_type_register(&net_ns_type_operations);
>> +#ifdef CONFIG_SYSFS
>> +	register_pernet_subsys(&sysfs_net_ops);
>> +#endif
>>  	return class_register(&net_class);
>
> I think the kobj_ns_type_register() needs to be under
> ifdef CONFIG_SYSFS as well, bc net_ns_type_operations is defined
> under ifdef CONFIG_SYSFS.

kobj_ns_type_register should not be under CONFIG_SYSFS.  Which means
that kobj_ns_type_operations needs not to be under CONFIG_SYSFS as
well.  Thank you for spotting that bug.

Grr.

Eric
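
A minimal sketch of the arrangement agreed on in this exchange, with hypothetical names: the operations structure is defined unconditionally because its registration is unconditional, and only the sysfs-specific hook stays behind the configuration macro. CONFIG_SYSFS_DEMO is an illustrative macro, not a kernel option.

```c
#include <assert.h>
#include <stddef.h>

struct ns_type_ops_model { int type; };

static const struct ns_type_ops_model *registered_ops;
static int pernet_hooks;

/* Defined outside any #ifdef, since it is registered unconditionally. */
static const struct ns_type_ops_model net_ns_type_ops_demo = { .type = 1 };

#ifdef CONFIG_SYSFS_DEMO
/* Only the sysfs-specific hook depends on the configuration macro. */
static void register_sysfs_hooks_demo(void)
{
	pernet_hooks++;
}
#endif

static int netdev_kobject_init_demo(void)
{
	registered_ops = &net_ns_type_ops_demo;
	assert(registered_ops != NULL);
#ifdef CONFIG_SYSFS_DEMO
	register_sysfs_hooks_demo();
#endif
	return 0;
}
```

With the macro undefined the sketch still compiles, which is the property the original layout lacked.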


* Re: [PATCH 2/6] netns: Teach network device kobjects which namespace they are in.
  2010-05-05 19:56     ` Eric W. Biederman
@ 2010-05-05 22:01       ` Serge E. Hallyn
  2010-05-17  4:59         ` [PATCH 7/6] net/sysfs: Fix the bitrot in network device kobject namespace support Eric W. Biederman
  0 siblings, 1 reply; 83+ messages in thread
From: Serge E. Hallyn @ 2010-05-05 22:01 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, Eric Dumazet, Benjamin LaHaise, netdev,
	David Miller

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> diff --git a/net/Kconfig b/net/Kconfig
> >> index 041c35e..265e33b 100644
> >> --- a/net/Kconfig
> >> +++ b/net/Kconfig
> >> @@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES
> >> 
> >>  menu "Networking options"
> >> 
> >> +config NET_NS
> >> +	bool "Network namespace support"
> >> +	default n
> >> +	depends on EXPERIMENTAL && NAMESPACES
> >> +	help
> >> +	  Allow user space to create what appear to be multiple instances
> >> +	  of the network stack.
> >> +
> >
> > Hi Eric,
> >
> > I'm confused - NET_NS is defined in init/Kconfig right now.  Is the tree
> > you're working from very different from mine, or is this the unfortunate
> > result of the patches sitting so long?
> 
> Old patches, nothing that complains when you make a mistake like this,
> and apparently I have a blind spot in my personal code review.

haha, we all know about that.

> At one point it was not possible to enable the network namespace until
> the sysfs stuff was enabled, but things have been going on long enough
> that we worked around that restriction.

Yeah, I remember that, and leaving this wouldn't break anything.

> >>  int netdev_kobject_init(void)
> >>  {
> >> +	kobj_ns_type_register(&net_ns_type_operations);
> >> +#ifdef CONFIG_SYSFS
> >> +	register_pernet_subsys(&sysfs_net_ops);
> >> +#endif
> >>  	return class_register(&net_class);
> >
> > I think the kobj_ns_type_register() needs to be under
> > ifdef CONFIG_SYSFS as well, bc net_ns_type_operations is defined
> > under ifdef CONFIG_SYSFS.
> 
> kobj_ns_type_register should not be under CONFIG_SYSFS.  Which means
> that kobj_ns_type_operations needs not to be under CONFIG_SYSFS as
> well.  Thank you for spotting that bug.

np - outside of that,

Acked-by: Serge E. Hallyn <serue@us.ibm.com>

I saw no problems with the other patches, just don't feel qualified
to give an ack.

thanks,
-serge


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-05  0:35 ` [PATCH 0/6] netns support in the kobject layer Eric W. Biederman
@ 2010-05-06 20:04   ` Greg KH
  2010-05-16  6:26     ` David Miller
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2010-05-06 20:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, Eric Dumazet, Benjamin LaHaise, Serge Hallyn,
	netdev, David Miller

On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
> 
> With the tagged sysfs support finally merged into Greg's tree,
> it is time for the last little bits of work to get the kobject
> layer and network namespaces to play together properly.
> 
> These patches are roughly evenly divided between network layer work
> and sysfs layer work.  Last time this conundrum came up I believe
> we decided that the easiest way to handle this was for Greg to carry
> all of the patches.  David, Greg does that still make sense?

That's fine, if I get David's ack on these.

thanks,

greg k-h


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-06 20:04   ` Greg KH
@ 2010-05-16  6:26     ` David Miller
  2010-05-17 18:11       ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: David Miller @ 2010-05-16  6:26 UTC (permalink / raw)
  To: greg
  Cc: ebiederm, gregkh, kay.sievers, linux-kernel, tj, cornelia.huck,
	eric.dumazet, bcrl, serue, netdev

From: Greg KH <greg@kroah.com>
Date: Thu, 6 May 2010 13:04:04 -0700

> On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
>> 
>> With the tagged sysfs support finally merged into Greg's tree,
>> it is time for the last little bits of work to get the kobject
>> layer and network namespaces to play together properly.
>> 
>> These patches are roughly evenly divided between network layer work
>> and sysfs layer work.  Last time this conundrum came up I believe
>> we decided that the easiest way to handle this was for Greg to carry
>> all of the patches.  David, Greg does that still make sense?
> 
> That's fine, if I get David's ack on these.

Looks good to me:

Acked-by: David S. Miller <davem@davemloft.net>


* [PATCH 7/6] net/sysfs: Fix the bitrot in network device kobject namespace support
  2010-05-05 22:01       ` Serge E. Hallyn
@ 2010-05-17  4:59         ` Eric W. Biederman
  2010-05-17  5:07           ` David Miller
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-17  4:59 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, Eric Dumazet, Benjamin LaHaise, netdev,
	David Miller


I had a couple of stupid bugs in:
netns: Teach network device kobjects which namespace they are in.

- I duplicated the Kconfig for the NET_NS
- The build was broken when sysfs was not compiled in

The sysfs breakage is because after I moved the sysfs operations
to the kobject layer to make things cleaner, I forgot to move
the ifdefs.  Oops.

I'm not quite certain how I introduced a second NET_NS Kconfig,
but it was probably a 3 way merge somewhere along the way that
did not notice that the NET_NS Kconfig option had moved and thought
that was a bug.  It probably slipped in because the sysfs patches
used to be the first patches in my network namespace patches.
Some things just don't go like you would expect.

Neither of these bugs actually affects anything in the common case
but they should be fixed.

Thanks to Serge for noticing they were present.

Reported-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
 net/Kconfig          |    8 --------
 net/core/net-sysfs.c |    8 +++-----
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index 265e33b..041c35e 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,14 +45,6 @@ config COMPAT_NETLINK_MESSAGES
 
 menu "Networking options"
 
-config NET_NS
-	bool "Network namespace support"
-	default n
-	depends on EXPERIMENTAL && NAMESPACES
-	help
-	  Allow user space to create what appear to be multiple instances
-	  of the network stack.
-
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 0727c57..c4c5157 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -467,6 +467,7 @@ static struct attribute_group wireless_group = {
 	.attrs = wireless_attrs,
 };
 #endif
+#endif /* CONFIG_SYSFS */
 
 static const void *net_current_ns(void)
 {
@@ -495,11 +496,10 @@ static void net_kobj_ns_exit(struct net *net)
 	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
 }
 
-static struct pernet_operations sysfs_net_ops = {
+static struct pernet_operations kobj_net_ops = {
 	.exit = net_kobj_ns_exit,
 };
 
-#endif /* CONFIG_SYSFS */
 
 #ifdef CONFIG_HOTPLUG
 static int netdev_uevent(struct device *d, struct kobj_uevent_env *env)
@@ -616,8 +616,6 @@ EXPORT_SYMBOL(netdev_class_remove_file);
 int netdev_kobject_init(void)
 {
 	kobj_ns_type_register(&net_ns_type_operations);
-#ifdef CONFIG_SYSFS
-	register_pernet_subsys(&sysfs_net_ops);
-#endif
+	register_pernet_subsys(&kobj_net_ops);
 	return class_register(&net_class);
 }
-- 
1.6.6.1



* Re: [PATCH 7/6] net/sysfs: Fix the bitrot in network device kobject namespace support
  2010-05-17  4:59         ` [PATCH 7/6] net/sysfs: Fix the bitrot in network device kobject namespace support Eric W. Biederman
@ 2010-05-17  5:07           ` David Miller
  0 siblings, 0 replies; 83+ messages in thread
From: David Miller @ 2010-05-17  5:07 UTC (permalink / raw)
  To: ebiederm
  Cc: serue, gregkh, kay.sievers, linux-kernel, tj, cornelia.huck,
	eric.dumazet, bcrl, netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sun, 16 May 2010 21:59:45 -0700

> 
> I had a couple of stupid bugs in:
> netns: Teach network device kobjects which namespace they are in.
> 
> - I duplicated the Kconfig for the NET_NS
> - The build was broken when sysfs was not compiled in
> 
> The sysfs breakage is because after I moved the sysfs operations
> to the kobject layer to make things cleaner, I forgot to move
> the ifdefs.  Oops.
> 
> I'm not quite certain how I introduced a second NET_NS Kconfig,
> but it was probably a 3 way merge somewhere along the way that
> did not notice that the NET_NS Kconfig option had moved and thought
> that was a bug.  It probably slipped in because the sysfs patches
> used to be the first patches in my network namespace patches.
> Some things just don't go like you would expect.
> 
> Neither of these bugs actually affects anything in the common case
> but they should be fixed.
> 
> Thanks to Serge for noticing they were present.
> 
> Reported-by: Serge E. Hallyn <serue@us.ibm.com>
> Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-16  6:26     ` David Miller
@ 2010-05-17 18:11       ` Greg KH
  2010-05-17 20:58         ` Eric W. Biederman
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2010-05-17 18:11 UTC (permalink / raw)
  To: David Miller
  Cc: ebiederm, gregkh, kay.sievers, linux-kernel, tj, cornelia.huck,
	eric.dumazet, bcrl, serue, netdev

On Sat, May 15, 2010 at 11:26:43PM -0700, David Miller wrote:
> From: Greg KH <greg@kroah.com>
> Date: Thu, 6 May 2010 13:04:04 -0700
> 
> > On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
> >> 
> >> With the tagged sysfs support finally merged into Greg's tree,
> >> it is time for the last little bits of work to get the kobject
> >> layer and network namespaces to play together properly.
> >> 
> >> These patches are roughly evenly divided between network layer work
> >> and sysfs layer work.  Last time this conundrum came up I believe
> >> we decided that the easiest way to handle this was for Greg to carry
> >> all of the patches.  David, Greg does that still make sense?
> > 
> > That's fine, if I get David's ack on these.
> 
> Looks good to me:
> 
> Acked-by: David S. Miller <davem@davemloft.net>

Ok.  Eric, can you resend these to me when .35-rc1 is out so I can queue
them up then to get some testing in linux-next so that they can make it
into .36?

thanks,

greg k-h


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-17 18:11       ` Greg KH
@ 2010-05-17 20:58         ` Eric W. Biederman
  2010-05-17 21:03           ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-17 20:58 UTC (permalink / raw)
  To: Greg KH
  Cc: David Miller, gregkh, kay.sievers, linux-kernel, tj,
	cornelia.huck, eric.dumazet, bcrl, serue, netdev

Greg KH <greg@kroah.com> writes:

> On Sat, May 15, 2010 at 11:26:43PM -0700, David Miller wrote:
>> From: Greg KH <greg@kroah.com>
>> Date: Thu, 6 May 2010 13:04:04 -0700
>> 
>> > On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
>> >> 
>> >> With the tagged sysfs support finally merged into Greg's tree,
>> >> it is time for the last little bits of work to get the kobject
>> >> layer and network namespaces to play together properly.
>> >> 
>> >> These patches are roughly evenly divided between network layer work
>> >> and sysfs layer work.  Last time this conundrum came up I believe
>> >> we decided that the easiest way to handle this was for Greg to carry
>> >> all of the patches.  David, Greg does that still make sense?
>> > 
>> > That's fine, if I get David's ack on these.
>> 
>> Looks good to me:
>> 
>> Acked-by: David S. Miller <davem@davemloft.net>
>
> Ok.  Eric, can you resend these to me when .35-rc1 is out so I can queue
> them up then to get some testing in linux-next so that they can make it
> into .36?

Grumble.  Grumble. Grumble.

If I must I will resend these, but these patches are already in
production use, and I had them to you weeks before the merge window
closed.

Is there no way we can get these in for 2.6.35?

Eric


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-17 20:58         ` Eric W. Biederman
@ 2010-05-17 21:03           ` Greg KH
  2010-05-17 22:37             ` Eric W. Biederman
  2010-05-17 23:48             ` David Miller
  0 siblings, 2 replies; 83+ messages in thread
From: Greg KH @ 2010-05-17 21:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, David Miller, kay.sievers, linux-kernel, tj,
	cornelia.huck, eric.dumazet, bcrl, serue, netdev

On Mon, May 17, 2010 at 01:58:44PM -0700, Eric W. Biederman wrote:
> Greg KH <greg@kroah.com> writes:
> 
> > On Sat, May 15, 2010 at 11:26:43PM -0700, David Miller wrote:
> >> From: Greg KH <greg@kroah.com>
> >> Date: Thu, 6 May 2010 13:04:04 -0700
> >> 
> >> > On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
> >> >> 
> >> >> With the tagged sysfs support finally merged into Greg's tree,
> >> >> it is time for the last little bits of work to get the kobject
> >> >> layer and network namespaces to play together properly.
> >> >> 
> >> >> These patches are roughly evenly divided between network layer work
> >> >> and sysfs layer work.  Last time this conundrum came up I believe
> >> >> we decided that the easiest way to handle this was for Greg to carry
> >> >> all of the patches.  David, Greg does that still make sense?
> >> > 
> >> > That's fine, if I get David's ack on these.
> >> 
> >> Looks good to me:
> >> 
> >> Acked-by: David S. Miller <davem@davemloft.net>
> >
> > Ok.  Eric, can you resend these to me when .35-rc1 is out so I can queue
> > them up then to get some testing in linux-next so that they can make it
> > into .36?
> 
> Grumble.  Grumble. Grumble.
> 
> If I must I will resend these, but these patches are already in
> production use, and I had them to you weeks before the merge window
> closed.

Yes, but they were not reviewed by the network maintainer until after
the merge window closed.  I already have your sysfs-namespace patches
queued up for .35, and that's a big enough change for me to feel
comfortable with at the moment.

> Is there no way we can get these in for 2.6.35?

No, sorry.  One thing at a time please.

thanks,

greg k-h


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-17 21:03           ` Greg KH
@ 2010-05-17 22:37             ` Eric W. Biederman
  2010-05-17 22:54               ` Greg KH
  2010-05-17 23:48             ` David Miller
  1 sibling, 1 reply; 83+ messages in thread
From: Eric W. Biederman @ 2010-05-17 22:37 UTC (permalink / raw)
  To: Greg KH
  Cc: Greg KH, David Miller, kay.sievers, linux-kernel, tj,
	cornelia.huck, eric.dumazet, bcrl, serue, netdev

Greg KH <gregkh@suse.de> writes:

> On Mon, May 17, 2010 at 01:58:44PM -0700, Eric W. Biederman wrote:
>> Greg KH <greg@kroah.com> writes:
>> 
>> > On Sat, May 15, 2010 at 11:26:43PM -0700, David Miller wrote:
>> >> From: Greg KH <greg@kroah.com>
>> >> Date: Thu, 6 May 2010 13:04:04 -0700
>> >> 
>> >> > On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
>> >> >> 
>> >> >> With the tagged sysfs support finally merged into Greg's tree,
>> >> >> it is time for the last little bits of work to get the kobject
>> >> >> layer and network namespaces to play together properly.
>> >> >> 
>> >> >> These patches are roughly evenly divided between network layer work
>> >> >> and sysfs layer work.  Last time this conundrum came up I believe
>> >> >> we decided that the easiest way to handle this was for Greg to carry
>> >> >> all of the patches.  David, Greg does that still make sense?
>> >> > 
>> >> > That's fine, if I get David's ack on these.
>> >> 
>> >> Looks good to me:
>> >> 
>> >> Acked-by: David S. Miller <davem@davemloft.net>
>> >
>> > Ok.  Eric, can you resend these to me when .35-rc1 is out so I can queue
>> > them up then to get some testing in linux-next so that they can make it
>> > into .36?
>> 
>> Grumble.  Grumble. Grumble.
>> 
>> If I must I will resend these, but these patches are already in
>> production use, and I had them to you weeks before the merge window
>> closed.
>
> Yes, but they were not reviewed by the network maintainer until after
> the merge window closed.

Strictly speaking the day before but I get your point.

>  I already have your sysfs-namespace patches
> queued up for .35, and that's a big enough change for me to feel
> comfortable with at the moment.
>
>> Is there no way we can get these in for 2.6.35?
>
> No, sorry.  One thing at a time please.

Sure.

If we are going to push this last bit off until the 2.6.36 time frame,
Dave, Greg, mind if I flip around who I send these patches to?

The big dependency is the sysfs-namespace patches which will be in
2.6.35, and if the patches get into net-next as well as linux-next
there will be a larger number of potential testers.

Eric



* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-17 22:37             ` Eric W. Biederman
@ 2010-05-17 22:54               ` Greg KH
  0 siblings, 0 replies; 83+ messages in thread
From: Greg KH @ 2010-05-17 22:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg KH, David Miller, kay.sievers, linux-kernel, tj,
	cornelia.huck, eric.dumazet, bcrl, serue, netdev

On Mon, May 17, 2010 at 03:37:22PM -0700, Eric W. Biederman wrote:
> Greg KH <gregkh@suse.de> writes:
> 
> > On Mon, May 17, 2010 at 01:58:44PM -0700, Eric W. Biederman wrote:
> >> Greg KH <greg@kroah.com> writes:
> >> 
> >> > On Sat, May 15, 2010 at 11:26:43PM -0700, David Miller wrote:
> >> >> From: Greg KH <greg@kroah.com>
> >> >> Date: Thu, 6 May 2010 13:04:04 -0700
> >> >> 
> >> >> > On Tue, May 04, 2010 at 05:35:54PM -0700, Eric W. Biederman wrote:
> >> >> >> 
> >> >> >> With the tagged sysfs support finally merged into Greg's tree,
> >> >> >> it is time for the last little bits of work to get the kobject
> >> >> >> layer and network namespaces to play together properly.
> >> >> >> 
> >> >> >> These patches are roughly evenly divided between network layer work
> >> >> >> and sysfs layer work.  Last time this conundrum came up I believe
> >> >> >> we decided that the easiest way to handle this was for Greg to carry
> >> >> >> all of the patches.  David, Greg does that still make sense?
> >> >> > 
> >> >> > That's fine, if I get David's ack on these.
> >> >> 
> >> >> Looks good to me:
> >> >> 
> >> >> Acked-by: David S. Miller <davem@davemloft.net>
> >> >
> >> > Ok.  Eric, can you resend these to me when .35-rc1 is out so I can queue
> >> > them up then to get some testing in linux-next so that they can make it
> >> > into .36?
> >> 
> >> Grumble.  Grumble. Grumble.
> >> 
> >> If I must I will resend these, but these patches are already in
> >> production use, and I had them to you weeks before the merge window
> >> closed.
> >
> > Yes, but they were not reviewed by the network maintainer until after
> > the merge window closed.
> 
> Strictly speaking the day before but I get your point.
> 
> >  I already have your sysfs-namespace patches
> > queued up for .35, and that's a big enough change for me to feel
> > comfortable with at the moment.
> >
> >> Is there no way we can get these in for 2.6.35?
> >
> > No, sorry.  One thing at a time please.
> 
> Sure.
> 
> If we are going to push this last bit off until the 2.6.36 time frame,
> Dave, Greg, mind if I flip around who I send these patches to?
> 
> The big dependency is the sysfs-namespace patches which will be in
> 2.6.35, and if the patches get into net-next as well as linux-next
> there will be a larger number of potential testers.

Sure, that's fine with me.

thanks,

greg k-h


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-17 21:03           ` Greg KH
  2010-05-17 22:37             ` Eric W. Biederman
@ 2010-05-17 23:48             ` David Miller
  2010-05-18  4:08               ` Greg KH
  1 sibling, 1 reply; 83+ messages in thread
From: David Miller @ 2010-05-17 23:48 UTC (permalink / raw)
  To: gregkh
  Cc: ebiederm, greg, kay.sievers, linux-kernel, tj, cornelia.huck,
	eric.dumazet, bcrl, serue, netdev

From: Greg KH <gregkh@suse.de>
Date: Mon, 17 May 2010 14:03:18 -0700

> On Mon, May 17, 2010 at 01:58:44PM -0700, Eric W. Biederman wrote:
>> Greg KH <greg@kroah.com> writes:
>> 
>> If I must I will resend these, but these patches are already in
>> production use, and I had them to you weeks before the merge window
>> closed.
> 
> Yes, but they were not reviewed by the network maintainer until after
> the merge window closed.  I already have your sysfs-namespace patches
> queued up for .35, and that's a big enough change for me to feel
> comfortable with at the moment.

Greg, this is complete bullshit.  I reviewed them last week, they are fine
and have been around forever.

Merge them in now.  Making them wait until 2.6.36 is completely ridiculous.


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-17 23:48             ` David Miller
@ 2010-05-18  4:08               ` Greg KH
  2010-05-18  4:21                 ` David Miller
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2010-05-18  4:08 UTC (permalink / raw)
  To: David Miller
  Cc: gregkh, ebiederm, kay.sievers, linux-kernel, tj, cornelia.huck,
	eric.dumazet, bcrl, serue, netdev

On Mon, May 17, 2010 at 04:48:21PM -0700, David Miller wrote:
> From: Greg KH <gregkh@suse.de>
> Date: Mon, 17 May 2010 14:03:18 -0700
> 
> > On Mon, May 17, 2010 at 01:58:44PM -0700, Eric W. Biederman wrote:
> >> Greg KH <greg@kroah.com> writes:
> >> 
> >> If I must I will resend these, but these patches are already in
> >> production use, and I had them to you weeks before the merge window
> >> closed.
> > 
> > Yes, but they were not reviewed by the network maintainer until after
> > the merge window closed.  I already have your sysfs-namespace patches
> > queued up for .35, and that's a big enough change for me to feel
> > comfortable with at the moment.
> 
> Greg, this is complete bullshit.

"complete bullshit"?  How about just a "little bullshit" :)

> I reviewed them last week, they are fine
> and have been around forever.
> 
> Merge them in now.  Making them wait until 2.6.36 is completely ridiculous.

Ok, as they are primarily affecting your subsystem, if you don't object,
I'll queue them up to my tree tomorrow and push them to Linus within
this merge period.

thanks,

greg k-h


* Re: [PATCH 0/6] netns support in the kobject layer
  2010-05-18  4:08               ` Greg KH
@ 2010-05-18  4:21                 ` David Miller
  0 siblings, 0 replies; 83+ messages in thread
From: David Miller @ 2010-05-18  4:21 UTC (permalink / raw)
  To: greg
  Cc: gregkh, ebiederm, kay.sievers, linux-kernel, tj, cornelia.huck,
	eric.dumazet, bcrl, serue, netdev

From: Greg KH <greg@kroah.com>
Date: Mon, 17 May 2010 21:08:44 -0700

> On Mon, May 17, 2010 at 04:48:21PM -0700, David Miller wrote:
>> Greg, this is complete bullshit.
> 
> "complete bullshit"?  How about just a "little bullshit" :)

Ok, it was a small turd instead of a big one :-)

>> I reviewed them last week, they are fine
>> and have been around forever.
>> 
>> Merge them in now.  Making them wait until 2.6.36 is completely ridiculous.
> 
> Ok, as they are primarily affecting your subsystem, if you don't object,
> I'll queue them up to my tree tomorrow and push them to Linus within
> this merge period.

Thanks.


* Re: [PATCH 0/6] tagged sysfs support
  2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
                   ` (14 preceding siblings ...)
  2010-05-05  0:36 ` [PATCH 6/6] net: Expose all network devices in a namespaces in sysfs Eric W. Biederman
@ 2010-05-20 17:47 ` Greg KH
  15 siblings, 0 replies; 83+ messages in thread
From: Greg KH @ 2010-05-20 17:47 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Greg Kroah-Hartman, Kay Sievers, linux-kernel, Tejun Heo,
	Cornelia Huck, linux-fsdevel, Eric Dumazet, Benjamin LaHaise,
	Serge Hallyn, netdev

On Tue, Mar 30, 2010 at 11:30:23AM -0700, Eric W. Biederman wrote:
> 
> The main shortcoming of using multiple network namespaces today
> is that only network devices for the primary network namespaces
> can be put in the kobject layer and sysfs.
> 
> This is essentially the earlier version of this patchset that was
> reviewed before, just now on top of a version of sysfs that doesn't
> need cleanup patches to support it.
> 
> I have been running these patches in some form for well over a
> year so the basics should at least be solid.  
> 
> This patchset is currently against 2.6.34-rc1.
> 
> This patchset is just the basic infrastructure; a couple more pretty
> trivial patches are needed to actually enable network namespaces to use this.
> My current plan is to send those after these patches have made it through
> review.

All queued up now.

thanks,

greg k-h


* patch kobj-send-hotplug-events-in-the-proper-namespace.patch added to gregkh-2.6 tree
  2010-05-05  0:36 ` [PATCH 4/6] kobj: Send hotplug events in the proper namespace Eric W. Biederman
@ 2010-05-20 18:10   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-05-20 18:10 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, davem, eric.dumazet, gregkh,
	kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: kobj: Send hotplug events in the proper namespace.

to my gregkh-2.6 tree.  Its filename is

    kobj-send-hotplug-events-in-the-proper-namespace.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


From ebiederm@xmission.com  Thu May 20 10:44:38 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue,  4 May 2010 17:36:47 -0700
Subject: kobj: Send hotplug events in the proper namespace.
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1273019809-16472-4-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

Utilize netlink_broadcast_filtered to allow sending hotplug events
in the proper namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 lib/kobject_uevent.c |   22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -83,6 +83,22 @@ out:
 	return ret;
 }
 
+static int kobj_bcast_filter(struct sock *dsk, struct sk_buff *skb, void *data)
+{
+	struct kobject *kobj = data;
+	const struct kobj_ns_type_operations *ops;
+
+	ops = kobj_ns_ops(kobj);
+	if (ops) {
+		const void *sock_ns, *ns;
+		ns = kobj->ktype->namespace(kobj);
+		sock_ns = ops->netlink_ns(dsk);
+		return sock_ns != ns;
+	}
+
+	return 0;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -244,8 +260,10 @@ int kobject_uevent_env(struct kobject *k
 			}
 
 			NETLINK_CB(skb).dst_group = 1;
-			retval = netlink_broadcast(uevent_sock, skb, 0, 1,
-						   GFP_KERNEL);
+			retval = netlink_broadcast_filtered(uevent_sock, skb,
+							    0, 1, GFP_KERNEL,
+							    kobj_bcast_filter,
+							    kobj);
 			/* ENOBUFS should be handled in userspace */
 			if (retval == -ENOBUFS)
 				retval = 0;
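
The filter convention used by kobj_bcast_filter() above (a nonzero return
means "skip this socket") can be illustrated with a minimal user-space
sketch.  This is only a model under stated assumptions: the model_* types and
functions are hypothetical stand-ins for struct net, struct sock, and
struct kobject, and a NULL namespace pointer stands in for kobj_ns_ops()
returning NULL.  It does not call any kernel or netlink API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct net, struct sock and struct kobject. */
struct model_ns { const char *name; };
struct model_sock { const struct model_ns *ns; };
/* ns == NULL models a kobject that is not namespace-tagged. */
struct model_kobj { const struct model_ns *ns; };

/* Mirrors kobj_bcast_filter(): nonzero means "do not deliver to dsk". */
static int model_bcast_filter(const struct model_sock *dsk,
			      const struct model_kobj *kobj)
{
	if (kobj->ns)
		return dsk->ns != kobj->ns;
	return 0;	/* untagged kobjects are delivered everywhere */
}

/* Counts the listeners that would receive an event for kobj. */
static int model_broadcast(const struct model_sock *socks, size_t n,
			   const struct model_kobj *kobj)
{
	int delivered = 0;

	assert(socks != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (!model_bcast_filter(&socks[i], kobj))
			delivered++;
	return delivered;
}
```

In this model an event for a tagged kobject reaches only listeners in the
same namespace, while an event for an untagged kobject reaches all of them.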



* patch kobject-send-hotplug-events-in-all-network-namespaces.patch added to gregkh-2.6 tree
  2010-05-05  0:36 ` [PATCH 1/6] kobject: Send hotplug events in all network namespaces Eric W. Biederman
@ 2010-05-20 18:10   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-05-20 18:10 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, davem, eric.dumazet, gregkh,
	kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: kobject: Send hotplug events in all network namespaces

to my gregkh-2.6 tree.  Its filename is

    kobject-send-hotplug-events-in-all-network-namespaces.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


From ebiederm@xmission.com  Thu May 20 10:40:26 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue,  4 May 2010 17:36:44 -0700
Subject: kobject: Send hotplug events in all network namespaces
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1273019809-16472-1-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

Open a copy of the uevent kernel socket in each network
namespace so we can send uevents in all network namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 lib/kobject_uevent.c |   68 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 60 insertions(+), 8 deletions(-)

--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -24,13 +24,19 @@
 #include <linux/skbuff.h>
 #include <linux/netlink.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 
 
 u64 uevent_seqnum;
 char uevent_helper[UEVENT_HELPER_PATH_LEN] = CONFIG_UEVENT_HELPER_PATH;
 static DEFINE_SPINLOCK(sequence_lock);
-#if defined(CONFIG_NET)
-static struct sock *uevent_sock;
+#ifdef CONFIG_NET
+struct uevent_sock {
+	struct list_head list;
+	struct sock *sk;
+};
+static LIST_HEAD(uevent_sock_list);
+static DEFINE_MUTEX(uevent_sock_mutex);
 #endif
 
 /* the strings here must match the enum in include/linux/kobject.h */
@@ -100,6 +106,9 @@ int kobject_uevent_env(struct kobject *k
 	u64 seq;
 	int i = 0;
 	int retval = 0;
+#ifdef CONFIG_NET
+	struct uevent_sock *ue_sk;
+#endif
 
 	pr_debug("kobject: '%s' (%p): %s\n",
 		 kobject_name(kobj), kobj, __func__);
@@ -211,7 +220,9 @@ int kobject_uevent_env(struct kobject *k
 
 #if defined(CONFIG_NET)
 	/* send netlink message */
-	if (uevent_sock) {
+	mutex_lock(&uevent_sock_mutex);
+	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		struct sock *uevent_sock = ue_sk->sk;
 		struct sk_buff *skb;
 		size_t len;
 
@@ -241,6 +252,7 @@ int kobject_uevent_env(struct kobject *k
 		} else
 			retval = -ENOMEM;
 	}
+	mutex_unlock(&uevent_sock_mutex);
 #endif
 
 	/* call uevent_helper, usually only enabled during early boot */
@@ -320,18 +332,58 @@ int add_uevent_var(struct kobj_uevent_en
 EXPORT_SYMBOL_GPL(add_uevent_var);
 
 #if defined(CONFIG_NET)
-static int __init kobject_uevent_init(void)
+static int uevent_net_init(struct net *net)
 {
-	uevent_sock = netlink_kernel_create(&init_net, NETLINK_KOBJECT_UEVENT,
-					    1, NULL, NULL, THIS_MODULE);
-	if (!uevent_sock) {
+	struct uevent_sock *ue_sk;
+
+	ue_sk = kzalloc(sizeof(*ue_sk), GFP_KERNEL);
+	if (!ue_sk)
+		return -ENOMEM;
+
+	ue_sk->sk = netlink_kernel_create(net, NETLINK_KOBJECT_UEVENT,
+					  1, NULL, NULL, THIS_MODULE);
+	if (!ue_sk->sk) {
 		printk(KERN_ERR
 		       "kobject_uevent: unable to create netlink socket!\n");
 		return -ENODEV;
 	}
-	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
+	mutex_lock(&uevent_sock_mutex);
+	list_add_tail(&ue_sk->list, &uevent_sock_list);
+	mutex_unlock(&uevent_sock_mutex);
 	return 0;
 }
 
+static void uevent_net_exit(struct net *net)
+{
+	struct uevent_sock *ue_sk;
+
+	mutex_lock(&uevent_sock_mutex);
+	list_for_each_entry(ue_sk, &uevent_sock_list, list) {
+		if (sock_net(ue_sk->sk) == net)
+			goto found;
+	}
+	mutex_unlock(&uevent_sock_mutex);
+	return;
+
+found:
+	list_del(&ue_sk->list);
+	mutex_unlock(&uevent_sock_mutex);
+
+	netlink_kernel_release(ue_sk->sk);
+	kfree(ue_sk);
+}
+
+static struct pernet_operations uevent_net_ops = {
+	.init	= uevent_net_init,
+	.exit	= uevent_net_exit,
+};
+
+static int __init kobject_uevent_init(void)
+{
+	netlink_set_nonroot(NETLINK_KOBJECT_UEVENT, NL_NONROOT_RECV);
+	return register_pernet_subsys(&uevent_net_ops);
+}
+
+
 postcore_initcall(kobject_uevent_init);
 #endif
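
The per-namespace bookkeeping in uevent_net_init() and uevent_net_exit()
above amounts to keeping one list entry per network namespace.  A rough
user-space sketch of that lifecycle follows, assuming hypothetical model_*
names in place of struct net and struct uevent_sock.  It is single-threaded,
so the locking done with uevent_sock_mutex in the patch is left out, and it
prepends rather than appends because list order does not matter here.

```c
#include <assert.h>
#include <stdlib.h>

struct model_net { int id; };	/* hypothetical stand-in for struct net */

struct model_uevent_sock {
	struct model_uevent_sock *next;
	const struct model_net *net;	/* namespace this entry belongs to */
};

/* Stands in for uevent_sock_list. */
static struct model_uevent_sock *model_sock_list;

/* Like uevent_net_init(): add one entry for a new namespace. */
static int model_net_init(const struct model_net *net)
{
	struct model_uevent_sock *ue_sk;

	assert(net != NULL);
	ue_sk = calloc(1, sizeof(*ue_sk));
	if (!ue_sk)
		return -1;
	ue_sk->net = net;
	ue_sk->next = model_sock_list;
	model_sock_list = ue_sk;
	return 0;
}

/* Like uevent_net_exit(): find this namespace's entry, unlink and free it. */
static void model_net_exit(const struct model_net *net)
{
	struct model_uevent_sock **pp;

	for (pp = &model_sock_list; *pp; pp = &(*pp)->next) {
		if ((*pp)->net == net) {
			struct model_uevent_sock *victim = *pp;

			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

/* Number of namespaces that currently have an entry. */
static int model_count(void)
{
	int n = 0;

	for (struct model_uevent_sock *p = model_sock_list; p; p = p->next)
		n++;
	return n;
}
```

Exiting a namespace that has no entry is a no-op in the sketch, matching the
early return in uevent_net_exit() when the loop finds nothing.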



* patch hotplug-netns-aware-uevent_helper.patch added to gregkh-2.6 tree
  2010-05-05  0:36 ` [PATCH 5/6] hotplug: netns aware uevent_helper Eric W. Biederman
@ 2010-05-20 18:10   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-05-20 18:10 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, davem, eric.dumazet, gregkh,
	kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: hotplug: netns aware uevent_helper

to my gregkh-2.6 tree.  Its filename is

    hotplug-netns-aware-uevent_helper.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


From ebiederm@xmission.com  Thu May 20 10:45:13 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue,  4 May 2010 17:36:48 -0700
Subject: hotplug: netns aware uevent_helper
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1273019809-16472-5-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

It only makes sense for uevent_helper to get events
in the initial namespaces.  Its invocation is not
per namespace and it is not clear how we could make
its invocation namespace aware.
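
The check described above can be modelled outside the kernel.  What
follows is only a userspace sketch: struct toy_kobject, struct ns_ops
and the toy_* functions are hypothetical stand-ins, not the kernel's
types, and the namespace is reduced to an opaque pointer tag.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-type namespace operations. */
struct ns_ops {
	const void *(*initial_ns)(void);
};

/* Hypothetical stand-in for a kobject that may carry a namespace tag. */
struct toy_kobject {
	const struct ns_ops *ops;	/* NULL when the object is not namespaced */
	const void *ns;			/* tag of the namespace the object lives in */
};

static int init_ns_marker;

static const void *toy_initial_ns(void)
{
	return &init_ns_marker;
}

static const struct ns_ops toy_net_ops = {
	.initial_ns = toy_initial_ns,
};

/* Returns nonzero when the helper should NOT be run for this object. */
static int toy_usermode_filter(const struct toy_kobject *kobj)
{
	if (kobj->ops)
		return kobj->ns != kobj->ops->initial_ns();

	/* Objects without namespace operations are never filtered. */
	return 0;
}
```

In this model an object is skipped only when it is namespaced and its
tag differs from the initial namespace; everything else still reaches
the helper.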

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 lib/kobject_uevent.c |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -19,7 +19,7 @@
 #include <linux/kobject.h>
 #include <linux/module.h>
 #include <linux/slab.h>
-
+#include <linux/user_namespace.h>
 #include <linux/socket.h>
 #include <linux/skbuff.h>
 #include <linux/netlink.h>
@@ -99,6 +99,21 @@ static int kobj_bcast_filter(struct sock
 	return 0;
 }
 
+static int kobj_usermode_filter(struct kobject *kobj)
+{
+	const struct kobj_ns_type_operations *ops;
+
+	ops = kobj_ns_ops(kobj);
+	if (ops) {
+		const void *init_ns, *ns;
+		ns = kobj->ktype->namespace(kobj);
+		init_ns = ops->initial_ns();
+		return ns != init_ns;
+	}
+
+	return 0;
+}
+
 /**
  * kobject_uevent_env - send an uevent with environmental data
  *
@@ -274,7 +289,7 @@ int kobject_uevent_env(struct kobject *k
 #endif
 
 	/* call uevent_helper, usually only enabled during early boot */
-	if (uevent_helper[0]) {
+	if (uevent_helper[0] && !kobj_usermode_filter(kobj)) {
 		char *argv [3];
 
 		argv [0] = uevent_helper;



* patch net-expose-all-network-devices-in-a-namespaces-in-sysfs.patch added to gregkh-2.6 tree
  2010-05-05  0:36 ` [PATCH 6/6] net: Expose all network devices in a namespaces in sysfs Eric W. Biederman
@ 2010-05-20 18:10   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-05-20 18:10 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, davem, eric.dumazet, gregkh,
	kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: net: Expose all network devices in a namespaces in sysfs

to my gregkh-2.6 tree.  Its filename is

    net-expose-all-network-devices-in-a-namespaces-in-sysfs.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu May 20 10:46:13 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue,  4 May 2010 17:36:49 -0700
Subject: net: Expose all network devices in a namespaces in sysfs
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1273019809-16472-6-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

This reverts commit aaf8cdc34ddba08122f02217d9d684e2f9f5d575.

Drivers like the ipw2100 call device_create_group when they
are initialized and device_remove_group when they are shutdown.
Moving them between namespaces deletes their sysfs groups early.

In particular the following call chain results.
netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
With sysfs_remove_dir recursively deleting all of its subdirectories,
and nothing adding them back.

Ouch!

Therefore we need to call something that ultimately calls sysfs_mv_dir,
as that sysfs function can move sysfs directories between namespaces
without deleting their subdirectories or their contents.  This lets
us avoid placing extra boilerplate into every driver that does
something interesting with sysfs.
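
The difference between the two approaches can be shown with a toy
model.  This is only a sketch under simplifying assumptions: struct
toy_dir and both toy_move_* functions are hypothetical, a directory is
reduced to a namespace tag plus a count of groups a driver added after
registration, and nothing here is the sysfs implementation.

```c
/* Hypothetical model of a device directory in sysfs. */
struct toy_dir {
	const void *ns;		/* namespace tag of the directory */
	int driver_groups;	/* groups a driver created after registration */
};

/*
 * Models unregister followed by register: the directory is removed
 * together with everything below it and recreated empty, so whatever
 * the driver added is gone and nothing adds it back.
 */
static void toy_move_by_readd(struct toy_dir *d, const void *new_ns)
{
	d->driver_groups = 0;
	d->ns = new_ns;
}

/*
 * Models a rename-style move: only the tag changes and the contents
 * of the directory are left alone.
 */
static void toy_move_by_rename(struct toy_dir *d, const void *new_ns)
{
	d->ns = new_ns;
}
```

Starting from a directory with two driver groups, the first function
leaves none behind while the second keeps both.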

Currently the function that provides that capability is device_rename.
That is, the code works without nasty side effects as originally written.

So remove the misguided fix for moving devices between namespaces.  The
bug in the kobject layer that inspired it has now been recognized and
fixed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 net/core/dev.c       |   28 +++++-----------------------
 net/core/net-sysfs.c |   16 +---------------
 net/core/net-sysfs.h |    1 -
 3 files changed, 6 insertions(+), 39 deletions(-)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -984,15 +984,10 @@ int dev_change_name(struct net_device *d
 		return err;
 
 rollback:
-	/* For now only devices in the initial network namespace
-	 * are in sysfs.
-	 */
-	if (net_eq(net, &init_net)) {
-		ret = device_rename(&dev->dev, dev->name);
-		if (ret) {
-			memcpy(dev->name, oldname, IFNAMSIZ);
-			return ret;
-		}
+	ret = device_rename(&dev->dev, dev->name);
+	if (ret) {
+		memcpy(dev->name, oldname, IFNAMSIZ);
+		return ret;
 	}
 
 	write_lock_bh(&dev_base_lock);
@@ -5112,8 +5107,6 @@ int register_netdevice(struct net_device
 	if (dev->features & NETIF_F_SG)
 		dev->features |= NETIF_F_GSO;
 
-	netdev_initialize_kobject(dev);
-
 	ret = call_netdevice_notifiers(NETDEV_POST_INIT, dev);
 	ret = notifier_to_errno(ret);
 	if (ret)
@@ -5634,15 +5627,6 @@ int dev_change_net_namespace(struct net_
 	if (dev->features & NETIF_F_NETNS_LOCAL)
 		goto out;
 
-#ifdef CONFIG_SYSFS
-	/* Don't allow real devices to be moved when sysfs
-	 * is enabled.
-	 */
-	err = -EINVAL;
-	if (dev->dev.parent)
-		goto out;
-#endif
-
 	/* Ensure the device has been registrered */
 	err = -EINVAL;
 	if (dev->reg_state != NETREG_REGISTERED)
@@ -5693,8 +5677,6 @@ int dev_change_net_namespace(struct net_
 	dev_unicast_flush(dev);
 	dev_addr_discard(dev);
 
-	netdev_unregister_kobject(dev);
-
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
 
@@ -5707,7 +5689,7 @@ int dev_change_net_namespace(struct net_
 	}
 
 	/* Fixup kobjects */
-	err = netdev_register_kobject(dev);
+	err = device_rename(&dev->dev, dev->name);
 	WARN_ON(err);
 
 	/* Add the device back in the hashes */
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -508,9 +508,6 @@ static int netdev_uevent(struct device *
 	struct net_device *dev = to_net_dev(d);
 	int retval;
 
-	if (!net_eq(dev_net(dev), &init_net))
-		return 0;
-
 	/* pass interface to uevent. */
 	retval = add_uevent_var(env, "INTERFACE=%s", dev->name);
 	if (retval)
@@ -569,9 +566,6 @@ void netdev_unregister_kobject(struct ne
 
 	kobject_get(&dev->kobj);
 
-	if (!net_eq(dev_net(net), &init_net))
-		return;
-
 	device_del(dev);
 }
 
@@ -581,6 +575,7 @@ int netdev_register_kobject(struct net_d
 	struct device *dev = &(net->dev);
 	const struct attribute_group **groups = net->sysfs_groups;
 
+	device_initialize(dev);
 	dev->class = &net_class;
 	dev->platform_data = net;
 	dev->groups = groups;
@@ -603,9 +598,6 @@ int netdev_register_kobject(struct net_d
 #endif
 #endif /* CONFIG_SYSFS */
 
-	if (!net_eq(dev_net(net), &init_net))
-		return 0;
-
 	return device_add(dev);
 }
 
@@ -622,12 +614,6 @@ void netdev_class_remove_file(struct cla
 EXPORT_SYMBOL(netdev_class_create_file);
 EXPORT_SYMBOL(netdev_class_remove_file);
 
-void netdev_initialize_kobject(struct net_device *net)
-{
-	struct device *device = &(net->dev);
-	device_initialize(device);
-}
-
 int netdev_kobject_init(void)
 {
 	kobj_ns_type_register(&net_ns_type_operations);
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -4,5 +4,4 @@
 int netdev_kobject_init(void);
 int netdev_register_kobject(struct net_device *);
 void netdev_unregister_kobject(struct net_device *);
-void netdev_initialize_kobject(struct net_device *);
 #endif



* patch netns-teach-network-device-kobjects-which-namespace-they-are-in.patch added to gregkh-2.6 tree
  2010-05-05  0:36 ` [PATCH 2/6] netns: Teach network device kobjects which namespace they are in Eric W. Biederman
  2010-05-05 15:17   ` Serge E. Hallyn
@ 2010-05-20 18:10   ` gregkh
  1 sibling, 0 replies; 83+ messages in thread
From: gregkh @ 2010-05-20 18:10 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, davem, eric.dumazet, gregkh,
	kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: [PATCH 2/6] netns: Teach network device kobjects which namespace they are in.

to my gregkh-2.6 tree.  Its filename is

    netns-teach-network-device-kobjects-which-namespace-they-are-in.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu May 20 10:41:04 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue,  4 May 2010 17:36:45 -0700
Subject: [PATCH 2/6] netns: Teach network device kobjects which namespace they are in.
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1273019809-16472-2-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

The problem: network devices show up in sysfs, and with network
namespaces active multiple devices with the same name can show up in
the same directory.  Ouch!

To avoid that problem and allow existing applications in network namespaces
to see the same interface that is currently presented in sysfs, this
patch enables the tagged directory support in sysfs.

By using the network namespace pointers as tags to separate out
the sysfs directory entries we ensure that we don't have conflicts
in the directories and applications only see a limited set of
the network devices.
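
The tagging idea can be sketched in userspace as follows.  This is an
illustration only: struct toy_dirent, toy_lookup and toy_count_visible
are hypothetical names, and the real sysfs directory code is organised
differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name plus a namespace tag. */
struct toy_dirent {
	const char *name;
	const void *ns;		/* pointer used purely as an identity tag */
};

/* Find the entry called "name" that is tagged with namespace "ns". */
static const struct toy_dirent *
toy_lookup(const struct toy_dirent *ents, size_t n,
	   const void *ns, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (ents[i].ns == ns && strcmp(ents[i].name, name) == 0)
			return &ents[i];
	}
	return NULL;
}

/* Count the entries a reader in namespace "ns" would see. */
static size_t toy_count_visible(const struct toy_dirent *ents, size_t n,
				const void *ns)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++) {
		if (ents[i].ns == ns)
			count++;
	}
	return count;
}
```

Two entries named "eth0" can sit in the same array without conflict as
long as their tags differ, and each namespace only sees its own.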

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/kobject.h |    1 +
 net/Kconfig             |    8 ++++++++
 net/core/net-sysfs.c    |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -142,6 +142,7 @@ extern const struct sysfs_ops kobj_sysfs
  */
 enum kobj_ns_type {
 	KOBJ_NS_TYPE_NONE = 0,
+	KOBJ_NS_TYPE_NET,
 	KOBJ_NS_TYPES
 };
 
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -45,6 +45,14 @@ config COMPAT_NETLINK_MESSAGES
 
 menu "Networking options"
 
+config NET_NS
+	bool "Network namespace support"
+	default n
+	depends on EXPERIMENTAL && NAMESPACES
+	help
+	  Allow user space to create what appear to be multiple instances
+	  of the network stack.
+
 source "net/packet/Kconfig"
 source "net/unix/Kconfig"
 source "net/xfrm/Kconfig"
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -14,7 +14,9 @@
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
 #include <linux/slab.h>
+#include <linux/nsproxy.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <net/wext.h>
@@ -467,6 +469,37 @@ static struct attribute_group wireless_g
 };
 #endif
 
+static const void *net_current_ns(void)
+{
+	return current->nsproxy->net_ns;
+}
+
+static const void *net_initial_ns(void)
+{
+	return &init_net;
+}
+
+static const void *net_netlink_ns(struct sock *sk)
+{
+	return sock_net(sk);
+}
+
+static struct kobj_ns_type_operations net_ns_type_operations = {
+	.type = KOBJ_NS_TYPE_NET,
+	.current_ns = net_current_ns,
+	.netlink_ns = net_netlink_ns,
+	.initial_ns = net_initial_ns,
+};
+
+static void net_kobj_ns_exit(struct net *net)
+{
+	kobj_ns_exit(KOBJ_NS_TYPE_NET, net);
+}
+
+static struct pernet_operations sysfs_net_ops = {
+	.exit = net_kobj_ns_exit,
+};
+
 #endif /* CONFIG_SYSFS */
 
 #ifdef CONFIG_HOTPLUG
@@ -507,6 +540,13 @@ static void netdev_release(struct device
 	kfree((char *)dev - dev->padded);
 }
 
+static const void *net_namespace(struct device *d)
+{
+	struct net_device *dev;
+	dev = container_of(d, struct net_device, dev);
+	return dev_net(dev);
+}
+
 static struct class net_class = {
 	.name = "net",
 	.dev_release = netdev_release,
@@ -516,6 +556,8 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.dev_uevent = netdev_uevent,
 #endif
+	.ns_type = &net_ns_type_operations,
+	.namespace = net_namespace,
 };
 
 /* Delete sysfs entries but hold kobject reference until after all
@@ -588,5 +630,9 @@ void netdev_initialize_kobject(struct ne
 
 int netdev_kobject_init(void)
 {
+	kobj_ns_type_register(&net_ns_type_operations);
+#ifdef CONFIG_SYSFS
+	register_pernet_subsys(&sysfs_net_ops);
+#endif
 	return class_register(&net_class);
 }



* patch netlink-implment-netlink_broadcast_filtered.patch added to gregkh-2.6 tree
  2010-05-05  0:36 ` [PATCH 3/6] netlink: Implment netlink_broadcast_filtered Eric W. Biederman
@ 2010-05-20 18:10   ` gregkh
  0 siblings, 0 replies; 83+ messages in thread
From: gregkh @ 2010-05-20 18:10 UTC (permalink / raw)
  To: ebiederm, bcrl, cornelia.huck, davem, eric.dumazet, gregkh,
	kay.sievers, netdev, serue


This is a note to let you know that I've just added the patch titled

    Subject: netlink: Implment netlink_broadcast_filtered

to my gregkh-2.6 tree.  Its filename is

    netlink-implment-netlink_broadcast_filtered.patch

This tree can be found at 
    http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/patches/


>From ebiederm@xmission.com  Thu May 20 10:43:10 2010
From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Tue,  4 May 2010 17:36:46 -0700
Subject: netlink: Implment netlink_broadcast_filtered
To: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>, linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>, Cornelia Huck <cornelia.huck@de.ibm.com>, Eric Dumazet <eric.dumazet@gmail.com>, Benjamin LaHaise <bcrl@lhnet.ca>, Serge Hallyn <serue@us.ibm.com>, <netdev@vger.kernel.org>, David Miller <davem@davemloft.net>, "Eric W. Biederman" <ebiederm@xmission.com>
Message-ID: <1273019809-16472-3-git-send-email-ebiederm@xmission.com>


From: Eric W. Biederman <ebiederm@xmission.com>

When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to.  For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.

For data belonging to other namespaces netlink_broadcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.
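
The shape of that mechanism, reduced to a userspace sketch: struct
toy_sock and the toy_* functions are hypothetical, there is no skb,
and delivery is modelled as a counter.  As in the patch, a nonzero
return from the filter means the listener is skipped, and the
unfiltered call is a wrapper that passes no filter.

```c
#include <stddef.h>

/* Hypothetical listener: a namespace tag and a delivery counter. */
struct toy_sock {
	const void *ns;
	int received;
};

typedef int (*toy_filter_fn)(struct toy_sock *dsk, void *data);

/* Deliver to every listener the filter does not reject. */
static int toy_broadcast_filtered(struct toy_sock *socks, size_t n,
				  toy_filter_fn filter, void *data)
{
	size_t i;
	int delivered = 0;

	for (i = 0; i < n; i++) {
		/* Nonzero from the filter means: do not transmit. */
		if (filter && filter(&socks[i], data))
			continue;
		socks[i].received++;
		delivered++;
	}
	return delivered;
}

/* Unfiltered broadcast is the filtered one with no filter set. */
static int toy_broadcast(struct toy_sock *socks, size_t n)
{
	return toy_broadcast_filtered(socks, n, NULL, NULL);
}

/* Example filter: reject listeners outside the namespace in "data". */
static int toy_ns_filter(struct toy_sock *dsk, void *data)
{
	return dsk->ns != data;
}
```

Passing the filter and its data as a pair keeps the broadcast loop
ignorant of what kind of namespace, if any, is being compared.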

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

---
 include/linux/netlink.h  |    4 ++++
 net/netlink/af_netlink.c |   21 +++++++++++++++++++--
 2 files changed, 23 insertions(+), 2 deletions(-)

--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -188,6 +188,10 @@ extern int netlink_has_listeners(struct
 extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock);
 extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 pid,
 			     __u32 group, gfp_t allocation);
+extern int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb,
+	__u32 pid, __u32 group, gfp_t allocation,
+	int (*filter)(struct sock *dsk, struct sk_buff *skb, void *data),
+	void *filter_data);
 extern int netlink_set_err(struct sock *ssk, __u32 pid, __u32 group, int code);
 extern int netlink_register_notifier(struct notifier_block *nb);
 extern int netlink_unregister_notifier(struct notifier_block *nb);
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -978,6 +978,8 @@ struct netlink_broadcast_data {
 	int delivered;
 	gfp_t allocation;
 	struct sk_buff *skb, *skb2;
+	int (*tx_filter)(struct sock *dsk, struct sk_buff *skb, void *data);
+	void *tx_data;
 };
 
 static inline int do_one_broadcast(struct sock *sk,
@@ -1020,6 +1022,9 @@ static inline int do_one_broadcast(struc
 		p->failure = 1;
 		if (nlk->flags & NETLINK_BROADCAST_SEND_ERROR)
 			p->delivery_failure = 1;
+	} else if (p->tx_filter && p->tx_filter(sk, p->skb2, p->tx_data)) {
+		kfree_skb(p->skb2);
+		p->skb2 = NULL;
 	} else if (sk_filter(sk, p->skb2)) {
 		kfree_skb(p->skb2);
 		p->skb2 = NULL;
@@ -1038,8 +1043,10 @@ out:
 	return 0;
 }
 
-int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
-		      u32 group, gfp_t allocation)
+int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb, u32 pid,
+	u32 group, gfp_t allocation,
+	int (*filter)(struct sock *dsk, struct sk_buff *skb, void *data),
+	void *filter_data)
 {
 	struct net *net = sock_net(ssk);
 	struct netlink_broadcast_data info;
@@ -1059,6 +1066,8 @@ int netlink_broadcast(struct sock *ssk,
 	info.allocation = allocation;
 	info.skb = skb;
 	info.skb2 = NULL;
+	info.tx_filter = filter;
+	info.tx_data = filter_data;
 
 	/* While we sleep in clone, do not allow to change socket list */
 
@@ -1083,6 +1092,14 @@ int netlink_broadcast(struct sock *ssk,
 	}
 	return -ESRCH;
 }
+EXPORT_SYMBOL(netlink_broadcast_filtered);
+
+int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, u32 pid,
+		      u32 group, gfp_t allocation)
+{
+	return netlink_broadcast_filtered(ssk, skb, pid, group, allocation,
+		NULL, NULL);
+}
 EXPORT_SYMBOL(netlink_broadcast);
 
 struct netlink_set_err_data {



end of thread, other threads:[~2010-05-20 18:10 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
2010-03-30 19:23   ` Eric Dumazet
2010-03-30 23:50     ` [PATCH 7/6] sysfs: Remove double free sysfs_get_sb Eric W. Biederman
2010-03-31  5:01   ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Serge E. Hallyn
2010-03-31  5:01     ` Serge E. Hallyn
2010-03-31  5:41   ` Tejun Heo
2010-03-31  5:51     ` Eric W. Biederman
2010-03-31 13:47       ` Serge E. Hallyn
2010-03-31 14:02         ` Eric W. Biederman
2010-04-05  7:45       ` Tejun Heo
2010-04-29 20:29   ` patch sysfs-basic-support-for-multiple-super-blocks.patch added to gregkh-2.6 tree gregkh
2010-03-30 18:31 ` [PATCH 2/6] kobj: Add basic infrastructure for dealing with namespaces Eric W. Biederman
2010-04-29 20:29   ` patch kobj-add-basic-infrastructure-for-dealing-with-namespaces.patch added to gregkh-2.6 tree gregkh
2010-03-30 18:31 ` [PATCH 3/6] sysfs: Implement sysfs tagged directory support Eric W. Biederman
2010-03-31  2:43   ` Serge E. Hallyn
2010-03-31  3:38     ` Eric W. Biederman
2010-03-31  4:02       ` Serge E. Hallyn
2010-03-31  4:23         ` Eric W. Biederman
2010-03-31  4:53           ` Serge E. Hallyn
2010-03-31  6:49   ` Tejun Heo
2010-03-31  7:43     ` Eric W. Biederman
2010-03-31  8:17       ` Tejun Heo
2010-03-31  8:22         ` Tejun Heo
2010-03-31  9:39           ` Eric W. Biederman
2010-04-05  8:17             ` Tejun Heo
2010-04-29 20:29   ` patch sysfs-implement-sysfs-tagged-directory-support.patch added to gregkh-2.6 tree gregkh
2010-04-30  4:18     ` Tejun Heo
2010-04-30  4:45       ` Greg KH
2010-04-30  5:24         ` Eric W. Biederman
2010-04-30  5:37           ` Tejun Heo
2010-04-30  6:12             ` Tejun Heo
2010-04-30 14:29             ` Serge E. Hallyn
2010-04-30 15:22               ` Tejun Heo
2010-04-30 15:43                 ` Serge E. Hallyn
2010-04-30 15:58                   ` Greg KH
2010-03-30 18:31 ` [PATCH 4/6] sysfs: Add support for tagged directories with untagged members Eric W. Biederman
2010-04-29 20:29   ` patch sysfs-add-support-for-tagged-directories-with-untagged-members.patch added to gregkh-2.6 tree gregkh
2010-03-30 18:31 ` [PATCH 5/6] sysfs: Implement sysfs_delete_link Eric W. Biederman
2010-04-29 20:29   ` patch sysfs-implement-sysfs_delete_link.patch added to gregkh-2.6 tree gregkh
2010-03-30 18:31 ` [PATCH 6/6] driver core: Implement ns directory support for device classes Eric W. Biederman
2010-04-29 20:29   ` patch driver-core-implement-ns-directory-support-for-device-classes.patch added to gregkh-2.6 tree gregkh
2010-03-30 18:53 ` [PATCH 0/6] tagged sysfs support Kay Sievers
2010-03-30 23:04   ` Eric W. Biederman
2010-03-31  5:51     ` Kay Sievers
2010-03-31  6:25       ` Tejun Heo
2010-03-31  6:52       ` Eric W. Biederman
2010-04-03  0:58       ` Ben Hutchings
2010-04-03  8:35         ` Kay Sievers
2010-04-03 16:05           ` Ben Hutchings
2010-04-03 16:35             ` Kay Sievers
2010-04-03 16:35               ` Kay Sievers
2010-03-31 17:21 ` Serge E. Hallyn
2010-03-31 18:09   ` Eric W. Biederman
2010-05-05  0:35 ` [PATCH 0/6] netns support in the kobject layer Eric W. Biederman
2010-05-06 20:04   ` Greg KH
2010-05-16  6:26     ` David Miller
2010-05-17 18:11       ` Greg KH
2010-05-17 20:58         ` Eric W. Biederman
2010-05-17 21:03           ` Greg KH
2010-05-17 22:37             ` Eric W. Biederman
2010-05-17 22:54               ` Greg KH
2010-05-17 23:48             ` David Miller
2010-05-18  4:08               ` Greg KH
2010-05-18  4:21                 ` David Miller
2010-05-05  0:36 ` [PATCH 1/6] kobject: Send hotplug events in all network namespaces Eric W. Biederman
2010-05-20 18:10   ` patch kobject-send-hotplug-events-in-all-network-namespaces.patch added to gregkh-2.6 tree gregkh
2010-05-05  0:36 ` [PATCH 2/6] netns: Teach network device kobjects which namespace they are in Eric W. Biederman
2010-05-05 15:17   ` Serge E. Hallyn
2010-05-05 19:56     ` Eric W. Biederman
2010-05-05 22:01       ` Serge E. Hallyn
2010-05-17  4:59         ` [PATCH 7/6] net/sysfs: Fix the bitrot in network device kobject namespace support Eric W. Biederman
2010-05-17  5:07           ` David Miller
2010-05-20 18:10   ` patch netns-teach-network-device-kobjects-which-namespace-they-are-in.patch added to gregkh-2.6 tree gregkh
2010-05-05  0:36 ` [PATCH 3/6] netlink: Implment netlink_broadcast_filtered Eric W. Biederman
2010-05-20 18:10   ` patch netlink-implment-netlink_broadcast_filtered.patch added to gregkh-2.6 tree gregkh
2010-05-05  0:36 ` [PATCH 4/6] kobj: Send hotplug events in the proper namespace Eric W. Biederman
2010-05-20 18:10   ` patch kobj-send-hotplug-events-in-the-proper-namespace.patch added to gregkh-2.6 tree gregkh
2010-05-05  0:36 ` [PATCH 5/6] hotplug: netns aware uevent_helper Eric W. Biederman
2010-05-20 18:10   ` patch hotplug-netns-aware-uevent_helper.patch added to gregkh-2.6 tree gregkh
2010-05-05  0:36 ` [PATCH 6/6] net: Expose all network devices in a namespaces in sysfs Eric W. Biederman
2010-05-20 18:10   ` patch net-expose-all-network-devices-in-a-namespaces-in-sysfs.patch added to gregkh-2.6 tree gregkh
2010-05-20 17:47 ` [PATCH 0/6] tagged sysfs support Greg KH
