linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
@ 2008-10-22 15:21 Benjamin Thery
  2008-10-22 15:21 ` [PATCH 1/4] netns: add in ida ID to identify the network namespace Benjamin Thery
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Benjamin Thery @ 2008-10-22 15:21 UTC (permalink / raw)
  To: netdev, Dave Miller
  Cc: Eric Biederman, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers, Benjamin Thery


Support for network namespaces in mainline is pretty complete for
some time now, but there is still this issue with sysfs that prevents 
more people to use it easily.

Reminder for those not aware of the netns/sysfs issue:

With network namespaces, the kernel must be able to support net devices
with the same name in different network namespaces: the most obvious 
example being the loopback device, which exists in every namespace. 
The remaining place where this doesn't work yet is sysfs.

In the last 12 months, Eric Biederman proposed different approaches
to support this and sent several patchsets to implement what he calls
"sysfs tagged directories". But unfortunately, there is still no 
agreement on the patchset and its implementation.

See last round of comments there: 
http://thread.gmane.org/gmane.linux.kernel/735612/focus=740050

So, currently testing network namespaces on a mainline kernel is a
pain and involves either to disable sysfs completely (argh) or to find
and manually apply Eric's latest patchset (was in gregkh's tree for a 
short time, but unfortunately it was dumped out a few a weeks ago).


This patchset explores an alternative suggested by Serge Hallyn
to  *temporarily*  fix this issue. It introduces the modifications 
needed to register in sysfs, the network devices belonging to child
network namespaces with a suffix appended to their name to avoid 
potential conflicts.

http://thread.gmane.org/gmane.linux.kernel/735612/focus=741757

Network devices from the initial network namespace are untouched.
Their representation in sysfs (/sys/class/net/, ...) is unchanged.

Network devices from sub-network namespaces appear in sysfs
with a name that looks like this: device_name@netns_id
eg: lo@3, eth0@4e

See last patch of the series for the details.

Then, if needed in the child network namespace, we can filter 
/sys/class/net contents with, for example:

* mount -t tmpfs /sys/class/net 
* and  manually link the right devices from /sys/devices/virtual/net
  (ln -s ../../devices/virtual/net/lo@1 lo)

This is less elegant than Eric's approach, but is quite simple and 
doesn't touch sysfs core code.

This patch applies on top of net-next-2.6.

Benjamin

-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] netns: add in ida ID to identify the network namespace
  2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
@ 2008-10-22 15:21 ` Benjamin Thery
  2008-10-22 15:22 ` [PATCH 2/4] netns: Export nets id to /proc/net/netns Benjamin Thery
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Benjamin Thery @ 2008-10-22 15:21 UTC (permalink / raw)
  To: netdev, Dave Miller
  Cc: Eric Biederman, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers, Benjamin Thery

This patch adds in 'id' member to struct net. This member contains an
IDA ID. The id is allocated at netns creation. 

This id will be used to 'tag' net devices belonging to network namespace
in sysfs (in order to allow registration of net devices with the same name 
in different network namespace: see last patch for the details).

The advantage of IDA IDs over using the netns pointer is the id values
are small starting at 0, thus they requires less bytes to be displayed
(we are limited to 3 characters: again see last patch for the details). 
And they stay small as they are recycled when IDs are freed (unless you
have thousands of netns running on your host).

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 include/net/net_namespace.h |    2 ++
 net/core/net_namespace.c    |   24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+)

Index: net-next-2.6/include/net/net_namespace.h
===================================================================
--- net-next-2.6.orig/include/net/net_namespace.h
+++ net-next-2.6/include/net/net_namespace.h
@@ -75,6 +75,8 @@ struct net {
 #endif
 #endif
 	struct net_generic	*gen;
+
+	int			id;
 };
 
 
Index: net-next-2.6/net/core/net_namespace.c
===================================================================
--- net-next-2.6.orig/net/core/net_namespace.c
+++ net-next-2.6/net/core/net_namespace.c
@@ -26,6 +26,16 @@ EXPORT_SYMBOL(init_net);
 #define INITIAL_NET_GEN_PTRS	13 /* +1 for len +2 for rcu_head */
 
 /*
+ * IDs allocated from this ida are used as a suffix in sysfs to tag
+ * devices belonging to each netns (other than init_net).
+ * Only 4 characters are left to store a separator plus this tag in sysfs
+ * (BUS_ID_SIZE-IFNAMSIZ)
+ * Thus, this limits us to 4095 ("FFF") simultaneous network namespaces.
+ */
+static DEFINE_IDA(net_id_ida);
+#define NET_ID_MAX 0xFFF
+
+/*
  * setup_net runs the initializers for the network namespace object.
  */
 static __net_init int setup_net(struct net *net)
@@ -40,7 +50,20 @@ static __net_init int setup_net(struct n
 	atomic_set(&net->use_count, 0);
 #endif
 
+	/* Get an ID */
+again:
+	error = ida_get_new(&net_id_ida, &net->id);
+	if (error) {
+		if (error == -EAGAIN) {
+			ida_pre_get(&net_id_ida, GFP_KERNEL);
+			goto again;
+		}
+		goto out;
+	}
 	error = -ENOMEM;
+	if (net->id > NET_ID_MAX)
+		goto out;
+
 	ng = kzalloc(sizeof(struct net_generic) +
 			INITIAL_NET_GEN_PTRS * sizeof(void *), GFP_KERNEL);
 	if (ng == NULL)
@@ -97,6 +120,7 @@ static void net_free(struct net *net)
 	}
 #endif
 
+	ida_remove(&net_id_ida, net->id);
 	kmem_cache_free(net_cachep, net);
 }
 

-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 2/4] netns: Export nets id to /proc/net/netns
  2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
  2008-10-22 15:21 ` [PATCH 1/4] netns: add in ida ID to identify the network namespace Benjamin Thery
@ 2008-10-22 15:22 ` Benjamin Thery
  2008-10-22 15:22 ` [PATCH 3/4] net: cleanup some vars names to be more consistant with the network code Benjamin Thery
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Benjamin Thery @ 2008-10-22 15:22 UTC (permalink / raw)
  To: netdev, Dave Miller
  Cc: Eric Biederman, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers, Benjamin Thery

This patches exports the new 'struct net' net->id value to /proc/net/nsid
file.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
---
 net/core/net_namespace.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

Index: net-next-2.6/net/core/net_namespace.c
===================================================================
--- net-next-2.6.orig/net/core/net_namespace.c
+++ net-next-2.6/net/core/net_namespace.c
@@ -6,6 +6,7 @@
 #include <linux/delay.h>
 #include <linux/sched.h>
 #include <linux/idr.h>
+#include <linux/proc_fs.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
@@ -211,6 +212,53 @@ struct net *copy_net_ns(unsigned long fl
 }
 #endif
 
+#ifdef CONFIG_PROC_FS
+static int netns_seq_show(struct seq_file *seq, void *v)
+{
+	struct net *net = seq->private;
+
+	seq_printf(seq, "%x\n", net->id);
+	return 0;
+}
+
+static int netns_seq_open(struct inode *inode, struct file *file)
+{
+        return single_open_net(inode, file, netns_seq_show);
+}
+
+static const struct file_operations netns_seq_fops = {
+	.owner   = THIS_MODULE,
+	.open    = netns_seq_open,
+	.read	 = seq_read,
+	.llseek	 = seq_lseek,
+	.release = single_release_net,
+};
+
+static int __net_init netns_net_init(struct net *net)
+{
+	if (!proc_net_fops_create(net, "nsid", S_IRUGO, &netns_seq_fops))
+		return -ENOMEM;
+	return 0;
+}
+
+static void __net_exit netns_net_exit(struct net *net)
+{
+	proc_net_remove(net, "nsid");
+}
+
+static struct pernet_operations netns_proc_ops = {
+	.init = netns_net_init,
+	.exit = netns_net_exit,
+};
+
+static int __init netns_proc_init()
+{
+	return register_pernet_subsys(&netns_proc_ops);
+}
+#else
+#define netns_proc_init() 0
+#endif /* CONFIG_PROC_FS */
+
 static int __init net_ns_init(void)
 {
 	int err;
@@ -226,6 +274,8 @@ static int __init net_ns_init(void)
 	if (!netns_wq)
 		panic("Could not create netns workq");
 #endif
+	if (netns_proc_init())
+		panic("Could not register netns subsys");
 
 	mutex_lock(&net_mutex);
 	err = setup_net(&init_net);

-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 3/4] net: cleanup some vars names to be more consistant with the network code
  2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
  2008-10-22 15:21 ` [PATCH 1/4] netns: add in ida ID to identify the network namespace Benjamin Thery
  2008-10-22 15:22 ` [PATCH 2/4] netns: Export nets id to /proc/net/netns Benjamin Thery
@ 2008-10-22 15:22 ` Benjamin Thery
  2008-10-22 15:22 ` [PATCH 4/4] netns: sysfs: add netns suffix to net devices sysfs entries Benjamin Thery
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Benjamin Thery @ 2008-10-22 15:22 UTC (permalink / raw)
  To: netdev, Dave Miller
  Cc: Eric Biederman, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers, Benjamin Thery

This small patch renames a couple of net_device variables from 
'net' to 'netdev' to avoid confusion with 'struct net' variables
that will be introduced in the next patch. 
'net' is commonly used to designate 'struct net' in the rest of 
the network code.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 net/core/net-sysfs.c |   17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Index: net-next-2.6/net/core/net-sysfs.c
===================================================================
--- net-next-2.6.orig/net/core/net-sysfs.c
+++ net-next-2.6/net/core/net-sysfs.c
@@ -471,32 +471,33 @@ static struct class net_class = {
 /* Delete sysfs entries but hold kobject reference until after all
  * netdev references are gone.
  */
-void netdev_unregister_kobject(struct net_device * net)
+void netdev_unregister_kobject(struct net_device *netdev)
 {
-	struct device *dev = &(net->dev);
+	struct device *dev = &(netdev->dev);
 
 	kobject_get(&dev->kobj);
 	device_del(dev);
 }
 
 /* Create sysfs entries for network device. */
-int netdev_register_kobject(struct net_device *net)
+int netdev_register_kobject(struct net_device *netdev)
 {
-	struct device *dev = &(net->dev);
-	struct attribute_group **groups = net->sysfs_groups;
+	struct device *dev = &(netdev->dev);
+	struct attribute_group **groups = netdev->sysfs_groups;
 
 	dev->class = &net_class;
-	dev->platform_data = net;
+	dev->platform_data = netdev;
 	dev->groups = groups;
 
 	BUILD_BUG_ON(BUS_ID_SIZE < IFNAMSIZ);
-	strlcpy(dev->bus_id, net->name, BUS_ID_SIZE);
+	strlcpy(dev->bus_id, netdev->name, BUS_ID_SIZE);
 
 #ifdef CONFIG_SYSFS
 	*groups++ = &netstat_group;
 
 #ifdef CONFIG_WIRELESS_EXT_SYSFS
-	if (net->wireless_handlers && net->wireless_handlers->get_wireless_stats)
+	if (netdev->wireless_handlers &&
+	    netdev->wireless_handlers->get_wireless_stats)
 		*groups++ = &wireless_group;
 #endif
 #endif /* CONFIG_SYSFS */

-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 4/4] netns: sysfs: add netns suffix to net devices sysfs entries
  2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
                   ` (2 preceding siblings ...)
  2008-10-22 15:22 ` [PATCH 3/4] net: cleanup some vars names to be more consistant with the network code Benjamin Thery
@ 2008-10-22 15:22 ` Benjamin Thery
  2008-10-22 19:59 ` [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device " Eric W. Biederman
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Benjamin Thery @ 2008-10-22 15:22 UTC (permalink / raw)
  To: netdev, Dave Miller
  Cc: Eric Biederman, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers, Benjamin Thery

Reminder: what we want is being able to create network interfaces with
the same name in different network namespaces (eg. the loopback). The
remaining issues are in sysfs.

This patch dissociates network devices actual names (stored in struct 
net_device and seen by ifconfig/ip tools) and network device names 
stored in sysfs. 

When adding a network device in child net namespace (!init_net), when
registering it in sysfs, a suffix unique to the current netns is 
appended to the actual device name. Currently this suffix is the netns
ida ID in hexa form separated by a '@' char.

In sysfs, we see all the network devices of all netns.

# ll /sys/devices/virtual/net/
...
drwxr-xr-x 4 root root 0 2008-10-13 14:08 lo
drwxr-xr-x 4 root root 0 2008-10-13 16:31 lo@1
...
drwxr-xr-x 4 root root 0 2008-10-13 16:31 lo@e5
...

Then, in the child network namespace we can filter the contents of 
/sys/class/net with:

* mount -t tmpfs /sys/class/net 
* and  manually link the right devices from /sys/devices/virtual/net
  (ln -s ../../devices/virtual/net/lo@1 lo)

Thus, /sys/class/net appears to be fine for the applications running
in this namespace.

FUSE can also be used to alter the view of /sys/class/net in the 
namespace.

Issues:
-------

* The suffix

  We only have four characters left (BUS_ID_SIZE - IFNAMSIZ) to build the
  suffix: add a separator and encode the netns ID.
  By encoding the ID in hexa, it introduces a limit of 4095 (0xFFF)
  sub-network namespaces running at the same time.

* This approach reduces isolation between network namespaces

  Everyone can see all the devices in each namespaces by exploring 
  /sys/devices/../.. or /sys/class/net (if it's not re-mounted as tmpfs).

* Does not work very well with CONFIG_SYSFS_DEPRECATED=y 

  The filtering of /sys/class/net with CONFIG_SYSFS_DEPRECATED=y is more
  difficult to do because in this case /sys/class/net contains 
  the actual directories (not symlinks).


Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
---
 net/Kconfig          |    2 +-
 net/core/dev.c       |    4 +++-
 net/core/net-sysfs.c |   34 +++++++++++++++++++++++++++++++++-
 net/core/net-sysfs.h |    1 +
 4 files changed, 38 insertions(+), 3 deletions(-)

Index: net-next-2.6/net/Kconfig
===================================================================
--- net-next-2.6.orig/net/Kconfig
+++ net-next-2.6/net/Kconfig
@@ -27,7 +27,7 @@ menu "Networking options"
 config NET_NS
 	bool "Network namespace support"
 	default n
-	depends on EXPERIMENTAL && !SYSFS && NAMESPACES
+	depends on EXPERIMENTAL && NAMESPACES
 	help
 	  Allow user space to create what appear to be multiple instances
 	  of the network stack.
Index: net-next-2.6/net/core/dev.c
===================================================================
--- net-next-2.6.orig/net/core/dev.c
+++ net-next-2.6/net/core/dev.c
@@ -894,6 +894,7 @@ int dev_alloc_name(struct net_device *de
 int dev_change_name(struct net_device *dev, const char *newname)
 {
 	char oldname[IFNAMSIZ];
+	char devname[BUS_ID_SIZE];
 	int err = 0;
 	int ret;
 	struct net *net;
@@ -924,7 +925,8 @@ int dev_change_name(struct net_device *d
 		strlcpy(dev->name, newname, IFNAMSIZ);
 
 rollback:
-	err = device_rename(&dev->dev, dev->name);
+	netdev_fill_bus_id_name(devname, dev);
+	err = device_rename(&dev->dev, devname);
 	if (err) {
 		memcpy(dev->name, oldname, IFNAMSIZ);
 		return err;
Index: net-next-2.6/net/core/net-sysfs.c
===================================================================
--- net-next-2.6.orig/net/core/net-sysfs.c
+++ net-next-2.6/net/core/net-sysfs.c
@@ -468,6 +468,38 @@ static struct class net_class = {
 #endif
 };
 
+/* Fill device bus_id name from net device name
+ * When registering a device for a child network namespace,
+ * a suffix is added to the name stored in "struct device"
+ * bus_id.
+ *
+ * devname size must be at least BUS_ID_SIZE
+ */
+void netdev_fill_bus_id_name(char *devname, struct net_device *netdev)
+{
+#ifndef CONFIG_NET_NS
+	strlcpy(devname, netdev->name, BUS_ID_SIZE);
+#else
+	struct net *net = dev_net(netdev);
+
+	if (net_eq(net, &init_net))
+		strlcpy(devname, netdev->name, BUS_ID_SIZE);
+	else {
+		/*
+		 * To allow registration of net devices with the same name in
+		 * different namespaces, append the netns identifier to the
+		 * device name in sysfs using the 4 bytes left in bus_id
+		 * (BUS_ID_SIZE - IFNAMSIZ).
+		 *
+		 * devname is in the form: device_name@XXX
+		 * the netns identifier is an integer < 4095, thus encodable
+		 * in hexa in 3 characters ("FFF").
+		 */
+		snprintf(devname, BUS_ID_SIZE, "%s@%x", netdev->name, net->id);
+	}
+#endif
+}
+
 /* Delete sysfs entries but hold kobject reference until after all
  * netdev references are gone.
  */
@@ -490,7 +522,7 @@ int netdev_register_kobject(struct net_d
 	dev->groups = groups;
 
 	BUILD_BUG_ON(BUS_ID_SIZE < IFNAMSIZ);
-	strlcpy(dev->bus_id, netdev->name, BUS_ID_SIZE);
+	netdev_fill_bus_id_name(dev->bus_id, netdev);
 
 #ifdef CONFIG_SYSFS
 	*groups++ = &netstat_group;
Index: net-next-2.6/net/core/net-sysfs.h
===================================================================
--- net-next-2.6.orig/net/core/net-sysfs.h
+++ net-next-2.6/net/core/net-sysfs.h
@@ -5,4 +5,5 @@ int netdev_kobject_init(void);
 int netdev_register_kobject(struct net_device *);
 void netdev_unregister_kobject(struct net_device *);
 void netdev_initialize_kobject(struct net_device *);
+void netdev_fill_bus_id_name(char *, struct net_device *);
 #endif

-- 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
                   ` (3 preceding siblings ...)
  2008-10-22 15:22 ` [PATCH 4/4] netns: sysfs: add netns suffix to net devices sysfs entries Benjamin Thery
@ 2008-10-22 19:59 ` Eric W. Biederman
  2008-10-22 20:30   ` Serge E. Hallyn
  2008-10-23 11:56   ` Benjamin Thery
  2008-10-22 20:16 ` Greg KH
  2008-10-22 20:32 ` [PATCH] netns: Coexist with the sysfs limitations Eric W. Biederman
  6 siblings, 2 replies; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-22 19:59 UTC (permalink / raw)
  To: Benjamin Thery
  Cc: netdev, Dave Miller, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Benjamin Thery <benjamin.thery@bull.net> writes:

> Support for network namespaces in mainline is pretty complete for
> some time now, but there is still this issue with sysfs that prevents 
> more people to use it easily.

Ben your patchset is completely inappropriate.

Temporarily adding elements to the ABI that we intend to remove
is not a proper solution to this problem.

That user space visible ida you add is a namespace identifier that breaks
nested containers and migration.  It is very very very wrong.

Eric


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
                   ` (4 preceding siblings ...)
  2008-10-22 19:59 ` [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device " Eric W. Biederman
@ 2008-10-22 20:16 ` Greg KH
  2008-10-22 21:08   ` Eric W. Biederman
  2008-10-22 20:32 ` [PATCH] netns: Coexist with the sysfs limitations Eric W. Biederman
  6 siblings, 1 reply; 25+ messages in thread
From: Greg KH @ 2008-10-22 20:16 UTC (permalink / raw)
  To: Benjamin Thery
  Cc: netdev, Dave Miller, Eric Biederman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

On Wed, Oct 22, 2008 at 05:21:44PM +0200, Benjamin Thery wrote:
> Network devices from sub-network namespaces appear in sysfs
> with a name that looks like this: device_name@netns_id
> eg: lo@3, eth0@4e

How does the default udev rules as shipped by most distros handle the
renaming of the network device if the MAC address is duplicated like it
will be for these eth devices?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 19:59 ` [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device " Eric W. Biederman
@ 2008-10-22 20:30   ` Serge E. Hallyn
  2008-10-22 21:01     ` Eric W. Biederman
  2008-10-23 11:56   ` Benjamin Thery
  1 sibling, 1 reply; 25+ messages in thread
From: Serge E. Hallyn @ 2008-10-22 20:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Thery, netdev, Dave Miller, Greg Kroah-Hartman, Al Viro,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Quoting Eric W. Biederman (ebiederm@xmission.com):
> Benjamin Thery <benjamin.thery@bull.net> writes:
> 
> > Support for network namespaces in mainline is pretty complete for
> > some time now, but there is still this issue with sysfs that prevents 
> > more people to use it easily.
> 
> Ben your patchset is completely inappropriate.
> 
> Temporarily adding elements to the ABI that we intend to remove
> is not a proper solution to this problem.
> 
> That user space visible ida you add is a namespace identifier that breaks
> nested containers and migration.  It is very very very wrong.

I disagree (not surprising :) completely.  The well-known userspace
tools (ifconfig, ip, etc) will not see the lo@1, they'll see lo.
Userspace in a container can either umount /sys completely, or do

	mount -t tmpfs none /sys/class/net
	mount --bind /sys/devices/virtual/net/lo@1 /sys/class/net/lo

if they really want to, in which case only their view
of /sys/devices/virtual/net would be different.

Eric, would you hate this less if it was under some

	CONFIG_SYSFS_NETNS_HACK

config variable?

-serge

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] netns: Coexist with the sysfs limitations
  2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
                   ` (5 preceding siblings ...)
  2008-10-22 20:16 ` Greg KH
@ 2008-10-22 20:32 ` Eric W. Biederman
  2008-10-22 20:40   ` Daniel Lezcano
  2008-10-22 21:21   ` Serge E. Hallyn
  6 siblings, 2 replies; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-22 20:32 UTC (permalink / raw)
  To: Benjamin Thery
  Cc: netdev, Dave Miller, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers


To make testing of the network namespace simpler allow
the network namespace code and the sysfs code to be
compiled and run at the same time.  To do this only
virtual devices are allowed in the additional network
namespaces and those virtual devices are not placed
in the kobject tree.

Since virtual devices don't actually do anything interesting
hardware wise that needs device management there should
be no loss in keeping them out of the kobject tree and
by implication sysfs.  The gain in ease of testing
and code coverage should be significant.

I.e. people running distributions that make it next to
impossible to boot without sysfs should at be able to
boot a test kernel now.

Plus no ABIs are harmed with this patch.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/Kconfig          |    2 +-
 net/core/dev.c       |   12 +++++++++++-
 net/core/net-sysfs.c |    7 +++++++
 3 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index d789d79..8c3d97c 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -27,7 +27,7 @@ menu "Networking options"
 config NET_NS
 	bool "Network namespace support"
 	default n
-	depends on EXPERIMENTAL && !SYSFS && NAMESPACES
+	depends on EXPERIMENTAL && NAMESPACES
 	help
 	  Allow user space to create what appear to be multiple instances
 	  of the network stack.
diff --git a/net/core/dev.c b/net/core/dev.c
index b8a4fd0..a7f0461 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4449,6 +4449,15 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	if (dev->features & NETIF_F_NETNS_LOCAL)
 		goto out;
 
+#ifdef CONFIG_SYSFS
+	/* Don't allow real devices to be moved when sysfs
+	 * is enabled.
+	 */
+	err = -EINVAL;
+	if (dev->dev.parent)
+		goto out;
+#endif
+
 	/* Ensure the device has been registrered */
 	err = -EINVAL;
 	if (dev->reg_state != NETREG_REGISTERED)
@@ -4506,6 +4515,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	 */
 	dev_addr_discard(dev);
 
+	netdev_unregister_kobject(dev);
+
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
 
@@ -4522,7 +4533,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	}
 
 	/* Fixup kobjects */
-	netdev_unregister_kobject(dev);
 	err = netdev_register_kobject(dev);
 	WARN_ON(err);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 92d6b94..85cb8bd 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -476,6 +476,10 @@ void netdev_unregister_kobject(struct net_device * net)
 	struct device *dev = &(net->dev);
 
 	kobject_get(&dev->kobj);
+
+	if (dev_net(net) != &init_net)
+		return;
+
 	device_del(dev);
 }
 
@@ -501,6 +505,9 @@ int netdev_register_kobject(struct net_device *net)
 #endif
 #endif /* CONFIG_SYSFS */
 
+	if (dev_net(net) != &init_net)
+		return 0;
+
 	return device_add(dev);
 }
 
-- 
1.5.3.rc6.17.g1911


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] netns: Coexist with the sysfs limitations
  2008-10-22 20:32 ` [PATCH] netns: Coexist with the sysfs limitations Eric W. Biederman
@ 2008-10-22 20:40   ` Daniel Lezcano
  2008-10-22 21:21   ` Serge E. Hallyn
  1 sibling, 0 replies; 25+ messages in thread
From: Daniel Lezcano @ 2008-10-22 20:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Thery, netdev, Dave Miller, Greg Kroah-Hartman, Al Viro,
	Serge Hallyn, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Eric W. Biederman wrote:
> To make testing of the network namespace simpler allow
> the network namespace code and the sysfs code to be
> compiled and run at the same time.  To do this only
> virtual devices are allowed in the additional network
> namespaces and those virtual devices are not placed
> in the kobject tree.
> 
> Since virtual devices don't actually do anything interesting
> hardware wise that needs device management there should
> be no loss in keeping them out of the kobject tree and
> by implication sysfs.  The gain in ease of testing
> and code coverage should be significant.
> 
> I.e. people running distributions that make it next to
> impossible to boot without sysfs should at be able to
> boot a test kernel now.
> 
> Plus no ABIs are harmed with this patch.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>

> ---
>  net/Kconfig          |    2 +-
>  net/core/dev.c       |   12 +++++++++++-
>  net/core/net-sysfs.c |    7 +++++++
>  3 files changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/net/Kconfig b/net/Kconfig
> index d789d79..8c3d97c 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -27,7 +27,7 @@ menu "Networking options"
>  config NET_NS
>  	bool "Network namespace support"
>  	default n
> -	depends on EXPERIMENTAL && !SYSFS && NAMESPACES
> +	depends on EXPERIMENTAL && NAMESPACES
>  	help
>  	  Allow user space to create what appear to be multiple instances
>  	  of the network stack.
> diff --git a/net/core/dev.c b/net/core/dev.c
> index b8a4fd0..a7f0461 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4449,6 +4449,15 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
>  	if (dev->features & NETIF_F_NETNS_LOCAL)
>  		goto out;
> 
> +#ifdef CONFIG_SYSFS
> +	/* Don't allow real devices to be moved when sysfs
> +	 * is enabled.
> +	 */
> +	err = -EINVAL;
> +	if (dev->dev.parent)
> +		goto out;
> +#endif
> +
>  	/* Ensure the device has been registrered */
>  	err = -EINVAL;
>  	if (dev->reg_state != NETREG_REGISTERED)
> @@ -4506,6 +4515,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
>  	 */
>  	dev_addr_discard(dev);
> 
> +	netdev_unregister_kobject(dev);
> +
>  	/* Actually switch the network namespace */
>  	dev_net_set(dev, net);
> 
> @@ -4522,7 +4533,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
>  	}
> 
>  	/* Fixup kobjects */
> -	netdev_unregister_kobject(dev);
>  	err = netdev_register_kobject(dev);
>  	WARN_ON(err);
> 
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 92d6b94..85cb8bd 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -476,6 +476,10 @@ void netdev_unregister_kobject(struct net_device * net)
>  	struct device *dev = &(net->dev);
> 
>  	kobject_get(&dev->kobj);
> +
> +	if (dev_net(net) != &init_net)
> +		return;
> +
>  	device_del(dev);
>  }
> 
> @@ -501,6 +505,9 @@ int netdev_register_kobject(struct net_device *net)
>  #endif
>  #endif /* CONFIG_SYSFS */
> 
> +	if (dev_net(net) != &init_net)
> +		return 0;
> +
>  	return device_add(dev);
>  }
> 


-- 






















































Sauf indication contraire ci-dessus:
Compagnie IBM France
Siège Social : Tour Descartes, 2, avenue Gambetta, La Défense 5, 92400
Courbevoie
RCS Nanterre 552 118 465
Forme Sociale : S.A.S.
Capital Social : 542.737.118 ?
SIREN/SIRET : 552 118 465 02430

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 20:30   ` Serge E. Hallyn
@ 2008-10-22 21:01     ` Eric W. Biederman
  2008-10-22 21:55       ` Stephen Hemminger
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-22 21:01 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Benjamin Thery, netdev, Dave Miller, Greg Kroah-Hartman, Al Viro,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

"Serge E. Hallyn" <serue@us.ibm.com> writes:

> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> Benjamin Thery <benjamin.thery@bull.net> writes:
>> 
>> > Support for network namespaces in mainline is pretty complete for
>> > some time now, but there is still this issue with sysfs that prevents 
>> > more people to use it easily.
>> 
>> Ben your patchset is completely inappropriate.
>> 
>> Temporarily adding elements to the ABI that we intend to remove
>> is not a proper solution to this problem.
>> 
>> That user space visible ida you add is a namespace identifier that breaks
>> nested containers and migration.  It is very very very wrong.
>
> I disagree (not surprising :) completely.  The well-known userspace
> tools (ifconfig, ip, etc) will not see the lo@1, they'll see lo.
> Userspace in a container can either umount /sys completely, or do

The well-known user space tools don't use /sys at all.  Modern
network tools use rtnetlink (ip) old network tools use /proc/net.

Very few things actually use /sys and for those things lo@1 or
eth0@1 are completely useless except for implementing a FUSE
mock up of sysfs.  But you don't need anything in sysfs to do
that as all of the interesting information is available through
/proc/net or rtnetlink.

>
> 	mount -t tmpfs none /sys/class/net
> 	mount --bind /sys/devices/virtual/net/lo@1 /sys/class/net/lo
>
> if they really want to, in which case only their view
> of /sys/devices/virtual/net would be different.
>
> Eric, would you hate this less if it was under some
>
> 	CONFIG_SYSFS_NETNS_HACK
>
> config variable?

No.  ABI decisions are almost certainly irreversible.

If we need an immediate hack please see the patch I sent
in follow up.  We can achieve everything Ben is doing by simply
keeping virtual devices out of the kobject tree.  Keeping them
from showing up in sysfs.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 20:16 ` Greg KH
@ 2008-10-22 21:08   ` Eric W. Biederman
  2008-10-22 21:24     ` Greg KH
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-22 21:08 UTC (permalink / raw)
  To: Greg KH
  Cc: Benjamin Thery, netdev, Dave Miller, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Greg KH <gregkh@suse.de> writes:

> On Wed, Oct 22, 2008 at 05:21:44PM +0200, Benjamin Thery wrote:
>> Network devices from sub-network namespaces appear in sysfs
>> with a name that looks like this: device_name@netns_id
>> eg: lo@3, eth0@4e
>
> How does the default udev rules as shipped by most distros handle the
> renaming of the network device if the MAC address is duplicated like it
> will be for these eth devices?

The mac address is not duplicated.

Further devices like eth0@4e are completely unusable to the udev
rules in the initial network namespace because they can not talk
to or affect them.

As I read it Ben's ``solution'' puts entries in sysfs that are
completely unusable to udev.

Eric


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] netns: Coexist with the sysfs limitations
  2008-10-22 20:32 ` [PATCH] netns: Coexist with the sysfs limitations Eric W. Biederman
  2008-10-22 20:40   ` Daniel Lezcano
@ 2008-10-22 21:21   ` Serge E. Hallyn
  2008-10-23  8:04     ` Benjamin Thery
  1 sibling, 1 reply; 25+ messages in thread
From: Serge E. Hallyn @ 2008-10-22 21:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Thery, netdev, Dave Miller, Greg Kroah-Hartman, Al Viro,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Quoting Eric W. Biederman (ebiederm@xmission.com):
> 
> To make testing of the network namespace simpler allow
> the network namespace code and the sysfs code to be
> compiled and run at the same time.  To do this only
> virtual devices are allowed in the additional network
> namespaces and those virtual devices are not placed
> in the kobject tree.
> 
> Since virtual devices don't actually do anything interesting
> hardware wise that needs device management there should
> be no loss in keeping them out of the kobject tree and
> by implication sysfs.  The gain in ease of testing
> and code coverage should be significant.
> 
> I.e. people running distributions that make it next to
> impossible to boot without sysfs should at be able to
> boot a test kernel now.
> 
> Plus no ABIs are harmed with this patch.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Duh.

Tested-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>

Thanks, Eric!  Thanks, Benjamin!

-serge

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 21:08   ` Eric W. Biederman
@ 2008-10-22 21:24     ` Greg KH
  0 siblings, 0 replies; 25+ messages in thread
From: Greg KH @ 2008-10-22 21:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Thery, netdev, Dave Miller, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

On Wed, Oct 22, 2008 at 02:08:26PM -0700, Eric W. Biederman wrote:
> Greg KH <gregkh@suse.de> writes:
> 
> > On Wed, Oct 22, 2008 at 05:21:44PM +0200, Benjamin Thery wrote:
> >> Network devices from sub-network namespaces appear in sysfs
> >> with a name that looks like this: device_name@netns_id
> >> eg: lo@3, eth0@4e
> >
> > How does the default udev rules as shipped by most distros handle the
> > renaming of the network device if the MAC address is duplicated like it
> > will be for these eth devices?
> 
> The mac address is not duplicated.

Ah, ok, I really don't think I want to know more :)

> Further devices like eth0@4e are completely unusable to the udev
> rules in the initial network namespace because they can not talk
> to or affect them.

Oh, good point.

> As I read it Ben's ``solution'' puts entries in sysfs that are
> completely unusable to udev.

That's not a good thing to do, if udev can't see them, than HAL can't
see them, then the rest of userspace usually has no idea they are
present either.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 21:01     ` Eric W. Biederman
@ 2008-10-22 21:55       ` Stephen Hemminger
  2008-10-22 22:54         ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: Stephen Hemminger @ 2008-10-22 21:55 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, Benjamin Thery, netdev, Dave Miller,
	Greg Kroah-Hartman, Al Viro, Daniel Lezcano, linux-kernel,
	Tejun Heo, Denis Lunev, Linux Containers

On Wed, 22 Oct 2008 14:01:59 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> "Serge E. Hallyn" <serue@us.ibm.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >> Benjamin Thery <benjamin.thery@bull.net> writes:
> >> 
> >> > Support for network namespaces in mainline is pretty complete for
> >> > some time now, but there is still this issue with sysfs that prevents 
> >> > more people to use it easily.
> >> 
> >> Ben your patchset is completely inappropriate.
> >> 
> >> Temporarily adding elements to the ABI that we intend to remove
> >> is not a proper solution to this problem.
> >> 
> >> That user space visible ida you add is a namespace identifier that breaks
> >> nested containers and migration.  It is very very very wrong.
> >
> > I disagree (not surprising :) completely.  The well-known userspace
> > tools (ifconfig, ip, etc) will not see the lo@1, they'll see lo.
> > Userspace in a container can either umount /sys completely, or do
> 
> The well-known user space tools don't use /sys at all.  Modern
> network tools use rtnetlink (ip) old network tools use /proc/net.
> 
> Very few things actually use /sys and for those things lo@1 or
> eth0@1 are completely useless except for implementing a FUSE
> mock up of sysfs.  But you don't need anything in sysfs to do
> that as all of the interesting information is available through
> /proc/net or rtnetlink.

Lots of scripts are starting use /sys for things. It is easier to
parse /sys/class/net than the output of ifconfig or ip 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 21:55       ` Stephen Hemminger
@ 2008-10-22 22:54         ` Eric W. Biederman
  2008-10-23  4:14           ` Kyle Moffett
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-22 22:54 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Serge E. Hallyn, Benjamin Thery, netdev, Dave Miller,
	Greg Kroah-Hartman, Al Viro, Daniel Lezcano, linux-kernel,
	Tejun Heo, Denis Lunev, Linux Containers

Stephen Hemminger <shemminger@vyatta.com> writes:

>> The well-known user space tools don't use /sys at all.  Modern
>> network tools use rtnetlink (ip) old network tools use /proc/net.
>> 
>> Very few things actually use /sys and for those things lo@1 or
>> eth0@1 are completely useless except for implementing a FUSE
>> mock up of sysfs.  But you don't need anything in sysfs to do
>> that as all of the interesting information is available through
>> /proc/net or rtnetlink.
>
> Lots of scripts are starting use /sys for things. It is easier to
> parse /sys/class/net than the output of ifconfig or ip 

Yes.  So we need the correct values in /sys/class/net.

Which is why sysfs and network namespaces do not coexist currently.

Which is why I recommend only placing devices in the initial network
namespace in sysfs for the short term.

Which is why we need to get all of the details correct when we
handle multiple network namespaces and sysfs.

I should have something working and should be sending patches out
shortly.  Cleaning up sysfs and the device model enough so the changes
are not spooky is a long hard road unfortunately.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 22:54         ` Eric W. Biederman
@ 2008-10-23  4:14           ` Kyle Moffett
  0 siblings, 0 replies; 25+ messages in thread
From: Kyle Moffett @ 2008-10-23  4:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Stephen Hemminger, Serge E. Hallyn, Benjamin Thery, netdev,
	Dave Miller, Greg Kroah-Hartman, Al Viro, Daniel Lezcano,
	linux-kernel, Tejun Heo, Denis Lunev, Linux Containers

On Wed, Oct 22, 2008 at 6:54 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Stephen Hemminger <shemminger@vyatta.com> writes:
>>> The well-known user space tools don't use /sys at all.  Modern
>>> network tools use rtnetlink (ip) old network tools use /proc/net.
>>>
>>> Very few things actually use /sys and for those things lo@1 or
>>> eth0@1 are completely useless except for implementing a FUSE
>>> mock up of sysfs.  But you don't need anything in sysfs to do
>>> that as all of the interesting information is available through
>>> /proc/net or rtnetlink.
>>
>> Lots of scripts are starting use /sys for things. It is easier to
>> parse /sys/class/net than the output of ifconfig or ip
>
> Yes.  So we need the correct values in /sys/class/net.
>
> Which is why sysfs and network namespaces do not coexist currently.

I definitely agree that "eth0@1" is a bad ide. I know for sure that a
relatively large number of system init scripts poke about in
/sys/class/net/$IFACE, as well as a number of the system installation
scripts.  Those scripts include some that my company has written for
internal use and others supplied by distributions.

Daemons and such mostly use netlink, but for anything written in shell
it's much easier to do things like this:

for devpath in /sys/class/net/*; do
    dev="${dev##/sys/class/net/}"
    case "${dev}" in
        [....]
    esac
    if [ "x${mac}" = "x$(cat ${devpath}/address)" ]; then
        echo "Found MAC '${mac}': ${dev}"
    fi
done

If I wanted to perform similar things with the output of ifconfig, it
would involve some much more fragile manual text parsing of the output
of the "ip" or "ifconfig" commands.  And sometimes the "ifconfig"
command is outright wrong.  If you have an interface name longer than
7 characters or so then some versions of ifconfig will truncate it
internally and display garbage.

While the "show only the root namespace" interim solution is
problematic for processes in network namespaces, I think it's more
important not to break things for admin tools in the root namespace.

Cheers,
Kyle Moffett

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] netns: Coexist with the sysfs limitations
  2008-10-22 21:21   ` Serge E. Hallyn
@ 2008-10-23  8:04     ` Benjamin Thery
  2008-10-23 15:40       ` Eric W. Biederman
  2008-10-23 15:56       ` [PATCH] netns: Coexist with the sysfs limitations v2 Eric W. Biederman
  0 siblings, 2 replies; 25+ messages in thread
From: Benjamin Thery @ 2008-10-23  8:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Serge E. Hallyn, netdev, Dave Miller, Greg Kroah-Hartman,
	Al Viro, Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Serge E. Hallyn wrote:
> Quoting Eric W. Biederman (ebiederm@xmission.com):
>> To make testing of the network namespace simpler allow
>> the network namespace code and the sysfs code to be
>> compiled and run at the same time.  To do this only
>> virtual devices are allowed in the additional network
>> namespaces and those virtual devices are not placed
>> in the kobject tree.
>>
>> Since virtual devices don't actually do anything interesting
>> hardware wise that needs device management there should
>> be no loss in keeping them out of the kobject tree and
>> by implication sysfs.  The gain in ease of testing
>> and code coverage should be significant.
>>
>> I.e. people running distributions that make it next to
>> impossible to boot without sysfs should at be able to
>> boot a test kernel now.
>>
>> Plus no ABIs are harmed with this patch.

>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> 
> Duh.
> 
> Tested-by: Serge Hallyn <serue@us.ibm.com>
> Acked-by: Serge Hallyn <serue@us.ibm.com>

Oh, this patch is short, clean, and the limitation introduced isn't too
annoying for testing netns right now.

At least, my proposal provoked some reactions :)

BTW, there's a second limitation with your patch:
we can't rename the net devices in the additional network namespaces.

In net/core/dev.c, dev_change_name() fails: call to device_rename() 
return an (expected) -EINVAL error.
Maybe we should add a test on the net to only call it in init_net?

  rollback:
-       err = device_rename(&dev->dev, dev->name);
-       if (err) {
-               memcpy(dev->name, oldname, IFNAMSIZ);
-               return err;
+       if (net == &init_net) {
+               err = device_rename(&dev->dev, dev->name);
+               if (err) {
+                       memcpy(dev->name, oldname, IFNAMSIZ);
+                       return err;
+               }
         }

Otherwise,

Acked-by: Benjamin Thery <benjamin.thery@bull.net>

Thanks,
Benjamin

> 
> Thanks, Eric!  Thanks, Benjamin!
> 
> -serge
> 
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-22 19:59 ` [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device " Eric W. Biederman
  2008-10-22 20:30   ` Serge E. Hallyn
@ 2008-10-23 11:56   ` Benjamin Thery
  2008-10-23 15:46     ` Eric W. Biederman
  1 sibling, 1 reply; 25+ messages in thread
From: Benjamin Thery @ 2008-10-23 11:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: netdev, Dave Miller, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Eric W. Biederman wrote:
> Benjamin Thery <benjamin.thery@bull.net> writes:
> 
>> Support for network namespaces in mainline is pretty complete for
>> some time now, but there is still this issue with sysfs that prevents 
>> more people to use it easily.
> 
> Ben your patchset is completely inappropriate.
> 
> Temporarily adding elements to the ABI that we intend to remove
> is not a proper solution to this problem.

In fact, I feared that would be an issue before I sent the patches.
Thanks for confirming this.

But, as this alternative was discussed a bit after Al's comments on your
last patches, I had to give it a try and code a proof-of-concept.

Benjamin

> 
> That user space visible ida you add is a namespace identifier that breaks
> nested containers and migration.  It is very very very wrong.
> 
> Eric
> 
> 
> 


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] netns: Coexist with the sysfs limitations
  2008-10-23  8:04     ` Benjamin Thery
@ 2008-10-23 15:40       ` Eric W. Biederman
  2008-10-23 15:56       ` [PATCH] netns: Coexist with the sysfs limitations v2 Eric W. Biederman
  1 sibling, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-23 15:40 UTC (permalink / raw)
  To: Benjamin Thery
  Cc: Serge E. Hallyn, netdev, Dave Miller, Greg Kroah-Hartman,
	Al Viro, Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Benjamin Thery <benjamin.thery@bull.net> writes:

> Serge E. Hallyn wrote:
>> Quoting Eric W. Biederman (ebiederm@xmission.com):
>>> To make testing of the network namespace simpler allow
>>> the network namespace code and the sysfs code to be
>>> compiled and run at the same time.  To do this only
>>> virtual devices are allowed in the additional network
>>> namespaces and those virtual devices are not placed
>>> in the kobject tree.
>>>
>>> Since virtual devices don't actually do anything interesting
>>> hardware wise that needs device management there should
>>> be no loss in keeping them out of the kobject tree and
>>> by implication sysfs.  The gain in ease of testing
>>> and code coverage should be significant.
>>>
>>> I.e. people running distributions that make it next to
>>> impossible to boot without sysfs should at be able to
>>> boot a test kernel now.
>>>
>>> Plus no ABIs are harmed with this patch.
>
>>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>>
>> Duh.
>>
>> Tested-by: Serge Hallyn <serue@us.ibm.com>
>> Acked-by: Serge Hallyn <serue@us.ibm.com>
>
> Oh, this patch is short, clean, and the limitation introduced isn't too
> annoying for testing netns right now.
>
> At least, my proposal provoked some reactions :)

Yes.

> BTW, there's a second limitation with your patch:
> we can't rename the net devices in the additional network namespaces.
>
> In net/core/dev.c, dev_change_name() fails: call to device_rename() return an
> (expected) -EINVAL error.
> Maybe we should add a test on the net to only call it in init_net?

Yes.  Good catch.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries
  2008-10-23 11:56   ` Benjamin Thery
@ 2008-10-23 15:46     ` Eric W. Biederman
  0 siblings, 0 replies; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-23 15:46 UTC (permalink / raw)
  To: Benjamin Thery
  Cc: netdev, Dave Miller, Greg Kroah-Hartman, Al Viro, Serge Hallyn,
	Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers

Benjamin Thery <benjamin.thery@bull.net> writes:

> In fact, I feared that would be an issue before I sent the patches.
> Thanks for confirming this.
>
> But, as this alternative was discussed a bit after Al's comments on your
> last patches, I had to give it a try and code a proof-of-concept.

Yes.  We have a lot of reasons now for why that approach is a
bad idea.  Which should help a lot when it comes to handling sysfs
properly.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] netns: Coexist with the sysfs limitations v2
  2008-10-23  8:04     ` Benjamin Thery
  2008-10-23 15:40       ` Eric W. Biederman
@ 2008-10-23 15:56       ` Eric W. Biederman
  2008-10-27 19:41         ` David Miller
  1 sibling, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-23 15:56 UTC (permalink / raw)
  To: Dave Miller
  Cc: Benjamin Thery, Serge E. Hallyn, netdev, Greg Kroah-Hartman,
	Al Viro, Daniel Lezcano, linux-kernel, Tejun Heo, Denis Lunev,
	Linux Containers


To make testing of the network namespace simpler allow
the network namespace code and the sysfs code to be
compiled and run at the same time.  To do this only
virtual devices are allowed in the additional network
namespaces and those virtual devices are not placed
in the kobject tree.

Since virtual devices don't actually do anything interesting
hardware wise that needs device management there should
be no loss in keeping them out of the kobject tree and
by implication sysfs.  The gain in ease of testing
and code coverage should be significant.

Changelog:

v2: As pointed out by Benjamin Thery it only makes sense to call
    device_rename in the initial network namespace for now.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Benjamin Thery <benjamin.thery@bull.net>
Tested-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
---
 net/Kconfig          |    2 +-
 net/core/dev.c       |   25 ++++++++++++++++++++-----
 net/core/net-sysfs.c |    7 +++++++
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index d789d79..8c3d97c 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -27,7 +27,7 @@ menu "Networking options"
 config NET_NS
 	bool "Network namespace support"
 	default n
-	depends on EXPERIMENTAL && !SYSFS && NAMESPACES
+	depends on EXPERIMENTAL && NAMESPACES
 	help
 	  Allow user space to create what appear to be multiple instances
 	  of the network stack.
diff --git a/net/core/dev.c b/net/core/dev.c
index b8a4fd0..0e38594 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -924,10 +924,15 @@ int dev_change_name(struct net_device *dev, const char *newname)
 		strlcpy(dev->name, newname, IFNAMSIZ);
 
 rollback:
-	ret = device_rename(&dev->dev, dev->name);
-	if (ret) {
-		memcpy(dev->name, oldname, IFNAMSIZ);
-		return ret;
+	/* For now only devices in the initial network namespace
+	 * are in sysfs.
+	 */
+	if (net == &init_net) {
+		ret = device_rename(&dev->dev, dev->name);
+		if (ret) {
+			memcpy(dev->name, oldname, IFNAMSIZ);
+			return ret;
+		}
 	}
 
 	write_lock_bh(&dev_base_lock);
@@ -4449,6 +4454,15 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	if (dev->features & NETIF_F_NETNS_LOCAL)
 		goto out;
 
+#ifdef CONFIG_SYSFS
+	/* Don't allow real devices to be moved when sysfs
+	 * is enabled.
+	 */
+	err = -EINVAL;
+	if (dev->dev.parent)
+		goto out;
+#endif
+
 	/* Ensure the device has been registrered */
 	err = -EINVAL;
 	if (dev->reg_state != NETREG_REGISTERED)
@@ -4506,6 +4520,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	 */
 	dev_addr_discard(dev);
 
+	netdev_unregister_kobject(dev);
+
 	/* Actually switch the network namespace */
 	dev_net_set(dev, net);
 
@@ -4522,7 +4538,6 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	}
 
 	/* Fixup kobjects */
-	netdev_unregister_kobject(dev);
 	err = netdev_register_kobject(dev);
 	WARN_ON(err);
 
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 92d6b94..85cb8bd 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -476,6 +476,10 @@ void netdev_unregister_kobject(struct net_device * net)
 	struct device *dev = &(net->dev);
 
 	kobject_get(&dev->kobj);
+
+	if (dev_net(net) != &init_net)
+		return;
+
 	device_del(dev);
 }
 
@@ -501,6 +505,9 @@ int netdev_register_kobject(struct net_device *net)
 #endif
 #endif /* CONFIG_SYSFS */
 
+	if (dev_net(net) != &init_net)
+		return 0;
+
 	return device_add(dev);
 }
 
-- 
1.5.3.rc6.17.g1911


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] netns: Coexist with the sysfs limitations v2
  2008-10-23 15:56       ` [PATCH] netns: Coexist with the sysfs limitations v2 Eric W. Biederman
@ 2008-10-27 19:41         ` David Miller
  2008-10-27 20:19           ` Eric W. Biederman
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2008-10-27 19:41 UTC (permalink / raw)
  To: ebiederm
  Cc: benjamin.thery, serue, netdev, gregkh, viro, dlezcano,
	linux-kernel, htejun, den, containers

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Thu, 23 Oct 2008 08:56:08 -0700

> To make testing of the network namespace simpler allow
> the network namespace code and the sysfs code to be
> compiled and run at the same time.  To do this only
> virtual devices are allowed in the additional network
> namespaces and those virtual devices are not placed
> in the kobject tree.
> 
> Since virtual devices don't actually do anything interesting
> hardware wise that needs device management there should
> be no loss in keeping them out of the kobject tree and
> by implication sysfs.  The gain in ease of testing
> and code coverage should be significant.
> 
> Changelog:
> 
> v2: As pointed out by Benjamin Thery it only makes sense to call
>     device_rename in the initial network namespace for now.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Acked-by: Benjamin Thery <benjamin.thery@bull.net>
> Tested-by: Serge Hallyn <serue@us.ibm.com>
> Acked-by: Serge Hallyn <serue@us.ibm.com>
> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>

So let's figure out what happens with this patch.
I'm personally ok with the change, the question is when
and where.

My net-2.6 tree was closed to new features long ago, so I really
don't want to try to merge this sucker into 2.6.28-rcX :-)  But if
you guys think that is prudent, feel free to submit it directly to
Linus and add my signoff:

Signed-off-by: David S. Miller <davem@davemloft.net>

otherwise if we shoot for 2.6.29 I would suggest that we wait until
the merge window to see if the sysfs issues get sorted, and if not
we slip this patch into to tree instead.

Let me know what you guys plan to do with this.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] netns: Coexist with the sysfs limitations v2
  2008-10-27 19:41         ` David Miller
@ 2008-10-27 20:19           ` Eric W. Biederman
  2008-10-28  0:50             ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Eric W. Biederman @ 2008-10-27 20:19 UTC (permalink / raw)
  To: David Miller
  Cc: benjamin.thery, serue, netdev, gregkh, viro, dlezcano,
	linux-kernel, htejun, den, containers

David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Thu, 23 Oct 2008 08:56:08 -0700
>
>> To make testing of the network namespace simpler allow
>> the network namespace code and the sysfs code to be
>> compiled and run at the same time.  To do this only
>> virtual devices are allowed in the additional network
>> namespaces and those virtual devices are not placed
>> in the kobject tree.
>> 
>> Since virtual devices don't actually do anything interesting
>> hardware wise that needs device management there should
>> be no loss in keeping them out of the kobject tree and
>> by implication sysfs.  The gain in ease of testing
>> and code coverage should be significant.
>> 
>> Changelog:
>> 
>> v2: As pointed out by Benjamin Thery it only makes sense to call
>>     device_rename in the initial network namespace for now.
>> 
>> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>> Acked-by: Benjamin Thery <benjamin.thery@bull.net>
>> Tested-by: Serge Hallyn <serue@us.ibm.com>
>> Acked-by: Serge Hallyn <serue@us.ibm.com>
>> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
>
> So let's figure out what happens with this patch.
> I'm personally ok with the change, the question is when
> and where.
>
> My net-2.6 tree was closed to new features long ago, so I really
> don't want to try to merge this sucker into 2.6.28-rcX :-)  But if
> you guys think that is prudent, feel free to submit it directly to
> Linus and add my signoff:
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
> otherwise if we shoot for 2.6.29 I would suggest that we wait until
> the merge window to see if the sysfs issues get sorted, and if not
> we slip this patch into to tree instead.
>
> Let me know what you guys plan to do with this.

What I was thinking is that it goes into your tree for 2.6.29.  Allowing
for better test coverage in the short term, and removing the pressure
to do a hack job on sysfs just to reduce the pain of testing.

The patch was sent during the merge window just because that is
when the conversation was happening.

I guess the pain with sysfs is having multiple patches in different
trees in this area causing conflicts in linux-next.  Ugh.  I can see
why you would want to hold back.  On the contrary point of view we
need that patch in someones tree or else we might as well merge it
now, if the plan is to merge it without it sitting in anyone's
development tree.

So my plan is I'm not going to worry about that patch, and leave it to
Ben and Daniel (if it needs a retransmit).  If it happens to merge
into net-next and that causes conflicts when we do a good job on
sysfs I will handle it.

Eric

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] netns: Coexist with the sysfs limitations v2
  2008-10-27 20:19           ` Eric W. Biederman
@ 2008-10-28  0:50             ` David Miller
  0 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2008-10-28  0:50 UTC (permalink / raw)
  To: ebiederm
  Cc: benjamin.thery, serue, netdev, gregkh, viro, dlezcano,
	linux-kernel, htejun, den, containers

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 27 Oct 2008 13:19:27 -0700

> What I was thinking is that it goes into your tree for 2.6.29.  Allowing
> for better test coverage in the short term, and removing the pressure
> to do a hack job on sysfs just to reduce the pain of testing.

Fair enough.  I'll add it to my tree.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2008-10-28  0:51 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-22 15:21 [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device sysfs entries Benjamin Thery
2008-10-22 15:21 ` [PATCH 1/4] netns: add in ida ID to identify the network namespace Benjamin Thery
2008-10-22 15:22 ` [PATCH 2/4] netns: Export nets id to /proc/net/netns Benjamin Thery
2008-10-22 15:22 ` [PATCH 3/4] net: cleanup some vars names to be more consistant with the network code Benjamin Thery
2008-10-22 15:22 ` [PATCH 4/4] netns: sysfs: add netns suffix to net devices sysfs entries Benjamin Thery
2008-10-22 19:59 ` [PATCH 0/4][RFC] netns: sysfs: add a netns suffix to net device " Eric W. Biederman
2008-10-22 20:30   ` Serge E. Hallyn
2008-10-22 21:01     ` Eric W. Biederman
2008-10-22 21:55       ` Stephen Hemminger
2008-10-22 22:54         ` Eric W. Biederman
2008-10-23  4:14           ` Kyle Moffett
2008-10-23 11:56   ` Benjamin Thery
2008-10-23 15:46     ` Eric W. Biederman
2008-10-22 20:16 ` Greg KH
2008-10-22 21:08   ` Eric W. Biederman
2008-10-22 21:24     ` Greg KH
2008-10-22 20:32 ` [PATCH] netns: Coexist with the sysfs limitations Eric W. Biederman
2008-10-22 20:40   ` Daniel Lezcano
2008-10-22 21:21   ` Serge E. Hallyn
2008-10-23  8:04     ` Benjamin Thery
2008-10-23 15:40       ` Eric W. Biederman
2008-10-23 15:56       ` [PATCH] netns: Coexist with the sysfs limitations v2 Eric W. Biederman
2008-10-27 19:41         ` David Miller
2008-10-27 20:19           ` Eric W. Biederman
2008-10-28  0:50             ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).