All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] RFC: ipv4: share sysctl net/ipv4/conf/DEVNAME/ tables
@ 2011-01-15  2:46 Lucian Adrian Grijincu
  2011-01-15 10:41 ` Alexey Dobriyan
  0 siblings, 1 reply; 4+ messages in thread
From: Lucian Adrian Grijincu @ 2011-01-15  2:46 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Nick Piggin,
	Al Viro, Christoph Hellwig, Lucian Adrian Grijincu, Dave Chinner,
	Neil Horman, Eric Dumazet, Alexey Dobriyan, Octavian Purdila,
	Vlad Dogaru

[-- Attachment #1: Type: text/plain, Size: 1802 bytes --]

For each network device DEVNAME that supports ipv4 a new table was
registered in net/ipv4/conf/DEVNAME/. The sysctl table was identical
for all network devices, except for the ->data, ->extra1 and ->extra2
members.

However, given the 'struct net_device*' corresponding to the device
DEVNAME we can compute data and extra1.

extra2 cannot be computed, it is the 'struct net*' of the device and
it is used to find the correct 'struct net_device*' with this name
(e.g. we will have 'lo' in every network namespace).

We cannot use the current process' network namespace. For example,
after running this code:

   int fd = open("/proc/sys/net/ipv4/conf/default/tag", O_RDONLY);
   unshare(CLONE_NEWNET);

'fd' points to a sysctl for a network namespace different from the
current process' network namespace.

To gain access to the name of the directory above a file, sysctl
handlers are passed an extra argument: the 'struct file*'
corresponding to the file. From the file we walk up one level to find
the name of the device. None of the other handlers were changed to
receive this extra parameter, but due to C's calling convention they
shouldn't care.

Each table has 26 entries (25 files + 1 sentinel). For each net device
this patch should save:
* 32bit: 26 * 36 = 936 bytes
* 64bit: 26 * 64 = 1664 bytes

**This patch was not thoroughly tested, I'm just looking for feedback
whether the approach is sound and could be applied. Similar changes
can be implemented for ipv6 too.**

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c      |   11 ++-
 include/linux/inetdevice.h |   14 +++-
 include/net/netns/ipv4.h   |   14 +++
 net/ipv4/devinet.c         |  236 +++++++++++++++++++++++++++++---------------
 4 files changed, 194 insertions(+), 81 deletions(-)

[-- Attachment #2: 0001-RFC-ipv4-share-sysctl-net-ipv4-conf-DEVNAME-tables.patch --]
[-- Type: text/x-patch, Size: 15497 bytes --]

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 09a1f92..901bfe3 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -129,6 +129,11 @@ out:
 	return err;
 }
 
+
+typedef int proc_handler_extended(struct ctl_table *ctl, int write,
+				  void __user *buffer, size_t *lenp, loff_t *ppos,
+				  struct file *filp);
+
 static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 		size_t count, loff_t *ppos, int write)
 {
@@ -137,6 +142,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
 	ssize_t error;
 	size_t res;
+	proc_handler_extended *phx = (proc_handler_extended *) table->proc_handler;
 
 	if (IS_ERR(head))
 		return PTR_ERR(head);
@@ -156,7 +162,10 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 
 	/* careful: calling conventions are nasty here */
 	res = count;
-	error = table->proc_handler(table, write, buf, &res, ppos);
+	/* most handlers only use the first 5 arguments (without @filp).
+	 * Changing all is too much of work, as, at the time of writting only
+	 * one proc_handler knows about and uses the @filp. */
+	error = phx(table, write, buf, &res, ppos, filp);
 	if (!error)
 		error = res;
 out:
diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index ae8fdc5..abe2e25 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -43,8 +43,20 @@ enum
 
 #define IPV4_DEVCONF_MAX (__IPV4_DEVCONF_MAX - 1)
 
+
+struct devinet_sysctl {
+	/*
+	 * dev_name holds a copy of dev_name, because '.procname' is regarded as const
+	 * by sysctl and we wouldn't want anyone to change it under our feet
+	 * (see SIOCSIFNAME).
+	 */
+	char *dev_name;
+	struct ctl_table_header *sysctl_header;
+};
+
+
 struct ipv4_devconf {
-	void	*sysctl;
+	struct devinet_sysctl devinet_sysctl;
 	int	data[IPV4_DEVCONF_MAX];
 	DECLARE_BITMAP(state, IPV4_DEVCONF_MAX);
 };
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index d68c3f1..5210215 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -7,6 +7,7 @@
 
 #include <net/inet_frag.h>
 
+struct ctl_table;
 struct ctl_table_header;
 struct ipv4_devconf;
 struct fib_rules_ops;
@@ -19,6 +20,19 @@ struct netns_ipv4 {
 	struct ctl_table_header	*frags_hdr;
 	struct ctl_table_header	*ipv4_hdr;
 	struct ctl_table_header *route_hdr;
+	/* This holds the contents of /proc/sys/net/ipv4/conf/$DEV_NAME/
+	 *
+	 * The devinet_sysctl_table is shared by all network devices
+	 * in the same network namespace, but each network namespace
+	 * needs it's own copy of the table (the ->extra2 member of
+	 * each ctl_table must point to the corresponding 'struct net*').
+	 *
+	 * The ->data member is not used as defined in this table. The
+	 * proc_handler determines the apropiate data location based
+	 * on the 'struct net_device*' having the same name as
+	 * $DEV_NAME above.
+	 */
+	struct ctl_table	*devinet_sysctl_table;
 #endif
 	struct ipv4_devconf	*devconf_all;
 	struct ipv4_devconf	*devconf_dflt;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 748cb5b..d03dfdc 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -158,7 +158,8 @@ static struct in_device *inetdev_init(struct net_device *dev)
 		goto out;
 	memcpy(&in_dev->cnf, dev_net(dev)->ipv4.devconf_dflt,
 			sizeof(in_dev->cnf));
-	in_dev->cnf.sysctl = NULL;
+	in_dev->cnf.devinet_sysctl.dev_name = NULL;
+	in_dev->cnf.devinet_sysctl.sysctl_header = NULL;
 	in_dev->dev = dev;
 	in_dev->arp_parms = neigh_parms_alloc(dev, &arp_tbl);
 	if (!in_dev->arp_parms)
@@ -1375,9 +1376,56 @@ static void inet_forward_change(struct net *net)
 	}
 }
 
-static int devinet_conf_proc(ctl_table *ctl, int write,
-			     void __user *buffer,
-			     size_t *lenp, loff_t *ppos)
+
+
+static int devinet_conf_handler(ctl_table *ctl, int write,
+				void __user *buffer,
+				size_t *lenp, loff_t *ppos,
+				struct file *filp,
+				proc_handler *proc_handler)
+{
+	/* The path to this file is of the form /proc/sys/net/ipv4/conf/$DEVNAME/$CTL
+	 *
+	 * To save space, ctl_table is shared between all network
+	 * devices in the same network namespace, but we need to
+	 * change the data corresponding to the $DEVNAME network
+	 * device, not any other's.
+	 *
+	 * Use $DEVNAME to obtain the coresponding ipv4_devconf.
+	 */
+	struct net *net = ctl->extra2;
+	const char *dev_name = filp->f_path.dentry->d_parent->d_name.name;
+	struct ctl_table tmp_ctl = *ctl;
+	struct net_device *dev = NULL;
+	struct ipv4_devconf *cnf;
+	int ret;
+
+	if (strcmp(dev_name, "all") == 0) {
+		cnf = net->ipv4.devconf_all;
+	} else if (strcmp(dev_name, "default") == 0) {
+		cnf = net->ipv4.devconf_dflt;
+	} else {
+		/* the device could have been renamed (SIOCSIFADDR) or
+		 * deleted since we started accessing it's proc sysctl */
+		dev = dev_get_by_name(net, dev_name);
+		if (dev == NULL)
+			return -ENOENT;
+		cnf = &in_dev_get(dev)->cnf;
+	}
+
+	tmp_ctl.data += (char *)cnf - (char *)&ipv4_devconf;
+	tmp_ctl.extra1 = cnf;
+
+	ret = proc_handler(&tmp_ctl, write, buffer, lenp, ppos);
+
+	if (dev)
+		dev_put(dev);
+	return ret;
+}
+
+static int __devinet_conf_proc(ctl_table *ctl, int write,
+			       void __user *buffer,
+			       size_t *lenp, loff_t *ppos)
 {
 	int ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
 
@@ -1395,9 +1443,9 @@ static int devinet_conf_proc(ctl_table *ctl, int write,
 	return ret;
 }
 
-static int devinet_sysctl_forward(ctl_table *ctl, int write,
-				  void __user *buffer,
-				  size_t *lenp, loff_t *ppos)
+static int __devinet_sysctl_forward(ctl_table *ctl, int write,
+				    void __user *buffer,
+				    size_t *lenp, loff_t *ppos)
 {
 	int *valp = ctl->data;
 	int val = *valp;
@@ -1430,9 +1478,10 @@ static int devinet_sysctl_forward(ctl_table *ctl, int write,
 	return ret;
 }
 
-static int ipv4_doint_and_flush(ctl_table *ctl, int write,
-				void __user *buffer,
-				size_t *lenp, loff_t *ppos)
+
+static int __ipv4_doint_and_flush(ctl_table *ctl, int write,
+				  void __user *buffer,
+				  size_t *lenp, loff_t *ppos)
 {
 	int *valp = ctl->data;
 	int val = *valp;
@@ -1445,6 +1494,33 @@ static int ipv4_doint_and_flush(ctl_table *ctl, int write,
 	return ret;
 }
 
+static int devinet_conf_proc(ctl_table *ctl, int write,
+			     void __user *buffer,
+			     size_t *lenp, loff_t *ppos,
+			     struct file *filp)
+{
+	return devinet_conf_handler(ctl, write, buffer, lenp, ppos, filp,
+				    __devinet_conf_proc);
+}
+
+static int devinet_sysctl_forward(ctl_table *ctl, int write,
+				  void __user *buffer,
+				  size_t *lenp, loff_t *ppos,
+				  struct file *filp)
+{
+	return devinet_conf_handler(ctl, write, buffer, lenp, ppos, filp,
+				    __devinet_sysctl_forward);
+}
+
+static int ipv4_doint_and_flush(ctl_table *ctl, int write,
+				void __user *buffer,
+				size_t *lenp, loff_t *ppos,
+				struct file *filp)
+{
+	return devinet_conf_handler(ctl, write, buffer, lenp, ppos, filp,
+				    __ipv4_doint_and_flush);
+}
+
 #define DEVINET_SYSCTL_ENTRY(attr, name, mval, proc) \
 	{ \
 		.procname	= name, \
@@ -1454,6 +1530,7 @@ static int ipv4_doint_and_flush(ctl_table *ctl, int write,
 		.mode		= mval, \
 		.proc_handler	= proc, \
 		.extra1		= &ipv4_devconf, \
+		.extra2		= &init_net, \
 	}
 
 #define DEVINET_SYSCTL_RW_ENTRY(attr, name) \
@@ -1468,51 +1545,45 @@ static int ipv4_doint_and_flush(ctl_table *ctl, int write,
 #define DEVINET_SYSCTL_FLUSHING_ENTRY(attr, name) \
 	DEVINET_SYSCTL_COMPLEX_ENTRY(attr, name, ipv4_doint_and_flush)
 
-static struct devinet_sysctl_table {
-	struct ctl_table_header *sysctl_header;
-	struct ctl_table devinet_vars[__IPV4_DEVCONF_MAX];
-	char *dev_name;
-} devinet_sysctl = {
-	.devinet_vars = {
-		DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, "forwarding",
-					     devinet_sysctl_forward),
-		DEVINET_SYSCTL_RO_ENTRY(MC_FORWARDING, "mc_forwarding"),
-
-		DEVINET_SYSCTL_RW_ENTRY(ACCEPT_REDIRECTS, "accept_redirects"),
-		DEVINET_SYSCTL_RW_ENTRY(SECURE_REDIRECTS, "secure_redirects"),
-		DEVINET_SYSCTL_RW_ENTRY(SHARED_MEDIA, "shared_media"),
-		DEVINET_SYSCTL_RW_ENTRY(RP_FILTER, "rp_filter"),
-		DEVINET_SYSCTL_RW_ENTRY(SEND_REDIRECTS, "send_redirects"),
-		DEVINET_SYSCTL_RW_ENTRY(ACCEPT_SOURCE_ROUTE,
-					"accept_source_route"),
-		DEVINET_SYSCTL_RW_ENTRY(ACCEPT_LOCAL, "accept_local"),
-		DEVINET_SYSCTL_RW_ENTRY(SRC_VMARK, "src_valid_mark"),
-		DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP, "proxy_arp"),
-		DEVINET_SYSCTL_RW_ENTRY(MEDIUM_ID, "medium_id"),
-		DEVINET_SYSCTL_RW_ENTRY(BOOTP_RELAY, "bootp_relay"),
-		DEVINET_SYSCTL_RW_ENTRY(LOG_MARTIANS, "log_martians"),
-		DEVINET_SYSCTL_RW_ENTRY(TAG, "tag"),
-		DEVINET_SYSCTL_RW_ENTRY(ARPFILTER, "arp_filter"),
-		DEVINET_SYSCTL_RW_ENTRY(ARP_ANNOUNCE, "arp_announce"),
-		DEVINET_SYSCTL_RW_ENTRY(ARP_IGNORE, "arp_ignore"),
-		DEVINET_SYSCTL_RW_ENTRY(ARP_ACCEPT, "arp_accept"),
-		DEVINET_SYSCTL_RW_ENTRY(ARP_NOTIFY, "arp_notify"),
-		DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP_PVLAN, "proxy_arp_pvlan"),
-
-		DEVINET_SYSCTL_FLUSHING_ENTRY(NOXFRM, "disable_xfrm"),
-		DEVINET_SYSCTL_FLUSHING_ENTRY(NOPOLICY, "disable_policy"),
-		DEVINET_SYSCTL_FLUSHING_ENTRY(FORCE_IGMP_VERSION,
-					      "force_igmp_version"),
-		DEVINET_SYSCTL_FLUSHING_ENTRY(PROMOTE_SECONDARIES,
-					      "promote_secondaries"),
-	},
+struct ctl_table ipv4_devinet_sysctl_table[__IPV4_DEVCONF_MAX] = {
+	DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, "forwarding",
+				     devinet_sysctl_forward),
+	DEVINET_SYSCTL_RO_ENTRY(MC_FORWARDING, "mc_forwarding"),
+
+	DEVINET_SYSCTL_RW_ENTRY(ACCEPT_REDIRECTS, "accept_redirects"),
+	DEVINET_SYSCTL_RW_ENTRY(SECURE_REDIRECTS, "secure_redirects"),
+	DEVINET_SYSCTL_RW_ENTRY(SHARED_MEDIA, "shared_media"),
+	DEVINET_SYSCTL_RW_ENTRY(RP_FILTER, "rp_filter"),
+	DEVINET_SYSCTL_RW_ENTRY(SEND_REDIRECTS, "send_redirects"),
+	DEVINET_SYSCTL_RW_ENTRY(ACCEPT_SOURCE_ROUTE,
+				"accept_source_route"),
+	DEVINET_SYSCTL_RW_ENTRY(ACCEPT_LOCAL, "accept_local"),
+	DEVINET_SYSCTL_RW_ENTRY(SRC_VMARK, "src_valid_mark"),
+	DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP, "proxy_arp"),
+	DEVINET_SYSCTL_RW_ENTRY(MEDIUM_ID, "medium_id"),
+	DEVINET_SYSCTL_RW_ENTRY(BOOTP_RELAY, "bootp_relay"),
+	DEVINET_SYSCTL_RW_ENTRY(LOG_MARTIANS, "log_martians"),
+	DEVINET_SYSCTL_RW_ENTRY(TAG, "tag"),
+	DEVINET_SYSCTL_RW_ENTRY(ARPFILTER, "arp_filter"),
+	DEVINET_SYSCTL_RW_ENTRY(ARP_ANNOUNCE, "arp_announce"),
+	DEVINET_SYSCTL_RW_ENTRY(ARP_IGNORE, "arp_ignore"),
+	DEVINET_SYSCTL_RW_ENTRY(ARP_ACCEPT, "arp_accept"),
+	DEVINET_SYSCTL_RW_ENTRY(ARP_NOTIFY, "arp_notify"),
+	DEVINET_SYSCTL_RW_ENTRY(PROXY_ARP_PVLAN, "proxy_arp_pvlan"),
+
+	DEVINET_SYSCTL_FLUSHING_ENTRY(NOXFRM, "disable_xfrm"),
+	DEVINET_SYSCTL_FLUSHING_ENTRY(NOPOLICY, "disable_policy"),
+	DEVINET_SYSCTL_FLUSHING_ENTRY(FORCE_IGMP_VERSION,
+				      "force_igmp_version"),
+	DEVINET_SYSCTL_FLUSHING_ENTRY(PROMOTE_SECONDARIES,
+				      "promote_secondaries"),
+	{ }
 };
 
 static int __devinet_sysctl_register(struct net *net, char *dev_name,
-					struct ipv4_devconf *p)
+				     struct ipv4_devconf *cnf)
 {
-	int i;
-	struct devinet_sysctl_table *t;
+	struct devinet_sysctl *dsys = &cnf->devinet_sysctl;
 
 #define DEVINET_CTL_PATH_DEV	3
 
@@ -1524,54 +1595,42 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name,
 		{ },
 	};
 
-	t = kmemdup(&devinet_sysctl, sizeof(*t), GFP_KERNEL);
-	if (!t)
-		goto out;
-
-	for (i = 0; i < ARRAY_SIZE(t->devinet_vars) - 1; i++) {
-		t->devinet_vars[i].data += (char *)p - (char *)&ipv4_devconf;
-		t->devinet_vars[i].extra1 = p;
-		t->devinet_vars[i].extra2 = net;
-	}
-
 	/*
 	 * Make a copy of dev_name, because '.procname' is regarded as const
 	 * by sysctl and we wouldn't want anyone to change it under our feet
 	 * (see SIOCSIFNAME).
 	 */
-	t->dev_name = kstrdup(dev_name, GFP_KERNEL);
-	if (!t->dev_name)
-		goto free;
+	dsys->dev_name = kstrdup(dev_name, GFP_KERNEL);
+	if (!dsys->dev_name)
+		goto out;
 
-	devinet_ctl_path[DEVINET_CTL_PATH_DEV].procname = t->dev_name;
+	devinet_ctl_path[DEVINET_CTL_PATH_DEV].procname = dsys->dev_name;
 
-	t->sysctl_header = register_net_sysctl_table(net, devinet_ctl_path,
-			t->devinet_vars);
-	if (!t->sysctl_header)
+	dsys->sysctl_header = register_net_sysctl_table(net, devinet_ctl_path,
+						net->ipv4.devinet_sysctl_table);
+	if (!dsys->sysctl_header)
 		goto free_procname;
 
-	p->sysctl = t;
 	return 0;
 
 free_procname:
-	kfree(t->dev_name);
-free:
-	kfree(t);
+	kfree(dsys->dev_name);
 out:
 	return -ENOBUFS;
 }
 
 static void __devinet_sysctl_unregister(struct ipv4_devconf *cnf)
 {
-	struct devinet_sysctl_table *t = cnf->sysctl;
+	struct devinet_sysctl *dsys = &cnf->devinet_sysctl;
 
-	if (t == NULL)
+	if (dsys == NULL)
 		return;
 
-	cnf->sysctl = NULL;
-	unregister_sysctl_table(t->sysctl_header);
-	kfree(t->dev_name);
-	kfree(t);
+	unregister_sysctl_table(dsys->sysctl_header);
+	kfree(dsys->dev_name);
+
+	dsys->dev_name = NULL;
+	dsys->sysctl_header = NULL;
 }
 
 static void devinet_sysctl_register(struct in_device *idev)
@@ -1610,9 +1669,10 @@ static __net_initdata struct ctl_path net_ipv4_path[] = {
 
 static __net_init int devinet_init_net(struct net *net)
 {
-	int err;
+	int i, err;
 	struct ipv4_devconf *all, *dflt;
 #ifdef CONFIG_SYSCTL
+	struct ctl_table *devinet_sysctl_table;
 	struct ctl_table *tbl = ctl_forward_entry;
 	struct ctl_table_header *forw_hdr;
 #endif
@@ -1620,6 +1680,7 @@ static __net_init int devinet_init_net(struct net *net)
 	err = -ENOMEM;
 	all = &ipv4_devconf;
 	dflt = &ipv4_devconf_dflt;
+	devinet_sysctl_table = &ipv4_devinet_sysctl_table[0];
 
 	if (!net_eq(net, &init_net)) {
 		all = kmemdup(all, sizeof(ipv4_devconf), GFP_KERNEL);
@@ -1638,10 +1699,23 @@ static __net_init int devinet_init_net(struct net *net)
 		tbl[0].data = &all->data[IPV4_DEVCONF_FORWARDING - 1];
 		tbl[0].extra1 = all;
 		tbl[0].extra2 = net;
+
+		devinet_sysctl_table = kmemdup(ipv4_devinet_sysctl_table,
+					       sizeof(ipv4_devinet_sysctl_table),
+					       GFP_KERNEL);
+		if (devinet_sysctl_table == NULL)
+			goto err_alloc_devinet_sysctl_table;
+
+		/* the last element in the table is { } and must remain so */
+		for (i = 0; i < ARRAY_SIZE(ipv4_devinet_sysctl_table) - 1; i++) {
+			devinet_sysctl_table->extra2 = net;
+		}
 #endif
 	}
 
 #ifdef CONFIG_SYSCTL
+	net->ipv4.devinet_sysctl_table = devinet_sysctl_table;
+
 	err = __devinet_sysctl_register(net, "all", all);
 	if (err < 0)
 		goto err_reg_all;
@@ -1667,6 +1741,9 @@ err_reg_ctl:
 err_reg_dflt:
 	__devinet_sysctl_unregister(all);
 err_reg_all:
+	if (devinet_sysctl_table != &ipv4_devinet_sysctl_table[0])
+		kfree(devinet_sysctl_table);
+err_alloc_devinet_sysctl_table:
 	if (tbl != ctl_forward_entry)
 		kfree(tbl);
 err_alloc_ctl:
@@ -1689,6 +1766,7 @@ static __net_exit void devinet_exit_net(struct net *net)
 	unregister_net_sysctl_table(net->ipv4.forw_hdr);
 	__devinet_sysctl_unregister(net->ipv4.devconf_dflt);
 	__devinet_sysctl_unregister(net->ipv4.devconf_all);
+	kfree(net->ipv4.devinet_sysctl_table);
 	kfree(tbl);
 #endif
 	kfree(net->ipv4.devconf_dflt);

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] RFC: ipv4: share sysctl net/ipv4/conf/DEVNAME/ tables
  2011-01-15  2:46 [PATCH] RFC: ipv4: share sysctl net/ipv4/conf/DEVNAME/ tables Lucian Adrian Grijincu
@ 2011-01-15 10:41 ` Alexey Dobriyan
  2011-01-15 11:09   ` Lucian Adrian Grijincu
  2011-01-28  6:21   ` Lucian Adrian Grijincu
  0 siblings, 2 replies; 4+ messages in thread
From: Alexey Dobriyan @ 2011-01-15 10:41 UTC (permalink / raw)
  To: Lucian Adrian Grijincu
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Nick Piggin,
	Al Viro, Christoph Hellwig, Dave Chinner, Neil Horman,
	Eric Dumazet, Octavian Purdila, Vlad Dogaru

On Sat, Jan 15, 2011 at 04:46:11AM +0200, Lucian Adrian Grijincu wrote:
> To gain access to the name of the directory above a file, sysctl
> handlers are passed an extra argument: the 'struct file*'
> corresponding to the file. From the file we walk up one level to find
> the name of the device. None of the other handlers were changed to
> receive this extra parameter, but due to C's calling convention they
> shouldn't care.

We don't do creepy stuff like that.

I wonder where interactions with device renaming are handled.

> +static int devinet_conf_handler(ctl_table *ctl, int write,
> +				void __user *buffer,
> +				size_t *lenp, loff_t *ppos,
> +				struct file *filp,
> +				proc_handler *proc_handler)
> +{
> +	/* The path to this file is of the form /proc/sys/net/ipv4/conf/$DEVNAME/$CTL
> +	 *
> +	 * To save space, ctl_table is shared between all network
> +	 * devices in the same network namespace, but we need to
> +	 * change the data corresponding to the $DEVNAME network
> +	 * device, not any other's.
> +	 *
> +	 * Use $DEVNAME to obtain the coresponding ipv4_devconf.
> +	 */
> +	struct net *net = ctl->extra2;
> +	const char *dev_name = filp->f_path.dentry->d_parent->d_name.name;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] RFC: ipv4: share sysctl net/ipv4/conf/DEVNAME/ tables
  2011-01-15 10:41 ` Alexey Dobriyan
@ 2011-01-15 11:09   ` Lucian Adrian Grijincu
  2011-01-28  6:21   ` Lucian Adrian Grijincu
  1 sibling, 0 replies; 4+ messages in thread
From: Lucian Adrian Grijincu @ 2011-01-15 11:09 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: netdev, David S. Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Nick Piggin,
	Al Viro, Christoph Hellwig, Dave Chinner, Neil Horman,
	Eric Dumazet, Octavian Purdila, Vlad Dogaru

On Sat, Jan 15, 2011 at 12:41 PM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> On Sat, Jan 15, 2011 at 04:46:11AM +0200, Lucian Adrian Grijincu wrote:
>> To gain access to the name of the directory above a file, sysctl
>> handlers are passed an extra argument: the 'struct file*'
>> corresponding to the file. From the file we walk up one level to find
>> the name of the device. None of the other handlers were changed to
>> receive this extra parameter, but due to C's calling convention they
>> shouldn't care.
>
> We don't do creepy stuff like that.


I did this this way to not waste time changing all the handlers in the
tree and then get this patch struck down as uninterested, ugly-hack or
being suggested another more sensible, yet completely different
approach.


> I wonder where interactions with device renaming are handled.


I took two things taken into consideration:
* the .procname of the device directory has it's own copy of the device name
  This was inherited from the previous version

* when looking for a "struct net_device*" we might not find any:

+		/* the device could have been renamed (SIOCSIFADDR) or
+		 * deleted since we started accessing it's proc sysctl */
+		dev = dev_get_by_name(net, dev_name);
+		if (dev == NULL)
+			return -ENOENT;


I'm not sure (will look into it later) whether
   filp->f_path.dentry->d_parent->d_name.name;
is still valid if a rename is running concurrently.


On device rename we run this

net/ipv4/devinet.c:
static int inetdev_event(struct notifier_block *this, unsigned long event,
			 void *ptr)
....
	case NETDEV_CHANGENAME:
		/* Do not notify about label change, this event is
		 * not interesting to applications using netlink.
		 */
		inetdev_changename(dev, in_dev);

		devinet_sysctl_unregister(in_dev);
		devinet_sysctl_register(in_dev);


I'm not sure whether this code can run in parallel with the
proc_handlers and if it can if it will invalidate
   filp->f_path.dentry->d_parent->d_name.name;


but I'll look into it.

Is there anything else that I should check regarding net device renaming?

-- 
 .
..: Lucian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] RFC: ipv4: share sysctl net/ipv4/conf/DEVNAME/ tables
  2011-01-15 10:41 ` Alexey Dobriyan
  2011-01-15 11:09   ` Lucian Adrian Grijincu
@ 2011-01-28  6:21   ` Lucian Adrian Grijincu
  1 sibling, 0 replies; 4+ messages in thread
From: Lucian Adrian Grijincu @ 2011-01-28  6:21 UTC (permalink / raw)
  To: Alexey Dobriyan, netdev

[Resent to the list because the last reply got rejected because of HTML]


On Sat, Jan 15, 2011 at 12:41 PM, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> I wonder where interactions with device renaming are handled.

I did some digging and I'm pretty confident that this will not cause
problems with regards to device renaming.


On device rename these are the relevant call stacks:


RENAME
• dev_ioctl
  ∘ rtnl_lock();
  ∘ dev_ifsioc(net, &ifr, cmd);
    ‣ dev_change_name
      • dev_get_valid_name
        ∘ strlcpy(dev->name, name, IFNAMSIZ)
      • call_netdevice_notifiers(NETDEV_CHANGENAME);
        ∘ inetdev_event
          ‣ devinet_sysctl_unregister(in_dev)
            • unregister_sysctl_table(header)
              ∘ lock sysctl
              ∘ start_unregistering(header);
                ‣ if (header->used) { unlock sysctl,
wait_for_completion; lock sysctl }
              ∘ unlock sysctl
          ‣ devinet_sysctl_register(in_dev)
  ∘ rtnl_unlock();




HANDLER
• proc_sys_call_handler
  ∘ head = grab_header(inode)
    ‣ sysctl_head_grab
      • lock sysctl
      • head->used++
      • unlock sysctl
  ∘ if (IS_ERR(head)) return err
  ∘ devinet_conf_handler
    ‣ dev_get_by_name(dev, filp->f_path.dentry->d_parent->d_name.name)
  ∘ sysctl_head_finish(head)
    ‣ lock sysctl
    ‣ if (--head->used && unregistering) complete()
    ‣ unlock sysctl


Compressed:
RENAME (under rtnl lock)
• R1: memcpy(dev->name, newname)
• R2: if the sysctl header is used wait until it's not used any more,
      mark header as invalid

HANDLER:
• H1: get header, if header invalid, return error
• H2: dev_get_by_name
• H3: if there's someone waiting to unregister, complete it's action


Only one rename can be in progress at a time (because of the
rtnl_lock), so cases like A->B, C->A cannot run in parallel.  To
finish a device rename, we need to unregister the sysctl table header
first.


• R2 < H1: a RENAME runs before a HANDLER, then the HANDLER
           will fail at H1 (the sysctl header will be made invalid at R2).
• H1 < R2:
  ∘ HANDLER acquired the header
    ‣ R1 < H2: dev_get_by_name will not find the device (because R1 renamed it)
    ‣ R1 > H2: dev_get_by_name will return the correct device
      	       (the name is still valid)



In conclusion, I don't see any race conditions and I don't see how we
could get the wrong device after a rename.

I've posted a new version of the patch with some improvements.


-- 
 .
..: Lucian

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-01-28  6:22 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-15  2:46 [PATCH] RFC: ipv4: share sysctl net/ipv4/conf/DEVNAME/ tables Lucian Adrian Grijincu
2011-01-15 10:41 ` Alexey Dobriyan
2011-01-15 11:09   ` Lucian Adrian Grijincu
2011-01-28  6:21   ` Lucian Adrian Grijincu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.