All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages
@ 2009-11-10 17:54 Stephen Hemminger
  2009-11-10 17:54 ` [PATCH 01/10] netdev: add netdev_continue_rcu Stephen Hemminger
                   ` (10 more replies)
  0 siblings, 11 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

The goal is to eliminate dev_base_lock completely, and just use RCU
and rtnl_mutex for network devices. This series gets rid of the many
of the users of dev_base_lock just for reading.

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 01/10] netdev: add netdev_continue_rcu
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:19   ` Eric Dumazet
  2009-11-10 19:39   ` Paul E. McKenney
  2009-11-10 17:54 ` [PATCH 02/10] vlan: eliminate use of dev_base_lock Stephen Hemminger
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Paul E. McKenney; +Cc: netdev

[-- Attachment #1: continue-rcu.patch --]
[-- Type: text/plain, Size: 1771 bytes --]

This adds an RCU macro for continuing search, useful for some
network devices like vlan.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/include/linux/netdevice.h	2009-11-09 22:19:08.511480873 -0800
+++ b/include/linux/netdevice.h	2009-11-10 09:27:17.973376267 -0800
@@ -1079,6 +1079,8 @@ extern rwlock_t				dev_base_lock;		/* De
 		list_for_each_entry_safe(d, n, &(net)->dev_base_head, dev_list)
 #define for_each_netdev_continue(net, d)		\
 		list_for_each_entry_continue(d, &(net)->dev_base_head, dev_list)
+#define for_each_netdev_continue_rcu(net, d)		\
+	list_for_each_entry_continue_rcu(d, &(net)->dev_base_head, dev_list)
 #define net_device_entry(lh)	list_entry(lh, struct net_device, dev_list)
 
 static inline struct net_device *next_net_device(struct net_device *dev)
--- a/include/linux/rculist.h	2009-11-09 22:19:08.529480859 -0800
+++ b/include/linux/rculist.h	2009-11-10 09:27:17.974376609 -0800
@@ -262,6 +262,20 @@ static inline void list_splice_init_rcu(
 		(pos) = rcu_dereference((pos)->next))
 
 /**
+ * list_for_each_entry_continue_rcu - continue iteration over list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_struct within the struct.
+ *
+ * Continue to iterate over list of given type, continuing after
+ * the current position.
+ */
+#define list_for_each_entry_continue_rcu(pos, head, member) 		\
+	for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
+	     prefetch(pos->member.next), &pos->member != (head);	\
+	     pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
+
+/**
  * hlist_del_rcu - deletes entry from hash list without re-initialization
  * @n: the element to delete from the hash list.
  *

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 02/10] vlan: eliminate use of dev_base_lock
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
  2009-11-10 17:54 ` [PATCH 01/10] netdev: add netdev_continue_rcu Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:20   ` Eric Dumazet
  2009-11-10 17:54 ` [PATCH 03/10] net: use rcu for network scheduler API Stephen Hemminger
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Patrick McHardy; +Cc: netdev

[-- Attachment #1: vlan-rdlock.patch --]
[-- Type: text/plain, Size: 1311 bytes --]

Do not need to use read_lock(&dev_base_lock), use RCU instead.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/net/8021q/vlanproc.c	2009-11-09 22:19:08.712481032 -0800
+++ b/net/8021q/vlanproc.c	2009-11-10 09:27:28.165378176 -0800
@@ -201,18 +201,17 @@ int vlan_proc_rem_dev(struct net_device 
 
 /* start read of /proc/net/vlan/config */
 static void *vlan_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(rcu)
 {
 	struct net_device *dev;
 	struct net *net = seq_file_net(seq);
 	loff_t i = 1;
 
-	read_lock(&dev_base_lock);
-
+	rcu_read_lock();
 	if (*pos == 0)
 		return SEQ_START_TOKEN;
 
-	for_each_netdev(net, dev) {
+	for_each_netdev_rcu(net, dev) {
 		if (!is_vlan_dev(dev))
 			continue;
 
@@ -234,7 +233,7 @@ static void *vlan_seq_next(struct seq_fi
 	if (v == SEQ_START_TOKEN)
 		dev = net_device_entry(&net->dev_base_head);
 
-	for_each_netdev_continue(net, dev) {
+	for_each_netdev_continue_rcu(net, dev) {
 		if (!is_vlan_dev(dev))
 			continue;
 
@@ -245,9 +244,9 @@ static void *vlan_seq_next(struct seq_fi
 }
 
 static void vlan_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(rcu)
 {
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int vlan_seq_show(struct seq_file *seq, void *v)

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 03/10] net: use rcu for network scheduler API
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
  2009-11-10 17:54 ` [PATCH 01/10] netdev: add netdev_continue_rcu Stephen Hemminger
  2009-11-10 17:54 ` [PATCH 02/10] vlan: eliminate use of dev_base_lock Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:20   ` Eric Dumazet
  2009-11-10 17:54 ` [PATCH 04/10] AOE: use rcu to find network device Stephen Hemminger
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: sched-rcu.patch --]
[-- Type: text/plain, Size: 752 bytes --]

Use RCU to walk list of network devices in qdisc dump.
This could be optimized for large number of devices.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/sched/sch_api.c	2009-11-09 22:19:09.002480816 -0800
+++ b/net/sched/sch_api.c	2009-11-10 09:28:38.166439067 -0800
@@ -1279,9 +1279,10 @@ static int tc_dump_qdisc(struct sk_buff 
 
 	s_idx = cb->args[0];
 	s_q_idx = q_idx = cb->args[1];
-	read_lock(&dev_base_lock);
+
+	rcu_read_lock();
 	idx = 0;
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev_rcu(&init_net, dev) {
 		struct netdev_queue *dev_queue;
 
 		if (idx < s_idx)
@@ -1302,7 +1303,7 @@ cont:
 	}
 
 done:
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 
 	cb->args[0] = idx;
 	cb->args[1] = q_idx;

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
                   ` (2 preceding siblings ...)
  2009-11-10 17:54 ` [PATCH 03/10] net: use rcu for network scheduler API Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:23   ` Eric Dumazet
  2009-11-10 20:01   ` Ed Cashin
  2009-11-10 17:54   ` Stephen Hemminger
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Ed L. Cashin, Harvey Harrison, Bartlomiej Zolnierkiewicz
  Cc: netdev

[-- Attachment #1: aoe-rcu.patch --]
[-- Type: text/plain, Size: 1240 bytes --]

This gets rid of another use of read_lock(&dev_base_lock) by using
RCU. Also, it only increments the reference count of the device actually
used rather than holding and releasing every device

Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/block/aoe/aoecmd.c	2009-11-09 22:19:06.082480836 -0800
+++ b/drivers/block/aoe/aoecmd.c	2009-11-10 09:28:38.222438732 -0800
@@ -296,17 +296,18 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigne
 	struct sk_buff *skb;
 	struct net_device *ifp;
 
-	read_lock(&dev_base_lock);
-	for_each_netdev(&init_net, ifp) {
-		dev_hold(ifp);
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, ifp) {
 		if (!is_aoe_netif(ifp))
-			goto cont;
+			continue;
 
 		skb = new_skb(sizeof *h + sizeof *ch);
 		if (skb == NULL) {
 			printk(KERN_INFO "aoe: skb alloc failure\n");
-			goto cont;
+			continue;
 		}
+
+		dev_hold(ifp);
 		skb_put(skb, sizeof *h + sizeof *ch);
 		skb->dev = ifp;
 		__skb_queue_tail(queue, skb);
@@ -320,11 +321,8 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigne
 		h->major = cpu_to_be16(aoemajor);
 		h->minor = aoeminor;
 		h->cmd = AOECMD_CFG;
-
-cont:
-		dev_put(ifp);
 	}
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static void

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 05/10] parisc: use RCU to find network device
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
@ 2009-11-10 17:54   ` Stephen Hemminger
  2009-11-10 17:54 ` [PATCH 02/10] vlan: eliminate use of dev_base_lock Stephen Hemminger
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Kyle McMartin, Helge Deller, Alexander Beregalov
  Cc: netdev, linux-parisc

Another place where RCU can be used instead of read_lock(&dev_base_lock)
This is by inspection, don't have platform or cross-build environment
to validate.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/parisc/led.c	2009-11-09 22:19:07.223480872 -0800
+++ b/drivers/parisc/led.c	2009-11-10 09:28:38.279438787 -0800
@@ -354,9 +354,8 @@ static __inline__ int led_get_net_activi
 	
 	/* we are running as a workqueue task, so locking dev_base 
 	 * for reading should be OK */
-	read_lock(&dev_base_lock);
 	rcu_read_lock();
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev_rcu(&init_net, dev) {
 	    const struct net_device_stats *stats;
 	    struct in_device *in_dev = __in_dev_get_rcu(dev);
 	    if (!in_dev || !in_dev->ifa_list)
@@ -368,7 +367,6 @@ static __inline__ int led_get_net_activi
 	    tx_total += stats->tx_packets;
 	}
 	rcu_read_unlock();
-	read_unlock(&dev_base_lock);
 
 	retval = 0;
 

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 05/10] parisc: use RCU to find network device
@ 2009-11-10 17:54   ` Stephen Hemminger
  0 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Kyle McMartin, Helge Deller, Alexander Beregalov
  Cc: netdev, linux-parisc

[-- Attachment #1: parisc-rdlock.patch --]
[-- Type: text/plain, Size: 943 bytes --]

Another place where RCU can be used instead of read_lock(&dev_base_lock)
This is by inspection, don't have platform or cross-build environment
to validate.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/drivers/parisc/led.c	2009-11-09 22:19:07.223480872 -0800
+++ b/drivers/parisc/led.c	2009-11-10 09:28:38.279438787 -0800
@@ -354,9 +354,8 @@ static __inline__ int led_get_net_activi
 	
 	/* we are running as a workqueue task, so locking dev_base 
 	 * for reading should be OK */
-	read_lock(&dev_base_lock);
 	rcu_read_lock();
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev_rcu(&init_net, dev) {
 	    const struct net_device_stats *stats;
 	    struct in_device *in_dev = __in_dev_get_rcu(dev);
 	    if (!in_dev || !in_dev->ifa_list)
@@ -368,7 +367,6 @@ static __inline__ int led_get_net_activi
 	    tx_total += stats->tx_packets;
 	}
 	rcu_read_unlock();
-	read_unlock(&dev_base_lock);
 
 	retval = 0;
 

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 06/10] s390: use RCU to walk list of network devices
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
                   ` (4 preceding siblings ...)
  2009-11-10 17:54   ` Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:27   ` Eric Dumazet
  2009-11-10 18:40   ` Eric Dumazet
  2009-11-10 17:54 ` [PATCH 07/10] decnet: use RCU to find " Stephen Hemminger
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Martin Schwidefsky, Heiko Carstens; +Cc: netdev, linux390

[-- Attachment #1: s390-readlock.patch --]
[-- Type: text/plain, Size: 1004 bytes --]

This is similar to other cases where for_each_netdev_rcu
can be used when gathering information.

By inspection, don't have platform or cross-build environment
to validate.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/arch/s390/appldata/appldata_net_sum.c	2009-11-09 22:19:05.593480476 -0800
+++ b/arch/s390/appldata/appldata_net_sum.c	2009-11-10 09:28:38.335438652 -0800
@@ -83,8 +83,9 @@ static void appldata_get_net_sum_data(vo
 	rx_dropped = 0;
 	tx_dropped = 0;
 	collisions = 0;
-	read_lock(&dev_base_lock);
-	for_each_netdev(&init_net, dev) {
+
+	rcu_read_lock();
+	for_each_netdev_rcu(&init_net, dev) {
 		const struct net_device_stats *stats = dev_get_stats(dev);
 
 		rx_packets += stats->rx_packets;
@@ -98,7 +99,8 @@ static void appldata_get_net_sum_data(vo
 		collisions += stats->collisions;
 		i++;
 	}
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
+
 	net_data->nr_interfaces = i;
 	net_data->rx_packets = rx_packets;
 	net_data->tx_packets = tx_packets;

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 07/10] decnet: use RCU to find network devices
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
                   ` (5 preceding siblings ...)
  2009-11-10 17:54 ` [PATCH 06/10] s390: use RCU to walk list of network devices Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:43   ` Eric Dumazet
  2009-11-11  6:49   ` David Miller
  2009-11-10 17:54 ` [PATCH 08/10] ipv6: use RCU to walk list of " Stephen Hemminger
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Christine Caulfield, David S. Miller, Hannes Eder, Alexey
  Cc: netdev, linux-decnet-users

[-- Attachment #1: decnet-readlock.patch --]
[-- Type: text/plain, Size: 1576 bytes --]

When showing device statistics use RCU rather than read_lock(&dev_base_lock)
Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/decnet/dn_dev.c	2009-11-10 09:30:55.557376454 -0800
+++ b/net/decnet/dn_dev.c	2009-11-10 09:40:03.847005394 -0800
@@ -856,9 +856,7 @@ int dn_dev_bind_default(__le16 *addr)
 	dev = dn_dev_get_default();
 last_chance:
 	if (dev) {
-		read_lock(&dev_base_lock);
 		rv = dn_dev_get_first(dev, addr);
-		read_unlock(&dev_base_lock);
 		dev_put(dev);
 		if (rv == 0 || dev == init_net.loopback_dev)
 			return rv;
@@ -1323,18 +1321,18 @@ static inline int is_dn_dev(struct net_d
 }
 
 static void *dn_dev_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(&dev_base_lock)
+	__acquires(rcu)
 {
 	int i;
 	struct net_device *dev;
 
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 
 	if (*pos == 0)
 		return SEQ_START_TOKEN;
 
 	i = 1;
-	for_each_netdev(&init_net, dev) {
+	for_each_netdev_rcu(&init_net, dev) {
 		if (!is_dn_dev(dev))
 			continue;
 
@@ -1355,7 +1353,7 @@ static void *dn_dev_seq_next(struct seq_
 	if (v == SEQ_START_TOKEN)
 		dev = net_device_entry(&init_net.dev_base_head);
 
-	for_each_netdev_continue(&init_net, dev) {
+	for_each_netdev_continue_rcu(&init_net, dev) {
 		if (!is_dn_dev(dev))
 			continue;
 
@@ -1366,9 +1364,9 @@ static void *dn_dev_seq_next(struct seq_
 }
 
 static void dn_dev_seq_stop(struct seq_file *seq, void *v)
-	__releases(&dev_base_lock)
+	__releases(rcu)
 {
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static char *dn_type2asc(char type)

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 08/10] ipv6: use RCU to walk list of network devices
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
                   ` (6 preceding siblings ...)
  2009-11-10 17:54 ` [PATCH 07/10] decnet: use RCU to find " Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-11  6:50   ` David Miller
  2009-11-12  3:34   ` [PATCH net-next-2.6] " Eric Dumazet
  2009-11-10 17:54 ` [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP Stephen Hemminger
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: ipv6-rdlock.patch --]
[-- Type: text/plain, Size: 4200 bytes --]

No longer need read_lock(&dev_base_lock), use RCU instead.
This also needs to be optimized for large number of devices.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/ipv6/addrconf.c	2009-11-09 22:19:08.925480813 -0800
+++ b/net/ipv6/addrconf.c	2009-11-10 09:28:38.577438386 -0800
@@ -3828,9 +3828,9 @@ static int inet6_dump_ifinfo(struct sk_b
 	struct net_device *dev;
 	struct inet6_dev *idev;
 
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	idx = 0;
-	for_each_netdev(net, dev) {
+	for_each_netdev_rcu(net, dev) {
 		if (idx < s_idx)
 			goto cont;
 		if ((idev = in6_dev_get(dev)) == NULL)
@@ -3843,7 +3843,7 @@ static int inet6_dump_ifinfo(struct sk_b
 cont:
 		idx++;
 	}
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 	cb->args[0] = idx;
 
 	return skb->len;
--- a/net/ipv6/anycast.c	2009-11-09 22:19:08.926480759 -0800
+++ b/net/ipv6/anycast.c	2009-11-10 09:28:38.577438386 -0800
@@ -431,7 +431,7 @@ static inline struct ifacaddr6 *ac6_get_
 	struct net *net = seq_file_net(seq);
 
 	state->idev = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct inet6_dev *idev;
 		idev = in6_dev_get(state->dev);
 		if (!idev)
@@ -482,9 +482,9 @@ static struct ifacaddr6 *ac6_get_idx(str
 }
 
 static void *ac6_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(rcu)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return ac6_get_idx(seq, *pos);
 }
 
@@ -497,14 +497,14 @@ static void *ac6_seq_next(struct seq_fil
 }
 
 static void ac6_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(rcu)
 {
 	struct ac6_iter_state *state = ac6_seq_private(seq);
 	if (likely(state->idev != NULL)) {
 		read_unlock_bh(&state->idev->lock);
 		in6_dev_put(state->idev);
 	}
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int ac6_seq_show(struct seq_file *seq, void *v)
--- a/net/ipv6/mcast.c	2009-11-09 22:19:08.929480873 -0800
+++ b/net/ipv6/mcast.c	2009-11-10 09:28:38.578438329 -0800
@@ -2375,7 +2375,7 @@ static inline struct ifmcaddr6 *igmp6_mc
 	struct net *net = seq_file_net(seq);
 
 	state->idev = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct inet6_dev *idev;
 		idev = in6_dev_get(state->dev);
 		if (!idev)
@@ -2426,9 +2426,9 @@ static struct ifmcaddr6 *igmp6_mc_get_id
 }
 
 static void *igmp6_mc_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(rcu)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return igmp6_mc_get_idx(seq, *pos);
 }
 
@@ -2441,7 +2441,7 @@ static void *igmp6_mc_seq_next(struct se
 }
 
 static void igmp6_mc_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(rcu)
 {
 	struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq);
 	if (likely(state->idev != NULL)) {
@@ -2450,7 +2450,7 @@ static void igmp6_mc_seq_stop(struct seq
 		state->idev = NULL;
 	}
 	state->dev = NULL;
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int igmp6_mc_seq_show(struct seq_file *seq, void *v)
@@ -2507,7 +2507,7 @@ static inline struct ip6_sf_list *igmp6_
 
 	state->idev = NULL;
 	state->im = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct inet6_dev *idev;
 		idev = in6_dev_get(state->dev);
 		if (unlikely(idev == NULL))
@@ -2573,9 +2573,9 @@ static struct ip6_sf_list *igmp6_mcf_get
 }
 
 static void *igmp6_mcf_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(rcu)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return *pos ? igmp6_mcf_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 }
 
@@ -2591,7 +2591,7 @@ static void *igmp6_mcf_seq_next(struct s
 }
 
 static void igmp6_mcf_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(rcu)
 {
 	struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq);
 	if (likely(state->im != NULL)) {
@@ -2604,7 +2604,7 @@ static void igmp6_mcf_seq_stop(struct se
 		state->idev = NULL;
 	}
 	state->dev = NULL;
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int igmp6_mcf_seq_show(struct seq_file *seq, void *v)

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
                   ` (7 preceding siblings ...)
  2009-11-10 17:54 ` [PATCH 08/10] ipv6: use RCU to walk list of " Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:47   ` Eric Dumazet
  2009-11-11  6:50   ` David Miller
  2009-11-10 17:54 ` [PATCH 10/10] CAN: use dev_get_by_index_rcu Stephen Hemminger
  2009-11-10 18:18 ` [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Eric Dumazet
  10 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: ipv4-igmp.patch --]
[-- Type: text/plain, Size: 2400 bytes --]

This also needs to be optimized for large number of devices.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/ipv4/igmp.c	2009-11-09 22:19:08.899543825 -0800
+++ b/net/ipv4/igmp.c	2009-11-10 09:28:38.636438291 -0800
@@ -2311,7 +2311,7 @@ static inline struct ip_mc_list *igmp_mc
 	struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq);
 
 	state->in_dev = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct in_device *in_dev;
 		in_dev = in_dev_get(state->dev);
 		if (!in_dev)
@@ -2361,9 +2361,9 @@ static struct ip_mc_list *igmp_mc_get_id
 }
 
 static void *igmp_mc_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(rcu)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return *pos ? igmp_mc_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 }
 
@@ -2379,7 +2379,7 @@ static void *igmp_mc_seq_next(struct seq
 }
 
 static void igmp_mc_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(rcu)
 {
 	struct igmp_mc_iter_state *state = igmp_mc_seq_private(seq);
 	if (likely(state->in_dev != NULL)) {
@@ -2388,7 +2388,7 @@ static void igmp_mc_seq_stop(struct seq_
 		state->in_dev = NULL;
 	}
 	state->dev = NULL;
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int igmp_mc_seq_show(struct seq_file *seq, void *v)
@@ -2462,7 +2462,7 @@ static inline struct ip_sf_list *igmp_mc
 
 	state->idev = NULL;
 	state->im = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct in_device *idev;
 		idev = in_dev_get(state->dev);
 		if (unlikely(idev == NULL))
@@ -2528,8 +2528,9 @@ static struct ip_sf_list *igmp_mcf_get_i
 }
 
 static void *igmp_mcf_seq_start(struct seq_file *seq, loff_t *pos)
+	__acquires(rcu)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return *pos ? igmp_mcf_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 }
 
@@ -2545,6 +2546,7 @@ static void *igmp_mcf_seq_next(struct se
 }
 
 static void igmp_mcf_seq_stop(struct seq_file *seq, void *v)
+	__releases(rcu)
 {
 	struct igmp_mcf_iter_state *state = igmp_mcf_seq_private(seq);
 	if (likely(state->im != NULL)) {
@@ -2557,7 +2559,7 @@ static void igmp_mcf_seq_stop(struct seq
 		state->idev = NULL;
 	}
 	state->dev = NULL;
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int igmp_mcf_seq_show(struct seq_file *seq, void *v)

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 10/10] CAN: use dev_get_by_index_rcu
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
                   ` (8 preceding siblings ...)
  2009-11-10 17:54 ` [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP Stephen Hemminger
@ 2009-11-10 17:54 ` Stephen Hemminger
  2009-11-10 18:34   ` Eric Dumazet
  2009-11-10 18:18 ` [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Eric Dumazet
  10 siblings, 1 reply; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 17:54 UTC (permalink / raw)
  To: David Miller, Oliver Hartkopp, Alexey Dobriyan, Lothar Wassmann; +Cc: netdev

[-- Attachment #1: bcm-dev-rcu.patch --]
[-- Type: text/plain, Size: 614 bytes --]

Use new function to avoid doing read_lock().

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/can/bcm.c	2009-11-10 09:45:16.301376272 -0800
+++ b/net/can/bcm.c	2009-11-10 09:46:30.125005956 -0800
@@ -139,13 +139,13 @@ static char *bcm_proc_getifname(char *re
 	if (!ifindex)
 		return "any";
 
-	read_lock(&dev_base_lock);
-	dev = __dev_get_by_index(&init_net, ifindex);
+	rcu_read_lock();
+	dev = dev_get_by_index_rcu(&init_net, ifindex);
 	if (dev)
 		strcpy(result, dev->name);
 	else
 		strcpy(result, "???");
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 
 	return result;
 }

-- 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages
  2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
                   ` (9 preceding siblings ...)
  2009-11-10 17:54 ` [PATCH 10/10] CAN: use dev_get_by_index_rcu Stephen Hemminger
@ 2009-11-10 18:18 ` Eric Dumazet
  2009-11-10 18:22   ` Stephen Hemminger
  2009-11-10 18:24   ` Stephen Hemminger
  10 siblings, 2 replies; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

Stephen Hemminger a écrit :
> The goal is to eliminate dev_base_lock completely, and just use RCU
> and rtnl_mutex for network devices. This series gets rid of the many
> of the users of dev_base_lock just for reading.
> 

Nice, but I was doing all this work Stephen... maybe I am too slow ?

I believe you missed one of my patch (but David is traveling)

[PATCH net-next-2.6] ipv6: speedup inet6_dump_ifinfo()

This conflicts with your :

[PATCH 08/10] ipv6: use RCU to walk list of network devices


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/10] netdev: add netdev_continue_rcu
  2009-11-10 17:54 ` [PATCH 01/10] netdev: add netdev_continue_rcu Stephen Hemminger
@ 2009-11-10 18:19   ` Eric Dumazet
  2009-11-11  6:47     ` David Miller
  2009-11-10 19:39   ` Paul E. McKenney
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:19 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, Paul E. McKenney, netdev

Stephen Hemminger a écrit :
> This adds an RCU macro for continuing search, useful for some
> network devices like vlan.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/10] vlan: eliminate use of dev_base_lock
  2009-11-10 17:54 ` [PATCH 02/10] vlan: eliminate use of dev_base_lock Stephen Hemminger
@ 2009-11-10 18:20   ` Eric Dumazet
  2009-11-11  6:47     ` David Miller
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, Patrick McHardy, netdev

Stephen Hemminger a écrit :
> Do not need to use read_lock(&dev_base_lock), use RCU instead.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/10] net: use rcu for network scheduler API
  2009-11-10 17:54 ` [PATCH 03/10] net: use rcu for network scheduler API Stephen Hemminger
@ 2009-11-10 18:20   ` Eric Dumazet
  2009-11-11  6:47     ` David Miller
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:20 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

Stephen Hemminger a écrit :
> Use RCU to walk list of network devices in qdisc dump.
> This could be optimized for large number of devices.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages
  2009-11-10 18:18 ` [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Eric Dumazet
@ 2009-11-10 18:22   ` Stephen Hemminger
  2009-11-10 18:24   ` Stephen Hemminger
  1 sibling, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 18:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, 10 Nov 2009 19:18:14 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > The goal is to eliminate dev_base_lock completely, and just use RCU
> > and rtnl_mutex for network devices. This series gets rid of the many
> > of the users of dev_base_lock just for reading.
> > 
> 
> Nice, but I was doing all this work Stephen... maybe I am too slow ?
> 
> I believe you missed one of my patch (but David is traveling)
> 
> [PATCH net-next-2.6] ipv6: speedup inet6_dump_ifinfo()
> 
> This conflicts with your :
> 
> [PATCH 08/10] ipv6: use RCU to walk list of network devices
> 

Inflight conflict.

-- 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 17:54 ` [PATCH 04/10] AOE: use rcu to find network device Stephen Hemminger
@ 2009-11-10 18:23   ` Eric Dumazet
  2009-11-10 20:01   ` Ed Cashin
  1 sibling, 0 replies; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:23 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Ed L. Cashin, Harvey Harrison,
	Bartlomiej Zolnierkiewicz, netdev

Stephen Hemminger a écrit :
> This gets rid of another use of read_lock(&dev_base_lock) by using
> RCU. Also, it only increments the reference count of the device actually
> used rather than holding and releasing every device
> 
> Compile tested only.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/10] decnet: use RCU to find network devices
  2009-11-10 18:50     ` Stephen Hemminger
@ 2009-11-10 18:24       ` steve
  2009-11-11 17:39         ` [PATCH 1/2] decnet: add RTNL lock when reading address list Stephen Hemminger
  2009-11-10 19:25       ` [PATCH 07/10] decnet: use RCU to find network devices Eric Dumazet
  1 sibling, 1 reply; 61+ messages in thread
From: steve @ 2009-11-10 18:24 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric Dumazet, David Miller, Christine Caulfield, Hannes Eder,
	Alexey Dobriyan, Steven Whitehouse, netdev, linux-decnet-users

Hi,

On Tue, Nov 10, 2009 at 10:50:53AM -0800, Stephen Hemminger wrote:
> On Tue, 10 Nov 2009 19:43:21 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Stephen Hemminger a écrit :
> > > When showing device statistics use RCU rather than read_lock(&dev_base_lock)
> > > Compile tested only.
> > > 
> > > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > > 
> > > --- a/net/decnet/dn_dev.c	2009-11-10 09:30:55.557376454 -0800
> > > +++ b/net/decnet/dn_dev.c	2009-11-10 09:40:03.847005394 -0800
> > > @@ -856,9 +856,7 @@ int dn_dev_bind_default(__le16 *addr)
> > >  	dev = dn_dev_get_default();
> > >  last_chance:
> > >  	if (dev) {
> > > -		read_lock(&dev_base_lock);
> > >  		rv = dn_dev_get_first(dev, addr);
> > > -		read_unlock(&dev_base_lock);
> > >  		dev_put(dev);
> > >  		if (rv == 0 || dev == init_net.loopback_dev)
> > >  			return rv;
> > > @@ -1323,18 +1321,18 @@ static inline int is_dn_dev(struct net_d
> > 
> > 
> > I dont understand this part. Why previous locking can be avoided ?
> 
> dn_dev_get_default acquires a reference on dev so the device can
> not go away.
> 
> It could be the original author meant to ensure the address list
> doesn't change. If so, then rtnl_lock() should have been used.

The original author has tried to remember what he was thinking when
he wrote this code :-)

I think you are right that it should be rtnl_lock() as we don't want
the address list changing at this point. On the other hand I notice
also that other bits of the code seem to be using dev_base_lock too.

Maybe this is a hang over from another era? I'll have to refresh
my memory some more before I can give a verdict on that I'm afraid,

Steve.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages
  2009-11-10 18:18 ` [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Eric Dumazet
  2009-11-10 18:22   ` Stephen Hemminger
@ 2009-11-10 18:24   ` Stephen Hemminger
  2009-11-10 18:39     ` Eric Dumazet
  1 sibling, 1 reply; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 18:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

I was just trying to pick up the stragglers you left behind :-)

The nasty cases left are bonding (whose existing locking model is a pile
of crap), and sysfs (slightly less stinky). The bonding code just needs to
be rewritten to have a sane/simple model. 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/10] parisc: use RCU to find network device
  2009-11-10 17:54   ` Stephen Hemminger
@ 2009-11-10 18:26     ` Eric Dumazet
  -1 siblings, 0 replies; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:26 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Kyle McMartin, Helge Deller, Alexander Beregalov,
	netdev, linux-parisc

Stephen Hemminger a =E9crit :
> Another place where RCU can be used instead of read_lock(&dev_base_lo=
ck)
> This is by inspection, don't have platform or cross-build environment
> to validate.
>=20
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>=20

Duplicate of previously posted patch...

http://article.gmane.org/gmane.linux.network/143072




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/10] parisc: use RCU to find network device
@ 2009-11-10 18:26     ` Eric Dumazet
  0 siblings, 0 replies; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:26 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Kyle McMartin, Helge Deller, Alexander Beregalov,
	netdev, linux-parisc

Stephen Hemminger a écrit :
> Another place where RCU can be used instead of read_lock(&dev_base_lock)
> This is by inspection, don't have platform or cross-build environment
> to validate.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 

Duplicate of previously posted patch...

http://article.gmane.org/gmane.linux.network/143072




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/10] s390: use RCU to walk list of network devices
  2009-11-10 17:54 ` [PATCH 06/10] s390: use RCU to walk list of network devices Stephen Hemminger
@ 2009-11-10 18:27   ` Eric Dumazet
  2009-11-10 18:29     ` Stephen Hemminger
  2009-11-10 18:40   ` Eric Dumazet
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:27 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Martin Schwidefsky, Heiko Carstens, netdev, linux390

Stephen Hemminger a écrit :
> This is similar to other cases where for_each_netdev_rcu
> can be used when gathering information.
> 
> By inspection, don't have platform or cross-build environment
> to validate.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/arch/s390/appldata/appldata_net_sum.c	2009-11-09 22:19:05.593480476 -0800
> +++ b/arch/s390/appldata/appldata_net_sum.c	2009-11-10 09:28:38.335438652 -0800
> @@ -83,8 +83,9 @@ static void appldata_get_net_sum_data(vo
>  	rx_dropped = 0;
>  	tx_dropped = 0;
>  	collisions = 0;
> -	read_lock(&dev_base_lock);
> -	for_each_netdev(&init_net, dev) {
> +
> +	rcu_read_lock();
> +	for_each_netdev_rcu(&init_net, dev) {
>  		const struct net_device_stats *stats = dev_get_stats(dev);
>  
>  		rx_packets += stats->rx_packets;
> @@ -98,7 +99,8 @@ static void appldata_get_net_sum_data(vo
>  		collisions += stats->collisions;
>  		i++;
>  	}
> -	read_unlock(&dev_base_lock);
> +	rcu_read_unlock();
> +
>  	net_data->nr_interfaces = i;
>  	net_data->rx_packets = rx_packets;
>  	net_data->tx_packets = tx_packets;
> 

Not sure if dev_get_stats(dev) could sleep on some devices...


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/10] s390: use RCU to walk list of network devices
  2009-11-10 18:27   ` Eric Dumazet
@ 2009-11-10 18:29     ` Stephen Hemminger
  0 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 18:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Martin Schwidefsky, Heiko Carstens, netdev, linux390

On Tue, 10 Nov 2009 19:27:11 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > This is similar to other cases where for_each_netdev_rcu
> > can be used when gathering information.
> > 
> > By inspection, don't have platform or cross-build environment
> > to validate.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > 
> > --- a/arch/s390/appldata/appldata_net_sum.c	2009-11-09 22:19:05.593480476 -0800
> > +++ b/arch/s390/appldata/appldata_net_sum.c	2009-11-10 09:28:38.335438652 -0800
> > @@ -83,8 +83,9 @@ static void appldata_get_net_sum_data(vo
> >  	rx_dropped = 0;
> >  	tx_dropped = 0;
> >  	collisions = 0;
> > -	read_lock(&dev_base_lock);
> > -	for_each_netdev(&init_net, dev) {
> > +
> > +	rcu_read_lock();
> > +	for_each_netdev_rcu(&init_net, dev) {
> >  		const struct net_device_stats *stats = dev_get_stats(dev);
> >  
> >  		rx_packets += stats->rx_packets;
> > @@ -98,7 +99,8 @@ static void appldata_get_net_sum_data(vo
> >  		collisions += stats->collisions;
> >  		i++;
> >  	}
> > -	read_unlock(&dev_base_lock);
> > +	rcu_read_unlock();
> > +
> >  	net_data->nr_interfaces = i;
> >  	net_data->rx_packets = rx_packets;
> >  	net_data->tx_packets = tx_packets;
> > 
> 
> Not sure if dev_get_stats(dev) could sleep on some devices...
> 

It would have already been broken since dev_get_stats is previously
called with read_lock(), and sleeping with any lock held causes warning.

-- 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/10] CAN: use dev_get_by_index_rcu
  2009-11-10 17:54 ` [PATCH 10/10] CAN: use dev_get_by_index_rcu Stephen Hemminger
@ 2009-11-10 18:34   ` Eric Dumazet
  2009-11-11  5:54     ` Oliver Hartkopp
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:34 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Oliver Hartkopp, Alexey Dobriyan, Lothar Wassmann, netdev

Stephen Hemminger a écrit :
> Use new function to avoid doing read_lock().
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> --- a/net/can/bcm.c	2009-11-10 09:45:16.301376272 -0800
> +++ b/net/can/bcm.c	2009-11-10 09:46:30.125005956 -0800
> @@ -139,13 +139,13 @@ static char *bcm_proc_getifname(char *re
>  	if (!ifindex)
>  		return "any";
>  
> -	read_lock(&dev_base_lock);
> -	dev = __dev_get_by_index(&init_net, ifindex);
> +	rcu_read_lock();
> +	dev = dev_get_by_index_rcu(&init_net, ifindex);
>  	if (dev)
>  		strcpy(result, dev->name);
>  	else
>  		strcpy(result, "???");
> -	read_unlock(&dev_base_lock);
> +	rcu_read_unlock();
>  
>  	return result;
>  }
> 

I was pretty sure I had already done this one...

Ah yes, that was planned after a bugfix for net-2.6.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages
  2009-11-10 18:24   ` Stephen Hemminger
@ 2009-11-10 18:39     ` Eric Dumazet
  2009-11-10 18:53       ` Stephen Hemminger
  0 siblings, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

Stephen Hemminger a écrit :
> I was just trying to pick up the stragglers you left behind :-)
> 
> The nasty cases left are bonding (whose existing locking model is a pile
> of crap), and sysfs (slightly less stinky). The bonding code just needs to
> be rewritten to have a sane/simple model. 
> 

I was thinking of improving bonding, so it might be good to coordinate our work :)

1) Get rid of rwlock in tx fast path. (RCU ? did I said RCU)

2) multi queue support (Got two dual 82599 dual ports cards from Intel :) )


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/10] s390: use RCU to walk list of network devices
  2009-11-10 17:54 ` [PATCH 06/10] s390: use RCU to walk list of network devices Stephen Hemminger
  2009-11-10 18:27   ` Eric Dumazet
@ 2009-11-10 18:40   ` Eric Dumazet
  2009-11-11  6:49     ` David Miller
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:40 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Martin Schwidefsky, Heiko Carstens, netdev, linux390

Stephen Hemminger a écrit :
> This is similar to other cases where for_each_netdev_rcu
> can be used when gathering information.
> 
> By inspection, don't have platform or cross-build environment
> to validate.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/10] decnet: use RCU to find network devices
  2009-11-10 17:54 ` [PATCH 07/10] decnet: use RCU to find " Stephen Hemminger
@ 2009-11-10 18:43   ` Eric Dumazet
  2009-11-10 18:50     ` Stephen Hemminger
  2009-11-11  6:49   ` David Miller
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Christine Caulfield, Hannes Eder, Alexey Dobriyan,
	Steven Whitehouse, netdev, linux-decnet-users

Stephen Hemminger a écrit :
> When showing device statistics use RCU rather than read_lock(&dev_base_lock)
> Compile tested only.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> --- a/net/decnet/dn_dev.c	2009-11-10 09:30:55.557376454 -0800
> +++ b/net/decnet/dn_dev.c	2009-11-10 09:40:03.847005394 -0800
> @@ -856,9 +856,7 @@ int dn_dev_bind_default(__le16 *addr)
>  	dev = dn_dev_get_default();
>  last_chance:
>  	if (dev) {
> -		read_lock(&dev_base_lock);
>  		rv = dn_dev_get_first(dev, addr);
> -		read_unlock(&dev_base_lock);
>  		dev_put(dev);
>  		if (rv == 0 || dev == init_net.loopback_dev)
>  			return rv;
> @@ -1323,18 +1321,18 @@ static inline int is_dn_dev(struct net_d


I dont understand this part. Why previous locking can be avoided ?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP
  2009-11-10 17:54 ` [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP Stephen Hemminger
@ 2009-11-10 18:47   ` Eric Dumazet
  2009-11-11  6:50   ` David Miller
  1 sibling, 0 replies; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 18:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

Stephen Hemminger a écrit :
> This also needs to be optimized for large number of devices.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>  

>  static void igmp_mcf_seq_stop(struct seq_file *seq, void *v)
> +	__releases(rcu)

Minor note : we usually use RCU instead of rcu, but it doesnt really matter.

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/10] decnet: use RCU to find network devices
  2009-11-10 18:43   ` Eric Dumazet
@ 2009-11-10 18:50     ` Stephen Hemminger
  2009-11-10 18:24       ` steve
  2009-11-10 19:25       ` [PATCH 07/10] decnet: use RCU to find network devices Eric Dumazet
  0 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 18:50 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Christine Caulfield, Hannes Eder, Alexey Dobriyan,
	Steven Whitehouse, netdev, linux-decnet-users

On Tue, 10 Nov 2009 19:43:21 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > When showing device statistics use RCU rather than read_lock(&dev_base_lock)
> > Compile tested only.
> > 
> > Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> > 
> > --- a/net/decnet/dn_dev.c	2009-11-10 09:30:55.557376454 -0800
> > +++ b/net/decnet/dn_dev.c	2009-11-10 09:40:03.847005394 -0800
> > @@ -856,9 +856,7 @@ int dn_dev_bind_default(__le16 *addr)
> >  	dev = dn_dev_get_default();
> >  last_chance:
> >  	if (dev) {
> > -		read_lock(&dev_base_lock);
> >  		rv = dn_dev_get_first(dev, addr);
> > -		read_unlock(&dev_base_lock);
> >  		dev_put(dev);
> >  		if (rv == 0 || dev == init_net.loopback_dev)
> >  			return rv;
> > @@ -1323,18 +1321,18 @@ static inline int is_dn_dev(struct net_d
> 
> 
> I dont understand this part. Why previous locking can be avoided ?

dn_dev_get_default acquires a reference on dev so the device can
not go away.

It could be the original author meant to ensure the address list
doesn't change. If so, then rtnl_lock() should have been used.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages
  2009-11-10 18:39     ` Eric Dumazet
@ 2009-11-10 18:53       ` Stephen Hemminger
  0 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 18:53 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On Tue, 10 Nov 2009 19:39:20 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger a écrit :
> > I was just trying to pick up the stragglers you left behind :-)
> > 
> > The nasty cases left are bonding (whose existing locking model is a pile
> > of crap), and sysfs (slightly less stinky). The bonding code just needs to
> > be rewritten to have a sane/simple model. 
> > 
> 
> I was thinking of improving bonding, so it might be good to coordinate our work :)

I'll be happy to review, but was avoiding doing any serious work on it.

> 
> 1) Get rid of rwlock in tx fast path. (RCU ? did I said RCU)
> 
> 2) multi queue support (Got two dual 82599 dual ports cards from Intel :) )

I hope to get back to doing netlink interface for bridging, then bonding
over next few weeks.

-- 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/10] decnet: use RCU to find network devices
  2009-11-10 18:50     ` Stephen Hemminger
  2009-11-10 18:24       ` steve
@ 2009-11-10 19:25       ` Eric Dumazet
  1 sibling, 0 replies; 61+ messages in thread
From: Eric Dumazet @ 2009-11-10 19:25 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Miller, Christine Caulfield, Hannes Eder, Alexey Dobriyan,
	Steven Whitehouse, netdev, linux-decnet-users

Stephen Hemminger a écrit :
> On Tue, 10 Nov 2009 19:43:21 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
>> Stephen Hemminger a écrit :
>>> When showing device statistics use RCU rather than read_lock(&dev_base_lock)
>>> Compile tested only.
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>>
>>> --- a/net/decnet/dn_dev.c	2009-11-10 09:30:55.557376454 -0800
>>> +++ b/net/decnet/dn_dev.c	2009-11-10 09:40:03.847005394 -0800
>>> @@ -856,9 +856,7 @@ int dn_dev_bind_default(__le16 *addr)
>>>  	dev = dn_dev_get_default();
>>>  last_chance:
>>>  	if (dev) {
>>> -		read_lock(&dev_base_lock);
>>>  		rv = dn_dev_get_first(dev, addr);
>>> -		read_unlock(&dev_base_lock);
>>>  		dev_put(dev);
>>>  		if (rv == 0 || dev == init_net.loopback_dev)
>>>  			return rv;
>>> @@ -1323,18 +1321,18 @@ static inline int is_dn_dev(struct net_d
>>
>> I dont understand this part. Why previous locking can be avoided ?
> 
> dn_dev_get_default acquires a reference on dev so the device can
> not go away.
> 
> It could be the original author meant to ensure the address list
> doesn't change. If so, then rtnl_lock() should have been used.

Hmm... I spot some bugs during my previous round of patches, so maybe
this is a real bug... We should double check.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/10] netdev: add netdev_continue_rcu
  2009-11-10 17:54 ` [PATCH 01/10] netdev: add netdev_continue_rcu Stephen Hemminger
  2009-11-10 18:19   ` Eric Dumazet
@ 2009-11-10 19:39   ` Paul E. McKenney
  1 sibling, 0 replies; 61+ messages in thread
From: Paul E. McKenney @ 2009-11-10 19:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev

On Tue, Nov 10, 2009 at 09:54:47AM -0800, Stephen Hemminger wrote:
> This adds an RCU macro for continuing search, useful for some
> network devices like vlan.

Looks good!!!

Of course, you need to either have a single RCU read-side critical section
cover all the chained list_for_each_entry_continue_rcu() invocations, or
you need to do something (e.g., reference count) to make sure that the
element in question doesn't disappear in the meantime, right?

							Thanx, Paul

> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> --- a/include/linux/netdevice.h	2009-11-09 22:19:08.511480873 -0800
> +++ b/include/linux/netdevice.h	2009-11-10 09:27:17.973376267 -0800
> @@ -1079,6 +1079,8 @@ extern rwlock_t				dev_base_lock;		/* De
>  		list_for_each_entry_safe(d, n, &(net)->dev_base_head, dev_list)
>  #define for_each_netdev_continue(net, d)		\
>  		list_for_each_entry_continue(d, &(net)->dev_base_head, dev_list)
> +#define for_each_netdev_continue_rcu(net, d)		\
> +	list_for_each_entry_continue_rcu(d, &(net)->dev_base_head, dev_list)
>  #define net_device_entry(lh)	list_entry(lh, struct net_device, dev_list)
> 
>  static inline struct net_device *next_net_device(struct net_device *dev)
> --- a/include/linux/rculist.h	2009-11-09 22:19:08.529480859 -0800
> +++ b/include/linux/rculist.h	2009-11-10 09:27:17.974376609 -0800
> @@ -262,6 +262,20 @@ static inline void list_splice_init_rcu(
>  		(pos) = rcu_dereference((pos)->next))
> 
>  /**
> + * list_for_each_entry_continue_rcu - continue iteration over list of given type
> + * @pos:	the type * to use as a loop cursor.
> + * @head:	the head for your list.
> + * @member:	the name of the list_struct within the struct.
> + *
> + * Continue to iterate over list of given type, continuing after
> + * the current position.
> + */
> +#define list_for_each_entry_continue_rcu(pos, head, member) 		\
> +	for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
> +	     prefetch(pos->member.next), &pos->member != (head);	\
> +	     pos = list_entry_rcu(pos->member.next, typeof(*pos), member))
> +
> +/**
>   * hlist_del_rcu - deletes entry from hash list without re-initialization
>   * @n: the element to delete from the hash list.
>   *
> 
> -- 
> 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 17:54 ` [PATCH 04/10] AOE: use rcu to find network device Stephen Hemminger
  2009-11-10 18:23   ` Eric Dumazet
@ 2009-11-10 20:01   ` Ed Cashin
  2009-11-10 23:06     ` Stephen Hemminger
  1 sibling, 1 reply; 61+ messages in thread
From: Ed Cashin @ 2009-11-10 20:01 UTC (permalink / raw)
  To: shemminger, davem, ecashin, harvey.harrison, bzolnier, netdev

On Tue Nov 10 13:07:37 EST 2009, shemminger@vyatta.com wrote:
> This gets rid of another use of read_lock(&dev_base_lock) by using
> RCU. Also, it only increments the reference count of the device actually
> used rather than holding and releasing every device
> 
> Compile tested only.

This function runs once a minute when the aoe driver is loaded,
if you'd like to test it a bit more.

It looks like there's no dev_put corresponding to the dev_hold
after the changes.

-- 
  Ed

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 20:01   ` Ed Cashin
@ 2009-11-10 23:06     ` Stephen Hemminger
  2009-11-10 23:53       ` Stephen Hemminger
  2009-11-11 14:22       ` Ed Cashin
  0 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 23:06 UTC (permalink / raw)
  To: Ed Cashin; +Cc: davem, ecashin, harvey.harrison, bzolnier, netdev

On Tue, 10 Nov 2009 15:01:49 -0500
Ed Cashin <ecashin@coraid.com> wrote:

> On Tue Nov 10 13:07:37 EST 2009, shemminger@vyatta.com wrote:
> > This gets rid of another use of read_lock(&dev_base_lock) by using
> > RCU. Also, it only increments the reference count of the device actually
> > used rather than holding and releasing every device
> > 
> > Compile tested only.
> 
> This function runs once a minute when the aoe driver is loaded,
> if you'd like to test it a bit more.
> 
> It looks like there's no dev_put corresponding to the dev_hold
> after the changes.
> 

Hmm, looks like AOE actually is not ref counting the network device.
So my patch is incorrect. 

As it stands (before my patch), it is UNSAFE. It can decide to queue
packets to a device that is removed out from underneath it causing
reference to freed memory.

Moving the rcu_read_lock up to aoecmd_cfg() would solve that but the
whole driver appears to be unsafe about device refcounting and handling
device removal properly.  

It needs to:

1. Get a device ref count when it remembers a device: (ie addif)
2. Install a notifier that looks for device removal events
3. In notifier, remove interface, including flushing all pending
   skb's for that device.

This obviously is beyond the scope of the RCU stuff.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 23:06     ` Stephen Hemminger
@ 2009-11-10 23:53       ` Stephen Hemminger
  2009-11-11  6:48         ` David Miller
                           ` (2 more replies)
  2009-11-11 14:22       ` Ed Cashin
  1 sibling, 3 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-10 23:53 UTC (permalink / raw)
  To: karaluh, Ed Cashin; +Cc: roel.kluin, harvey.harrison, bzolnier, netdev

On Tue, 10 Nov 2009 15:06:17 -0800
Stephen Hemminger <shemminger@vyatta.com> wrote:

> On Tue, 10 Nov 2009 15:01:49 -0500
> Ed Cashin <ecashin@coraid.com> wrote:
> 
> > On Tue Nov 10 13:07:37 EST 2009, shemminger@vyatta.com wrote:
> > > This gets rid of another use of read_lock(&dev_base_lock) by using
> > > RCU. Also, it only increments the reference count of the device actually
> > > used rather than holding and releasing every device
> > > 
> > > Compile tested only.
> > 
> > This function runs once a minute when the aoe driver is loaded,
> > if you'd like to test it a bit more.
> > 
> > It looks like there's no dev_put corresponding to the dev_hold
> > after the changes.
> > 
> 
> Hmm, looks like AOE actually is not ref counting the network device.
> So my patch is incorrect. 
> 
> As it stands (before my patch), it is UNSAFE. It can decide to queue
> packets to a device that is removed out from underneath it causing
> reference to freed memory.
> 
> Moving the rcu_read_lock up to aoecmd_cfg() would solve that but the
> whole driver appears to be unsafe about device refcounting and handling
> device removal properly.  
> 
> It needs to:
> 
> 1. Get a device ref count when it remembers a device: (ie addif)
> 2. Install a notifier that looks for device removal events
> 3. In notifier, remove interface, including flushing all pending
>    skb's for that device.
> 
> This obviously is beyond the scope of the RCU stuff.

Here is a patch to get you going, it does compile but it probably
won't work because the code doesn't handle the case of the last
device going away from a target. This is yet another pre-existing
bug, since if a timeout happens: ejectif() is called to remove a device,
resend() will BUG in ifrotate() if all devices are gone.


---
 drivers/block/aoe/aoe.h     |    2 ++
 drivers/block/aoe/aoecmd.c  |   19 +++++++++++++++++++
 drivers/block/aoe/aoedev.c  |   14 ++++++++++++++
 drivers/block/aoe/aoemain.c |   24 ++++++++++++++++++++++++
 4 files changed, 59 insertions(+)

--- a/drivers/block/aoe/aoecmd.c	2009-11-10 15:13:25.673859220 -0800
+++ b/drivers/block/aoe/aoecmd.c	2009-11-10 15:49:20.009047132 -0800
@@ -413,6 +413,8 @@ addif(struct aoetgt *t, struct net_devic
 	p = getif(t, NULL);
 	if (!p)
 		return NULL;
+
+	dev_hold(nd);
 	p->nd = nd;
 	p->maxbcnt = DEFAULTBCNT;
 	p->lost = 0;
@@ -424,12 +426,29 @@ static void
 ejectif(struct aoetgt *t, struct aoeif *ifp)
 {
 	struct aoeif *e;
+	struct net_device *nd;
 	ulong n;
 
 	e = t->ifs + NAOEIFS - 1;
+	nd = e->nd;
 	n = (e - ifp) * sizeof *ifp;
 	memmove(ifp, ifp+1, n);
 	e->nd = NULL;
+	dev_put(nd);
+}
+
+void aoecmd_flushnet(struct aoedev *d, struct net_device *nd)
+{
+	struct aoetgt **tt, **te;
+	tt = d->targets;
+	te = tt + NTARGETS;
+	for (; tt < te && *tt; tt++) {
+		struct aoetgt *t = *tt;
+		struct aoeif *ifp;
+
+		while ( (ifp = getif(t, nd)) )
+			ejectif(t, ifp);
+	}
 }
 
 static int
--- a/drivers/block/aoe/aoemain.c	2009-11-10 15:13:25.696859195 -0800
+++ b/drivers/block/aoe/aoemain.c	2009-11-10 15:48:43.352047188 -0800
@@ -8,6 +8,8 @@
 #include <linux/blkdev.h>
 #include <linux/module.h>
 #include <linux/skbuff.h>
+#include <linux/notifier.h>
+#include <linux/netdevice.h>
 #include "aoe.h"
 
 MODULE_LICENSE("GPL");
@@ -54,11 +56,28 @@ discover_timer(ulong vp)
 	}
 }
 
+/* Callback on change of state of network device. */
+static int aoe_device_event(struct notifier_block *unused,
+			    unsigned long event, void *ptr)
+{
+	struct net_device *nd = ptr;
+
+	if (is_aoe_netif(nd) && event == NETDEV_UNREGISTER)
+		aoedev_ejectnet(nd);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block aoe_notifier = {
+	.notifier_call = aoe_device_event,
+};
+
 static void
 aoe_exit(void)
 {
 	discover_timer(TKILL);
 
+	unregister_netdevice_notifier(&aoe_notifier);
 	aoenet_exit();
 	unregister_blkdev(AOE_MAJOR, DEVICE_NAME);
 	aoechr_exit();
@@ -83,6 +102,9 @@ aoe_init(void)
 	ret = aoenet_init();
 	if (ret)
 		goto net_fail;
+	ret = register_netdevice_notifier(&aoe_notifier);
+	if (ret)
+		goto notifier_fail;
 	ret = register_blkdev(AOE_MAJOR, DEVICE_NAME);
 	if (ret < 0) {
 		printk(KERN_ERR "aoe: can't register major\n");
@@ -94,6 +116,8 @@ aoe_init(void)
 	return 0;
 
  blkreg_fail:
+	unregister_netdevice_notifier(&aoe_notifier);
+ notifier_fail:
 	aoenet_exit();
  net_fail:
 	aoeblk_exit();
--- a/drivers/block/aoe/aoe.h	2009-11-10 15:36:07.775921768 -0800
+++ b/drivers/block/aoe/aoe.h	2009-11-10 15:43:14.972984754 -0800
@@ -186,6 +186,7 @@ void aoecmd_ata_rsp(struct sk_buff *);
 void aoecmd_cfg_rsp(struct sk_buff *);
 void aoecmd_sleepwork(struct work_struct *);
 void aoecmd_cleanslate(struct aoedev *);
+void aoecmd_flushnet(struct aoedev *, struct net_device *);
 struct sk_buff *aoecmd_ata_id(struct aoedev *);
 
 int aoedev_init(void);
@@ -194,6 +195,7 @@ struct aoedev *aoedev_by_aoeaddr(int maj
 struct aoedev *aoedev_by_sysminor_m(ulong sysminor);
 void aoedev_downdev(struct aoedev *d);
 int aoedev_flush(const char __user *str, size_t size);
+void aoedev_ejectnet(struct net_device *);
 
 int aoenet_init(void);
 void aoenet_exit(void);
--- a/drivers/block/aoe/aoedev.c	2009-11-10 15:13:25.685859893 -0800
+++ b/drivers/block/aoe/aoedev.c	2009-11-10 15:46:19.430861404 -0800
@@ -162,6 +162,20 @@ aoedev_flush(const char __user *str, siz
 	return 0;
 }
 
+void aoedev_ejectnet(struct net_device *nd)
+{
+	struct aoedev *d;
+	unsigned long flags;
+
+	spin_lock_irqsave(&devlist_lock, flags);
+	for (d = devlist; d; d = d->next) {
+		spin_lock(&d->lock);
+		aoecmd_flushnet(d, nd);
+		spin_unlock(&d->lock);
+	}
+	spin_unlock_irqrestore(&d->lock, flags);
+}
+
 /* I'm not really sure that this is a realistic problem, but if the
 network driver goes gonzo let's just leak memory after complaining. */
 static void

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/10] CAN: use dev_get_by_index_rcu
  2009-11-10 18:34   ` Eric Dumazet
@ 2009-11-11  5:54     ` Oliver Hartkopp
  2009-11-11  6:50       ` David Miller
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Hartkopp @ 2009-11-11  5:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Stephen Hemminger, David Miller, netdev

Eric Dumazet wrote:
> Stephen Hemminger a écrit :
>> Use new function to avoid doing read_lock().
>>
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>
>> --- a/net/can/bcm.c	2009-11-10 09:45:16.301376272 -0800
>> +++ b/net/can/bcm.c	2009-11-10 09:46:30.125005956 -0800
>> @@ -139,13 +139,13 @@ static char *bcm_proc_getifname(char *re
>>  	if (!ifindex)
>>  		return "any";
>>  
>> -	read_lock(&dev_base_lock);
>> -	dev = __dev_get_by_index(&init_net, ifindex);
>> +	rcu_read_lock();
>> +	dev = dev_get_by_index_rcu(&init_net, ifindex);
>>  	if (dev)
>>  		strcpy(result, dev->name);
>>  	else
>>  		strcpy(result, "???");
>> -	read_unlock(&dev_base_lock);
>> +	rcu_read_unlock();
>>  
>>  	return result;
>>  }
>>
> 
> I was pretty sure I had already done this one...
> 
> Ah yes, that was planned after a bugfix for net-2.6.
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
> 

Thanks everyone!

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 01/10] netdev: add netdev_continue_rcu
  2009-11-10 18:19   ` Eric Dumazet
@ 2009-11-11  6:47     ` David Miller
  0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:47 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, paulmck, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 10 Nov 2009 19:19:22 +0100

> Stephen Hemminger a écrit :
>> This adds an RCU macro for continuing search, useful for some
>> network devices like vlan.
>> 
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>> 
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 02/10] vlan: eliminate use of dev_base_lock
  2009-11-10 18:20   ` Eric Dumazet
@ 2009-11-11  6:47     ` David Miller
  0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:47 UTC (permalink / raw)
  To: dada1; +Cc: shemminger, kaber, netdev

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 10 Nov 2009 19:20:15 +0100

> Stephen Hemminger a écrit :
>> Do not need to use read_lock(&dev_base_lock), use RCU instead.
>> 
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 03/10] net: use rcu for network scheduler API
  2009-11-10 18:20   ` Eric Dumazet
@ 2009-11-11  6:47     ` David Miller
  0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:47 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 10 Nov 2009 19:20:57 +0100

> Stephen Hemminger a écrit :
>> Use RCU to walk list of network devices in qdisc dump.
>> This could be optimized for large number of devices.
>> 
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>> 
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 23:53       ` Stephen Hemminger
@ 2009-11-11  6:48         ` David Miller
  2009-11-12 14:33         ` Ed Cashin
  2009-11-18 16:49         ` Ed Cashin
  2 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:48 UTC (permalink / raw)
  To: shemminger
  Cc: karaluh, ecashin, roel.kluin, harvey.harrison, bzolnier, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 10 Nov 2009 15:53:16 -0800

> On Tue, 10 Nov 2009 15:06:17 -0800
> Stephen Hemminger <shemminger@vyatta.com> wrote:
> 
>> On Tue, 10 Nov 2009 15:01:49 -0500
>> Ed Cashin <ecashin@coraid.com> wrote:
>> 
>> > On Tue Nov 10 13:07:37 EST 2009, shemminger@vyatta.com wrote:
>> > > This gets rid of another use of read_lock(&dev_base_lock) by using
>> > > RCU. Also, it only increments the reference count of the device actually
>> > > used rather than holding and releasing every device
>> > > 
>> > > Compile tested only.
>> > 
>> > This function runs once a minute when the aoe driver is loaded,
>> > if you'd like to test it a bit more.
>> > 
>> > It looks like there's no dev_put corresponding to the dev_hold
>> > after the changes.
>> > 
>> 
>> Hmm, looks like AOE actually is not ref counting the network device.
>> So my patch is incorrect. 
>> 
>> As it stands (before my patch), it is UNSAFE. It can decide to queue
>> packets to a device that is removed out from underneath it causing
>> reference to freed memory.
>> 
>> Moving the rcu_read_lock up to aoecmd_cfg() would solve that but the
>> whole driver appears to be unsafe about device refcounting and handling
>> device removal properly.  
>> 
>> It needs to:
>> 
>> 1. Get a device ref count when it remembers a device: (ie addif)
>> 2. Install a notifier that looks for device removal events
>> 3. In notifier, remove interface, including flushing all pending
>>    skb's for that device.
>> 
>> This obviously is beyond the scope of the RCU stuff.
> 
> Here is a patch to get you going, it does compile but it probably
> won't work because the code doesn't handle the case of the last
> device going away from a target. This is yet another pre-existing
> bug, since if a timeout happens: ejectif() is called to remove a device,
> resend() will BUG in ifrotate() if all devices are gone.

I'm holding off on the RCU patch until these known refcount bugs are
fixed

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 05/10] parisc: use RCU to find network device
  2009-11-10 17:54   ` Stephen Hemminger
  (?)
  (?)
@ 2009-11-11  6:48   ` David Miller
  -1 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:48 UTC (permalink / raw)
  To: shemminger; +Cc: kyle, deller, a.beregalov, netdev, linux-parisc

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 10 Nov 2009 09:54:51 -0800

> Another place where RCU can be used instead of read_lock(&dev_base_lock)
> This is by inspection, don't have platform or cross-build environment
> to validate.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 06/10] s390: use RCU to walk list of network devices
  2009-11-10 18:40   ` Eric Dumazet
@ 2009-11-11  6:49     ` David Miller
  0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:49 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, schwidefsky, heiko.carstens, netdev, linux390

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 10 Nov 2009 19:40:42 +0100

> Stephen Hemminger a écrit :
>> This is similar to other cases where for_each_netdev_rcu
>> can be used when gathering information.
>> 
>> By inspection, don't have platform or cross-build environment
>> to validate.
>> 
>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>> 
>> 
> 
> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 07/10] decnet: use RCU to find network devices
  2009-11-10 17:54 ` [PATCH 07/10] decnet: use RCU to find " Stephen Hemminger
  2009-11-10 18:43   ` Eric Dumazet
@ 2009-11-11  6:49   ` David Miller
  1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:49 UTC (permalink / raw)
  To: shemminger
  Cc: christine.caulfield, hannes, adobriyan, swhiteho, netdev,
	linux-decnet-users

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 10 Nov 2009 09:54:53 -0800

> When showing device statistics use RCU rather than read_lock(&dev_base_lock)
> Compile tested only.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 08/10] ipv6: use RCU to walk list of network devices
  2009-11-10 17:54 ` [PATCH 08/10] ipv6: use RCU to walk list of " Stephen Hemminger
@ 2009-11-11  6:50   ` David Miller
  2009-11-12  3:34   ` [PATCH net-next-2.6] " Eric Dumazet
  1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:50 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 10 Nov 2009 09:54:54 -0800

> No longer need read_lock(&dev_base_lock), use RCU instead.
> This also needs to be optimized for large number of devices.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

As Eric mentioned, superceded by his patch.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP
  2009-11-10 17:54 ` [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP Stephen Hemminger
  2009-11-10 18:47   ` Eric Dumazet
@ 2009-11-11  6:50   ` David Miller
  1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:50 UTC (permalink / raw)
  To: shemminger; +Cc: netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 10 Nov 2009 09:54:55 -0800

> This also needs to be optimized for large number of devices.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 10/10] CAN: use dev_get_by_index_rcu
  2009-11-11  5:54     ` Oliver Hartkopp
@ 2009-11-11  6:50       ` David Miller
  0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-11  6:50 UTC (permalink / raw)
  To: oliver; +Cc: eric.dumazet, shemminger, netdev

From: Oliver Hartkopp <oliver@hartkopp.net>
Date: Wed, 11 Nov 2009 06:54:24 +0100

> Eric Dumazet wrote:
>> Stephen Hemminger a écrit :
>>> Use new function to avoid doing read_lock().
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
...
>> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
>> 
> 
> Thanks everyone!
> 
> Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 23:06     ` Stephen Hemminger
  2009-11-10 23:53       ` Stephen Hemminger
@ 2009-11-11 14:22       ` Ed Cashin
  2009-11-13 21:39         ` Ed Cashin
  1 sibling, 1 reply; 61+ messages in thread
From: Ed Cashin @ 2009-11-11 14:22 UTC (permalink / raw)
  To: shemminger, ecashin, davem, harvey.harrison, bzolnier, netdev

On Tue Nov 10 18:06:41 EST 2009, shemminger@vyatta.com wrote:
...
> Hmm, looks like AOE actually is not ref counting the network device.
> So my patch is incorrect. 
> 
> As it stands (before my patch), it is UNSAFE. It can decide to queue
> packets to a device that is removed out from underneath it causing
> reference to freed memory.

Yes, that's right.  I will look at the patch you sent in the follow
up.  Thanks.

-- 
  Ed

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] decnet: add RTNL lock when reading address list
  2009-11-10 18:24       ` steve
@ 2009-11-11 17:39         ` Stephen Hemminger
  2009-11-11 17:40           ` [PATCH 2/2] decnet: convert dndev_lock to spinlock Stephen Hemminger
  2009-11-12  3:56           ` [PATCH 1/2] decnet: add RTNL lock when reading address list David Miller
  0 siblings, 2 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-11 17:39 UTC (permalink / raw)
  To: steve, David Miller
  Cc: Eric Dumazet, Christine Caulfield, Hannes Eder, Alexey Dobriyan,
	Steven Whitehouse, netdev, linux-decnet-users

Add missing locking in the case of auto binding to the
default device. The address list might change while this code is looking
at the list. 

Compile tested only, I am not a decnet user.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/decnet/dn_dev.c	2009-11-11 09:18:02.935313559 -0800
+++ b/net/decnet/dn_dev.c	2009-11-11 09:22:32.157313924 -0800
@@ -828,13 +828,17 @@ static int dn_dev_get_first(struct net_d
 	struct dn_dev *dn_db = (struct dn_dev *)dev->dn_ptr;
 	struct dn_ifaddr *ifa;
 	int rv = -ENODEV;
+
 	if (dn_db == NULL)
 		goto out;
+
+	rtnl_lock();
 	ifa = dn_db->ifa_list;
 	if (ifa != NULL) {
 		*addr = ifa->ifa_local;
 		rv = 0;
 	}
+	rtnl_unlock();
 out:
 	return rv;
 }

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2/2] decnet: convert dndev_lock to spinlock
  2009-11-11 17:39         ` [PATCH 1/2] decnet: add RTNL lock when reading address list Stephen Hemminger
@ 2009-11-11 17:40           ` Stephen Hemminger
  2009-11-12  3:56             ` David Miller
  2009-11-12  3:56           ` [PATCH 1/2] decnet: add RTNL lock when reading address list David Miller
  1 sibling, 1 reply; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-11 17:40 UTC (permalink / raw)
  To: steve, David Miller
  Cc: Eric Dumazet, Christine Caulfield, Hannes Eder, Alexey Dobriyan,
	Steven Whitehouse, netdev, linux-decnet-users

There is no reason for this lock to be reader/writer since
the reader only has lock held for a very brief period.
The overhead of read_lock is more expensive than spinlock.

Compile tested only, I am not a decnet user.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/decnet/dn_dev.c	2009-11-10 17:39:53.652984752 -0800
+++ b/net/decnet/dn_dev.c	2009-11-10 17:41:15.942736073 -0800
@@ -68,7 +68,7 @@ extern struct neigh_table dn_neigh_table
  */
 __le16 decnet_address = 0;
 
-static DEFINE_RWLOCK(dndev_lock);
+static DEFINE_SPINLOCK(dndev_lock);
 static struct net_device *decnet_default_device;
 static BLOCKING_NOTIFIER_HEAD(dnaddr_chain);
 
@@ -557,7 +557,8 @@ rarok:
 struct net_device *dn_dev_get_default(void)
 {
 	struct net_device *dev;
-	read_lock(&dndev_lock);
+
+	spin_lock(&dndev_lock);
 	dev = decnet_default_device;
 	if (dev) {
 		if (dev->dn_ptr)
@@ -565,7 +566,8 @@ struct net_device *dn_dev_get_default(vo
 		else
 			dev = NULL;
 	}
-	read_unlock(&dndev_lock);
+	spin_unlock(&dndev_lock);
+
 	return dev;
 }
 
@@ -575,13 +577,15 @@ int dn_dev_set_default(struct net_device
 	int rv = -EBUSY;
 	if (!dev->dn_ptr)
 		return -ENODEV;
-	write_lock(&dndev_lock);
+
+	spin_lock(&dndev_lock);
 	if (force || decnet_default_device == NULL) {
 		old = decnet_default_device;
 		decnet_default_device = dev;
 		rv = 0;
 	}
-	write_unlock(&dndev_lock);
+	spin_unlock(&dndev_lock);
+
 	if (old)
 		dev_put(old);
 	return rv;
@@ -589,13 +593,14 @@ int dn_dev_set_default(struct net_device
 
 static void dn_dev_check_default(struct net_device *dev)
 {
-	write_lock(&dndev_lock);
+	spin_lock(&dndev_lock);
 	if (dev == decnet_default_device) {
 		decnet_default_device = NULL;
 	} else {
 		dev = NULL;
 	}
-	write_unlock(&dndev_lock);
+	spin_unlock(&dndev_lock);
+
 	if (dev)
 		dev_put(dev);
 }


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH net-next-2.6] ipv6: use RCU to walk list of network devices
  2009-11-10 17:54 ` [PATCH 08/10] ipv6: use RCU to walk list of " Stephen Hemminger
  2009-11-11  6:50   ` David Miller
@ 2009-11-12  3:34   ` Eric Dumazet
  2009-11-14  4:39     ` David Miller
  1 sibling, 1 reply; 61+ messages in thread
From: Eric Dumazet @ 2009-11-12  3:34 UTC (permalink / raw)
  To: David Miller; +Cc: Stephen Hemminger, netdev

Stephen Hemminger a écrit :
> No longer need read_lock(&dev_base_lock), use RCU instead.
> This also needs to be optimized for large number of devices.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

I gave a look at your patch Stephen and found we need a new
next_net_device_rcu(struct net_device *dev) as well,
as next_net_device() is not RCU safe.

(followup patch is probably needed to use it in net/ipv4/igmp.c,
after commit 61fbab77a843d2e77232 : IPV4: use rcu to walk list of devices in IGMP)

We also can avoid taking references on inet6_dev structs.

[PATCH net-next-2.6] ipv6: use RCU to walk list of network devices

No longer need read_lock(&dev_base_lock), use RCU instead.
We also can avoid taking references on inet6_dev structs.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/linux/netdevice.h |   10 +++++++
 net/ipv6/anycast.c        |   29 +++++++++-----------
 net/ipv6/mcast.c          |   51 ++++++++++++++++--------------------
 3 files changed, 47 insertions(+), 43 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 083b598..2734a67 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1093,6 +1093,16 @@ static inline struct net_device *next_net_device(struct net_device *dev)
 	return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
 }
 
+static inline struct net_device *next_net_device_rcu(struct net_device *dev)
+{
+	struct list_head *lh;
+	struct net *net;
+
+	net = dev_net(dev);
+	lh = rcu_dereference(dev->dev_list.next);
+	return lh == &net->dev_base_head ? NULL : net_device_entry(lh);
+}
+
 static inline struct net_device *first_net_device(struct net *net)
 {
 	return list_empty(&net->dev_base_head) ? NULL :
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index 2f00ca8..f1c74c8 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -431,9 +431,9 @@ static inline struct ifacaddr6 *ac6_get_first(struct seq_file *seq)
 	struct net *net = seq_file_net(seq);
 
 	state->idev = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct inet6_dev *idev;
-		idev = in6_dev_get(state->dev);
+		idev = __in6_dev_get(state->dev);
 		if (!idev)
 			continue;
 		read_lock_bh(&idev->lock);
@@ -443,7 +443,6 @@ static inline struct ifacaddr6 *ac6_get_first(struct seq_file *seq)
 			break;
 		}
 		read_unlock_bh(&idev->lock);
-		in6_dev_put(idev);
 	}
 	return im;
 }
@@ -454,16 +453,15 @@ static struct ifacaddr6 *ac6_get_next(struct seq_file *seq, struct ifacaddr6 *im
 
 	im = im->aca_next;
 	while (!im) {
-		if (likely(state->idev != NULL)) {
+		if (likely(state->idev != NULL))
 			read_unlock_bh(&state->idev->lock);
-			in6_dev_put(state->idev);
-		}
-		state->dev = next_net_device(state->dev);
+
+		state->dev = next_net_device_rcu(state->dev);
 		if (!state->dev) {
 			state->idev = NULL;
 			break;
 		}
-		state->idev = in6_dev_get(state->dev);
+		state->idev = __in6_dev_get(state->dev);
 		if (!state->idev)
 			continue;
 		read_lock_bh(&state->idev->lock);
@@ -482,29 +480,30 @@ static struct ifacaddr6 *ac6_get_idx(struct seq_file *seq, loff_t pos)
 }
 
 static void *ac6_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(RCU)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return ac6_get_idx(seq, *pos);
 }
 
 static void *ac6_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	struct ifacaddr6 *im;
-	im = ac6_get_next(seq, v);
+	struct ifacaddr6 *im = ac6_get_next(seq, v);
+
 	++*pos;
 	return im;
 }
 
 static void ac6_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(RCU)
 {
 	struct ac6_iter_state *state = ac6_seq_private(seq);
+
 	if (likely(state->idev != NULL)) {
 		read_unlock_bh(&state->idev->lock);
-		in6_dev_put(state->idev);
+		state->idev = NULL;
 	}
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int ac6_seq_show(struct seq_file *seq, void *v)
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index f9fcf69..1f9c444 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -2375,9 +2375,9 @@ static inline struct ifmcaddr6 *igmp6_mc_get_first(struct seq_file *seq)
 	struct net *net = seq_file_net(seq);
 
 	state->idev = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct inet6_dev *idev;
-		idev = in6_dev_get(state->dev);
+		idev = __in6_dev_get(state->dev);
 		if (!idev)
 			continue;
 		read_lock_bh(&idev->lock);
@@ -2387,7 +2387,6 @@ static inline struct ifmcaddr6 *igmp6_mc_get_first(struct seq_file *seq)
 			break;
 		}
 		read_unlock_bh(&idev->lock);
-		in6_dev_put(idev);
 	}
 	return im;
 }
@@ -2398,16 +2397,15 @@ static struct ifmcaddr6 *igmp6_mc_get_next(struct seq_file *seq, struct ifmcaddr
 
 	im = im->next;
 	while (!im) {
-		if (likely(state->idev != NULL)) {
+		if (likely(state->idev != NULL))
 			read_unlock_bh(&state->idev->lock);
-			in6_dev_put(state->idev);
-		}
-		state->dev = next_net_device(state->dev);
+
+		state->dev = next_net_device_rcu(state->dev);
 		if (!state->dev) {
 			state->idev = NULL;
 			break;
 		}
-		state->idev = in6_dev_get(state->dev);
+		state->idev = __in6_dev_get(state->dev);
 		if (!state->idev)
 			continue;
 		read_lock_bh(&state->idev->lock);
@@ -2426,31 +2424,31 @@ static struct ifmcaddr6 *igmp6_mc_get_idx(struct seq_file *seq, loff_t pos)
 }
 
 static void *igmp6_mc_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(RCU)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return igmp6_mc_get_idx(seq, *pos);
 }
 
 static void *igmp6_mc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	struct ifmcaddr6 *im;
-	im = igmp6_mc_get_next(seq, v);
+	struct ifmcaddr6 *im = igmp6_mc_get_next(seq, v);
+
 	++*pos;
 	return im;
 }
 
 static void igmp6_mc_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(RCU)
 {
 	struct igmp6_mc_iter_state *state = igmp6_mc_seq_private(seq);
+
 	if (likely(state->idev != NULL)) {
 		read_unlock_bh(&state->idev->lock);
-		in6_dev_put(state->idev);
 		state->idev = NULL;
 	}
 	state->dev = NULL;
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int igmp6_mc_seq_show(struct seq_file *seq, void *v)
@@ -2507,9 +2505,9 @@ static inline struct ip6_sf_list *igmp6_mcf_get_first(struct seq_file *seq)
 
 	state->idev = NULL;
 	state->im = NULL;
-	for_each_netdev(net, state->dev) {
+	for_each_netdev_rcu(net, state->dev) {
 		struct inet6_dev *idev;
-		idev = in6_dev_get(state->dev);
+		idev = __in6_dev_get(state->dev);
 		if (unlikely(idev == NULL))
 			continue;
 		read_lock_bh(&idev->lock);
@@ -2525,7 +2523,6 @@ static inline struct ip6_sf_list *igmp6_mcf_get_first(struct seq_file *seq)
 			spin_unlock_bh(&im->mca_lock);
 		}
 		read_unlock_bh(&idev->lock);
-		in6_dev_put(idev);
 	}
 	return psf;
 }
@@ -2539,16 +2536,15 @@ static struct ip6_sf_list *igmp6_mcf_get_next(struct seq_file *seq, struct ip6_s
 		spin_unlock_bh(&state->im->mca_lock);
 		state->im = state->im->next;
 		while (!state->im) {
-			if (likely(state->idev != NULL)) {
+			if (likely(state->idev != NULL))
 				read_unlock_bh(&state->idev->lock);
-				in6_dev_put(state->idev);
-			}
-			state->dev = next_net_device(state->dev);
+
+			state->dev = next_net_device_rcu(state->dev);
 			if (!state->dev) {
 				state->idev = NULL;
 				goto out;
 			}
-			state->idev = in6_dev_get(state->dev);
+			state->idev = __in6_dev_get(state->dev);
 			if (!state->idev)
 				continue;
 			read_lock_bh(&state->idev->lock);
@@ -2573,9 +2569,9 @@ static struct ip6_sf_list *igmp6_mcf_get_idx(struct seq_file *seq, loff_t pos)
 }
 
 static void *igmp6_mcf_seq_start(struct seq_file *seq, loff_t *pos)
-	__acquires(dev_base_lock)
+	__acquires(RCU)
 {
-	read_lock(&dev_base_lock);
+	rcu_read_lock();
 	return *pos ? igmp6_mcf_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 }
 
@@ -2591,7 +2587,7 @@ static void *igmp6_mcf_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 }
 
 static void igmp6_mcf_seq_stop(struct seq_file *seq, void *v)
-	__releases(dev_base_lock)
+	__releases(RCU)
 {
 	struct igmp6_mcf_iter_state *state = igmp6_mcf_seq_private(seq);
 	if (likely(state->im != NULL)) {
@@ -2600,11 +2596,10 @@ static void igmp6_mcf_seq_stop(struct seq_file *seq, void *v)
 	}
 	if (likely(state->idev != NULL)) {
 		read_unlock_bh(&state->idev->lock);
-		in6_dev_put(state->idev);
 		state->idev = NULL;
 	}
 	state->dev = NULL;
-	read_unlock(&dev_base_lock);
+	rcu_read_unlock();
 }
 
 static int igmp6_mcf_seq_show(struct seq_file *seq, void *v)

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] decnet: add RTNL lock when reading address list
  2009-11-11 17:39         ` [PATCH 1/2] decnet: add RTNL lock when reading address list Stephen Hemminger
  2009-11-11 17:40           ` [PATCH 2/2] decnet: convert dndev_lock to spinlock Stephen Hemminger
@ 2009-11-12  3:56           ` David Miller
  1 sibling, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-12  3:56 UTC (permalink / raw)
  To: shemminger
  Cc: steve, eric.dumazet, christine.caulfield, hannes, adobriyan,
	swhiteho, netdev, linux-decnet-users

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 11 Nov 2009 09:39:27 -0800

> Add missing locking in the case of auto binding to the
> default device. The address list might change while this code is looking
> at the list. 
> 
> Compile tested only, I am not a decnet user.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied to net-next-2.6

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] decnet: convert dndev_lock to spinlock
  2009-11-11 17:40           ` [PATCH 2/2] decnet: convert dndev_lock to spinlock Stephen Hemminger
@ 2009-11-12  3:56             ` David Miller
  0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-12  3:56 UTC (permalink / raw)
  To: shemminger
  Cc: steve, eric.dumazet, christine.caulfield, hannes, adobriyan,
	swhiteho, netdev, linux-decnet-users

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 11 Nov 2009 09:40:36 -0800

> There is no reason for this lock to be reader/writer since
> the reader only has lock held for a very brief period.
> The overhead of read_lock is more expensive than spinlock.
> 
> Compile tested only, I am not a decnet user.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied to net-next-2.6

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 23:53       ` Stephen Hemminger
  2009-11-11  6:48         ` David Miller
@ 2009-11-12 14:33         ` Ed Cashin
  2009-11-12 17:10           ` Stephen Hemminger
  2009-11-18 16:49         ` Ed Cashin
  2 siblings, 1 reply; 61+ messages in thread
From: Ed Cashin @ 2009-11-12 14:33 UTC (permalink / raw)
  To: shemminger, karaluh, ecashin, roel.kluin, harvey.harrison,
	bzolnier, netdev

Thanks again for providing this patch to help get things started.
It's very helpful.  I appreciate the way it reflects and fits into the
rest of the driver, too.

In the patch, there's a loop around getif, ejectif inside the new
aoecmd_flushnet function:

  void aoecmd_flushnet(struct aoedev *d, struct net_device *nd)
  {
  	struct aoetgt **tt, **te;
  	tt = d->targets;
  	te = tt + NTARGETS;
  	for (; tt < te && *tt; tt++) {
  		struct aoetgt *t = *tt;
  		struct aoeif *ifp;
  
  		while ( (ifp = getif(t, nd)) )
  			ejectif(t, ifp);
  	}
  }

... but an "if" seems appropriate, since duplicates are avoided when
network devices are added in aoecmd_cfg_rsp:

  	ifp = getif(t, skb->dev);
  	if (!ifp) {
  		ifp = addif(t, skb->dev);

If there's some other reason for it to be "while" in aoecmd_flushnet
that I'm missing, please let me know.

Regarding the new aoe_device_event function,

  /* Callback on change of state of network device. */
  static int aoe_device_event(struct notifier_block *unused,
  			    unsigned long event, void *ptr)
  {
  	struct net_device *nd = ptr;
  
  	if (is_aoe_netif(nd) && event == NETDEV_UNREGISTER)
  		aoedev_ejectnet(nd);
  
  	return NOTIFY_DONE;
  }

...  the is_aoe_netif function really answers the question, "Are we
allowed by the user to do AoE on this local network interface?" The
user can modify the list of allowable interfaces at runtime via sysfs,
as it's a module parameter.  So I don't think we can use is_aoe_netif
in the new aoe_device_event function.  We should eject a net_device in
this handler even if the user has decided not to use it for AoE.
Please let me know if I'm missing the point.

You said that,

  > It needs to:
  > 
  > 1. Get a device ref count when it remembers a device: (ie addif)
  > 2. Install a notifier that looks for device removal events
  > 3. In notifier, remove interface, including flushing all pending
  >    skb's for that device.

For 3, does the starter patch flush all pending skbs?  Perhaps you
could elaborate on what you had in mind?

I am trying to find the best way for the aoe driver to handle the
situation where ther are no usable local network interfaces.  It does
seem to me that the user would benefit from a notice letting them know
that they're trying to do AoE without any usable ethernet.  I'm
thinking that doing something like a printk_once would be appropriate.

(But probably not printk_once itself, because if they add a local
interface and then it goes away later, they should be told again that
something's wrong.)

-- 
  Ed Cashin <ecashin@coraid.com>
  http://www.coraid.com/
  http://noserose.net/e/

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-12 14:33         ` Ed Cashin
@ 2009-11-12 17:10           ` Stephen Hemminger
  2009-11-12 18:07             ` Ed Cashin
  0 siblings, 1 reply; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-12 17:10 UTC (permalink / raw)
  To: Ed Cashin; +Cc: karaluh, ecashin, roel.kluin, harvey.harrison, bzolnier, netdev

On Thu, 12 Nov 2009 09:33:16 -0500
Ed Cashin <ecashin@coraid.com> wrote:

> Thanks again for providing this patch to help get things started.
> It's very helpful.  I appreciate the way it reflects and fits into the
> rest of the driver, too.
> 
> In the patch, there's a loop around getif, ejectif inside the new
> aoecmd_flushnet function:
> 
>   void aoecmd_flushnet(struct aoedev *d, struct net_device *nd)
>   {
>   	struct aoetgt **tt, **te;
>   	tt = d->targets;
>   	te = tt + NTARGETS;
>   	for (; tt < te && *tt; tt++) {
>   		struct aoetgt *t = *tt;
>   		struct aoeif *ifp;
>   
>   		while ( (ifp = getif(t, nd)) )
>   			ejectif(t, ifp);
>   	}
>   }
> 
> ... but an "if" seems appropriate, since duplicates are avoided when
> network devices are added in aoecmd_cfg_rsp:
> 
>   	ifp = getif(t, skb->dev);
>   	if (!ifp) {
>   		ifp = addif(t, skb->dev);

Yeah if works, wasn't sure if you could have multiples.

> 
> If there's some other reason for it to be "while" in aoecmd_flushnet
> that I'm missing, please let me know.
> 
> Regarding the new aoe_device_event function,
> 
>   /* Callback on change of state of network device. */
>   static int aoe_device_event(struct notifier_block *unused,
>   			    unsigned long event, void *ptr)
>   {
>   	struct net_device *nd = ptr;
>   
>   	if (is_aoe_netif(nd) && event == NETDEV_UNREGISTER)
>   		aoedev_ejectnet(nd);
>   
>   	return NOTIFY_DONE;
>   }
> 
> ...  the is_aoe_netif function really answers the question, "Are we
> allowed by the user to do AoE on this local network interface?" The
> user can modify the list of allowable interfaces at runtime via sysfs,
> as it's a module parameter.  So I don't think we can use is_aoe_netif
> in the new aoe_device_event function.  We should eject a net_device in
> this handler even if the user has decided not to use it for AoE.
> Please let me know if I'm missing the point.

Ok, you know the code better, I just wanted to avoid doing unnecessary work.

> You said that,
> 
>   > It needs to:
>   > 
>   > 1. Get a device ref count when it remembers a device: (ie addif)
>   > 2. Install a notifier that looks for device removal events
>   > 3. In notifier, remove interface, including flushing all pending
>   >    skb's for that device.
> 
> For 3, does the starter patch flush all pending skbs?  Perhaps you
> could elaborate on what you had in mind?

I wasn't sure if you ended up queuing skb's, but it looks like the
code queues requests not skb's.  But you may need to disable preempt
over code the work queue code that processes requests, to make sure
that device doesn't disappear while doing stuff.

> 
> I am trying to find the best way for the aoe driver to handle the
> situation where ther are no usable local network interfaces.  It does
> seem to me that the user would benefit from a notice letting them know
> that they're trying to do AoE without any usable ethernet.  I'm
> thinking that doing something like a printk_once would be appropriate.

Since it is emulating a block device, why not propgate error back
up the stack like a disk that's offline.



-- 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-12 17:10           ` Stephen Hemminger
@ 2009-11-12 18:07             ` Ed Cashin
  2009-11-12 19:09               ` Stephen Hemminger
  0 siblings, 1 reply; 61+ messages in thread
From: Ed Cashin @ 2009-11-12 18:07 UTC (permalink / raw)
  To: shemminger, ecashin, karaluh, roel.kluin, harvey.harrison,
	bzolnier, netdev

Thanks for the responses.

On Thu Nov 12 12:11:12 EST 2009, shemminger@vyatta.com wrote:
...
> Since it is emulating a block device, why not propgate error back
> up the stack like a disk that's offline.

The lack of local interfaces to use for AoE might be temporary.  For
example, the admin might be loading a new network driver, or a new
link might go online through which the AoE target can be reached.

If AoE command packets are not sent or resent but instead are
effectively dropped while there are no local interfaces through which
to send, then the AoE commands will timeout through the normal
mechanisms, at which time the error will be signaled to the block
layer.  It will be like normal unreliable ethernet transport.

In the common case, I think that's going to be the expected behavior,
but a printk would probably still be helpful in case the admin doesn't
realize why the AoE device is timing out.

-- 
  Ed Cashin <ecashin@coraid.com>
  http://www.coraid.com/
  http://noserose.net/e/

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-12 18:07             ` Ed Cashin
@ 2009-11-12 19:09               ` Stephen Hemminger
  0 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-12 19:09 UTC (permalink / raw)
  To: Ed Cashin; +Cc: ecashin, karaluh, roel.kluin, harvey.harrison, bzolnier, netdev

On Thu, 12 Nov 2009 13:07:35 -0500
Ed Cashin <ecashin@coraid.com> wrote:

> Thanks for the responses.
> 
> On Thu Nov 12 12:11:12 EST 2009, shemminger@vyatta.com wrote:
> ...
> > Since it is emulating a block device, why not propgate error back
> > up the stack like a disk that's offline.
> 
> The lack of local interfaces to use for AoE might be temporary.  For
> example, the admin might be loading a new network driver, or a new
> link might go online through which the AoE target can be reached.
> 
> If AoE command packets are not sent or resent but instead are
> effectively dropped while there are no local interfaces through which
> to send, then the AoE commands will timeout through the normal
> mechanisms, at which time the error will be signaled to the block
> layer.  It will be like normal unreliable ethernet transport.
> 
> In the common case, I think that's going to be the expected behavior,
> but a printk would probably still be helpful in case the admin doesn't
> realize why the AoE device is timing out.
> 

Okay, you might want to ratelimit the printk

-- 

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-11 14:22       ` Ed Cashin
@ 2009-11-13 21:39         ` Ed Cashin
  2009-11-13 22:24           ` Stephen Hemminger
  0 siblings, 1 reply; 61+ messages in thread
From: Ed Cashin @ 2009-11-13 21:39 UTC (permalink / raw)
  To: ecashin, shemminger, davem, harvey.harrison, bzolnier, netdev

[-- Attachment #1: Type: text/plain, Size: 671 bytes --]

I will be testing the patch below on Monday.  It is based on Stephen
Hemminger's starter patch (Thanks!).  It adds handling for the case
where all of the local network interfaces have become unusable.  In
that case, sent packets and retransmitted packets are effectively
dropped, and the driver's handling of ethernet's unreliability takes
care of things as usual.  If a local interface becomes usable in time
no I/O will fail.

After testing I'll make a proper changelog entry.  In brief this patch
allows the aoe driver to correctly take references on the net_device's
that it uses, handle their removal via a callback, and handle their
complete absence, too.

-- 
  Ed

[-- Attachment #2: Type: text/plain, Size: 5048 bytes --]

diff --git a/drivers/block/aoe/aoe.h b/drivers/block/aoe/aoe.h
index db195ab..315d36b 100644
--- a/drivers/block/aoe/aoe.h
+++ b/drivers/block/aoe/aoe.h
@@ -186,6 +186,7 @@ void aoecmd_ata_rsp(struct sk_buff *);
 void aoecmd_cfg_rsp(struct sk_buff *);
 void aoecmd_sleepwork(struct work_struct *);
 void aoecmd_cleanslate(struct aoedev *);
+void aoecmd_flushnet(struct aoedev *, struct net_device *);
 struct sk_buff *aoecmd_ata_id(struct aoedev *);
 
 int aoedev_init(void);
@@ -194,6 +195,7 @@ struct aoedev *aoedev_by_aoeaddr(int maj, int min);
 struct aoedev *aoedev_by_sysminor_m(ulong sysminor);
 void aoedev_downdev(struct aoedev *d);
 int aoedev_flush(const char __user *str, size_t size);
+void aoedev_ejectnet(struct net_device *);
 
 int aoenet_init(void);
 void aoenet_exit(void);
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index 965ece2..060c8d6 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -93,16 +93,16 @@ put_lba(struct aoe_atahdr *ah, sector_t lba)
 	ah->lba5 = lba >>= 8;
 }
 
-static void
+static struct net_device *
 ifrotate(struct aoetgt *t)
 {
 	t->ifp++;
 	if (t->ifp >= &t->ifs[NAOEIFS] || t->ifp->nd == NULL)
 		t->ifp = t->ifs;
-	if (t->ifp->nd == NULL) {
-		printk(KERN_INFO "aoe: no interface to rotate to\n");
-		BUG();
-	}
+	if (t->ifp->nd == NULL && net_ratelimit())
+		printk(KERN_WARNING
+			"aoe: no local interface to send through\n");
+	return t->ifp->nd;
 }
 
 static void
@@ -336,7 +336,8 @@ resend(struct aoedev *d, struct aoetgt *t, struct frame *f)
 	char buf[128];
 	u32 n;
 
-	ifrotate(t);
+	if (!ifrotate(t))
+		return;		/* no usable local network interfaces */
 	n = newtag(t);
 	skb = f->skb;
 	h = (struct aoe_hdr *) skb_mac_header(skb);
@@ -413,6 +414,8 @@ addif(struct aoetgt *t, struct net_device *nd)
 	p = getif(t, NULL);
 	if (!p)
 		return NULL;
+
+	dev_hold(nd);
 	p->nd = nd;
 	p->maxbcnt = DEFAULTBCNT;
 	p->lost = 0;
@@ -424,12 +427,30 @@ static void
 ejectif(struct aoetgt *t, struct aoeif *ifp)
 {
 	struct aoeif *e;
+	struct net_device *nd;
 	ulong n;
 
 	e = t->ifs + NAOEIFS - 1;
+	nd = e->nd;
 	n = (e - ifp) * sizeof *ifp;
 	memmove(ifp, ifp+1, n);
 	e->nd = NULL;
+	dev_put(nd);
+}
+
+void
+aoecmd_flushnet(struct aoedev *d, struct net_device *nd)
+{
+	struct aoetgt **tt, **te;
+	tt = d->targets;
+	te = tt + NTARGETS;
+	for (; tt < te && *tt; tt++) {
+		struct aoetgt *t = *tt;
+		struct aoeif *ifp;
+
+		if ( (ifp = getif(t, nd)) )
+			ejectif(t, ifp);
+	}
 }
 
 static int
diff --git a/drivers/block/aoe/aoedev.c b/drivers/block/aoe/aoedev.c
index fa67027..6754db0 100644
--- a/drivers/block/aoe/aoedev.c
+++ b/drivers/block/aoe/aoedev.c
@@ -162,6 +162,21 @@ aoedev_flush(const char __user *str, size_t cnt)
 	return 0;
 }
 
+void
+aoedev_ejectnet(struct net_device *nd)
+{
+	struct aoedev *d;
+	unsigned long flags;
+
+	spin_lock_irqsave(&devlist_lock, flags);
+	for (d = devlist; d; d = d->next) {
+		spin_lock(&d->lock);
+		aoecmd_flushnet(d, nd);
+		spin_unlock(&d->lock);
+	}
+	spin_unlock_irqrestore(&d->lock, flags);
+}
+
 /* I'm not really sure that this is a realistic problem, but if the
 network driver goes gonzo let's just leak memory after complaining. */
 static void
diff --git a/drivers/block/aoe/aoemain.c b/drivers/block/aoe/aoemain.c
index 7f83ad9..5f6c072 100644
--- a/drivers/block/aoe/aoemain.c
+++ b/drivers/block/aoe/aoemain.c
@@ -8,6 +8,8 @@
 #include <linux/blkdev.h>
 #include <linux/module.h>
 #include <linux/skbuff.h>
+#include <linux/notifier.h>
+#include <linux/netdevice.h>
 #include "aoe.h"
 
 MODULE_LICENSE("GPL");
@@ -54,11 +56,29 @@ discover_timer(ulong vp)
 	}
 }
 
+/* Callback on change of state of network device. */
+static int
+aoe_device_event(struct notifier_block *unused,
+		unsigned long event, void *ptr)
+{
+	struct net_device *nd = ptr;
+
+	if (event == NETDEV_UNREGISTER)
+		aoedev_ejectnet(nd);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block aoe_notifier = {
+	.notifier_call = aoe_device_event,
+};
+
 static void
 aoe_exit(void)
 {
 	discover_timer(TKILL);
 
+	unregister_netdevice_notifier(&aoe_notifier);
 	aoenet_exit();
 	unregister_blkdev(AOE_MAJOR, DEVICE_NAME);
 	aoechr_exit();
@@ -83,6 +103,9 @@ aoe_init(void)
 	ret = aoenet_init();
 	if (ret)
 		goto net_fail;
+	ret = register_netdevice_notifier(&aoe_notifier);
+	if (ret)
+		goto notifier_fail;
 	ret = register_blkdev(AOE_MAJOR, DEVICE_NAME);
 	if (ret < 0) {
 		printk(KERN_ERR "aoe: can't register major\n");
@@ -94,6 +117,8 @@ aoe_init(void)
 	return 0;
 
  blkreg_fail:
+	unregister_netdevice_notifier(&aoe_notifier);
+ notifier_fail:
 	aoenet_exit();
  net_fail:
 	aoeblk_exit();
diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
index ce0d62c..cc3872c 100644
--- a/drivers/block/aoe/aoenet.c
+++ b/drivers/block/aoe/aoenet.c
@@ -90,7 +90,10 @@ aoenet_xmit(struct sk_buff_head *queue)
 
 	skb_queue_walk_safe(queue, skb, tmp) {
 		__skb_unlink(skb, queue);
-		dev_queue_xmit(skb);
+		if (skb->dev)
+			dev_queue_xmit(skb);
+		else
+			dev_kfree_skb(skb);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-13 21:39         ` Ed Cashin
@ 2009-11-13 22:24           ` Stephen Hemminger
  0 siblings, 0 replies; 61+ messages in thread
From: Stephen Hemminger @ 2009-11-13 22:24 UTC (permalink / raw)
  To: Ed Cashin; +Cc: ecashin, davem, harvey.harrison, bzolnier, netdev

On Fri, 13 Nov 2009 16:39:49 -0500
Ed Cashin <ecashin@coraid.com> wrote:

> I will be testing the patch below on Monday.  It is based on Stephen
> Hemminger's starter patch (Thanks!).  It adds handling for the case
> where all of the local network interfaces have become unusable.  In
> that case, sent packets and retransmitted packets are effectively
> dropped, and the driver's handling of ethernet's unreliability takes
> care of things as usual.  If a local interface becomes usable in time
> no I/O will fail.
> 
> After testing I'll make a proper changelog entry.  In brief this patch
> allows the aoe driver to correctly take references on the net_device's
> that it uses, handle their removal via a callback, and handle their
> complete absence, too.

Looks good, good luck with testing.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH net-next-2.6] ipv6: use RCU to walk list of network devices
  2009-11-12  3:34   ` [PATCH net-next-2.6] " Eric Dumazet
@ 2009-11-14  4:39     ` David Miller
  0 siblings, 0 replies; 61+ messages in thread
From: David Miller @ 2009-11-14  4:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: shemminger, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 12 Nov 2009 04:34:30 +0100

> [PATCH net-next-2.6] ipv6: use RCU to walk list of network devices
> 
> No longer need read_lock(&dev_base_lock), use RCU instead.
> We also can avoid taking references on inet6_dev structs.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 04/10] AOE: use rcu to find network device
  2009-11-10 23:53       ` Stephen Hemminger
  2009-11-11  6:48         ` David Miller
  2009-11-12 14:33         ` Ed Cashin
@ 2009-11-18 16:49         ` Ed Cashin
  2 siblings, 0 replies; 61+ messages in thread
From: Ed Cashin @ 2009-11-18 16:49 UTC (permalink / raw)
  To: shemminger, karaluh, ecashin, roel.kluin, harvey.harrison,
	bzolnier, netdev

On Tue Nov 10 18:53:43 EST 2009, shemminger@vyatta.com wrote:

...
> --- a/drivers/block/aoe/aoecmd.c	2009-11-10 15:13:25.673859220 -0800
> +++ b/drivers/block/aoe/aoecmd.c	2009-11-10 15:49:20.009047132 -0800
...
> @@ -424,12 +426,29 @@ static void
>  ejectif(struct aoetgt *t, struct aoeif *ifp)
>  {
>  	struct aoeif *e;
> +	struct net_device *nd;
>  	ulong n;
>  
>  	e = t->ifs + NAOEIFS - 1;
> +	nd = e->nd;
>  	n = (e - ifp) * sizeof *ifp;
>  	memmove(ifp, ifp+1, n);
>  	e->nd = NULL;
> +	dev_put(nd);
> +}

I didn't notice this before, but e->nd will be the net_device pointer
from the last ifp in the allocated array, whether it's NULL or not,
and usually won't be ifp->nd, the one we need to give to dev_put.  I
think that's why I see an oops when testing the version of the patch
that I sent last.  The test is doing rmmod on the interface that's
being used to create an ext2 fs on an AoE target.

I'll use ifp->nd instead of e->nd and test again tomorrow.  I have to
step away from the computer today.

-- 
  Ed

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2009-11-18 16:50 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-10 17:54 [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Stephen Hemminger
2009-11-10 17:54 ` [PATCH 01/10] netdev: add netdev_continue_rcu Stephen Hemminger
2009-11-10 18:19   ` Eric Dumazet
2009-11-11  6:47     ` David Miller
2009-11-10 19:39   ` Paul E. McKenney
2009-11-10 17:54 ` [PATCH 02/10] vlan: eliminate use of dev_base_lock Stephen Hemminger
2009-11-10 18:20   ` Eric Dumazet
2009-11-11  6:47     ` David Miller
2009-11-10 17:54 ` [PATCH 03/10] net: use rcu for network scheduler API Stephen Hemminger
2009-11-10 18:20   ` Eric Dumazet
2009-11-11  6:47     ` David Miller
2009-11-10 17:54 ` [PATCH 04/10] AOE: use rcu to find network device Stephen Hemminger
2009-11-10 18:23   ` Eric Dumazet
2009-11-10 20:01   ` Ed Cashin
2009-11-10 23:06     ` Stephen Hemminger
2009-11-10 23:53       ` Stephen Hemminger
2009-11-11  6:48         ` David Miller
2009-11-12 14:33         ` Ed Cashin
2009-11-12 17:10           ` Stephen Hemminger
2009-11-12 18:07             ` Ed Cashin
2009-11-12 19:09               ` Stephen Hemminger
2009-11-18 16:49         ` Ed Cashin
2009-11-11 14:22       ` Ed Cashin
2009-11-13 21:39         ` Ed Cashin
2009-11-13 22:24           ` Stephen Hemminger
2009-11-10 17:54 ` [PATCH 05/10] parisc: use RCU " Stephen Hemminger
2009-11-10 17:54   ` Stephen Hemminger
2009-11-10 18:26   ` Eric Dumazet
2009-11-10 18:26     ` Eric Dumazet
2009-11-11  6:48   ` David Miller
2009-11-10 17:54 ` [PATCH 06/10] s390: use RCU to walk list of network devices Stephen Hemminger
2009-11-10 18:27   ` Eric Dumazet
2009-11-10 18:29     ` Stephen Hemminger
2009-11-10 18:40   ` Eric Dumazet
2009-11-11  6:49     ` David Miller
2009-11-10 17:54 ` [PATCH 07/10] decnet: use RCU to find " Stephen Hemminger
2009-11-10 18:43   ` Eric Dumazet
2009-11-10 18:50     ` Stephen Hemminger
2009-11-10 18:24       ` steve
2009-11-11 17:39         ` [PATCH 1/2] decnet: add RTNL lock when reading address list Stephen Hemminger
2009-11-11 17:40           ` [PATCH 2/2] decnet: convert dndev_lock to spinlock Stephen Hemminger
2009-11-12  3:56             ` David Miller
2009-11-12  3:56           ` [PATCH 1/2] decnet: add RTNL lock when reading address list David Miller
2009-11-10 19:25       ` [PATCH 07/10] decnet: use RCU to find network devices Eric Dumazet
2009-11-11  6:49   ` David Miller
2009-11-10 17:54 ` [PATCH 08/10] ipv6: use RCU to walk list of " Stephen Hemminger
2009-11-11  6:50   ` David Miller
2009-11-12  3:34   ` [PATCH net-next-2.6] " Eric Dumazet
2009-11-14  4:39     ` David Miller
2009-11-10 17:54 ` [PATCH 09/10] IPV4: use rcu to walk list of devices in IGMP Stephen Hemminger
2009-11-10 18:47   ` Eric Dumazet
2009-11-11  6:50   ` David Miller
2009-11-10 17:54 ` [PATCH 10/10] CAN: use dev_get_by_index_rcu Stephen Hemminger
2009-11-10 18:34   ` Eric Dumazet
2009-11-11  5:54     ` Oliver Hartkopp
2009-11-11  6:50       ` David Miller
2009-11-10 18:18 ` [PATCH 00/10] netdev: get rid of read_lock(&dev_base_lock) usages Eric Dumazet
2009-11-10 18:22   ` Stephen Hemminger
2009-11-10 18:24   ` Stephen Hemminger
2009-11-10 18:39     ` Eric Dumazet
2009-11-10 18:53       ` Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.