All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] [1/13] bridge: Add IGMP snooping support
@ 2010-02-26 15:34 Herbert Xu
  2010-02-26 15:35 ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
                   ` (13 more replies)
  0 siblings, 14 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:34 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

Hi:

This series of patches adds basic IGMP support to the bridge
device.  First of all the following is not currently supported
but may be added in future:

* IGMPv3 source support (so really just IGMPv2 for now)
* Non-querier router detection
* IPv6

The series is divided into two portions:

1-5 lays the ground work and can be merged without any of the
other patches.

6-13 are the actual IGMP-specific patches.

This is a kernel-only implementation.  In future we could move
parts of this into user-space just like RTP.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 1/13] bridge: Do br_pass_frame_up after other ports
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Do br_pass_frame_up after other ports

At the moment we deliver to the local bridge port via the function
br_pass_frame_up before all other ports.  There is no requirement
for this.

For the purpose of IGMP snooping, it would be more convenient if
we did the local port last.  Therefore this patch rearranges the
bridge input processing so that the local bridge port gets to see
the packet last (if at all).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_input.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 5ee1a36..9589937 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -73,9 +73,6 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	if (skb2 == skb)
 		skb2 = skb_clone(skb, GFP_ATOMIC);
 
-	if (skb2)
-		br_pass_frame_up(br, skb2);
-
 	if (skb) {
 		if (dst)
 			br_forward(dst->dst, skb);
@@ -83,6 +80,9 @@ int br_handle_frame_finish(struct sk_buff *skb)
 			br_flood_forward(br, skb);
 	}
 
+	if (skb2)
+		br_pass_frame_up(br, skb2);
+
 out:
 	return 0;
 drop:

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
  2010-02-26 15:35 ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-27 11:14   ` David Miller
  2010-02-26 15:35 ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Allow tail-call on br_pass_frame_up

This patch allows tail-call on the call to br_pass_frame_up
in br_handle_frame_finish.  This is now possible because of the
previous patch to call br_pass_frame_up last.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_input.c   |   12 +++++++-----
 net/bridge/br_private.h |    6 ++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 9589937..be5ab8d 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -20,9 +20,9 @@
 /* Bridge group multicast address 802.1d (pg 51). */
 const u8 br_group_address[ETH_ALEN] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
 
-static void br_pass_frame_up(struct net_bridge *br, struct sk_buff *skb)
+static int br_pass_frame_up(struct sk_buff *skb)
 {
-	struct net_device *indev, *brdev = br->dev;
+	struct net_device *indev, *brdev = BR_INPUT_SKB_CB(skb)->brdev;
 
 	brdev->stats.rx_packets++;
 	brdev->stats.rx_bytes += skb->len;
@@ -30,8 +30,8 @@ static void br_pass_frame_up(struct net_bridge *br, struct sk_buff *skb)
 	indev = skb->dev;
 	skb->dev = brdev;
 
-	NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
-		netif_receive_skb);
+	return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
+		       netif_receive_skb);
 }
 
 /* note: already called with rcu_read_lock (preempt_disabled) */
@@ -53,6 +53,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	if (p->state == BR_STATE_LEARNING)
 		goto drop;
 
+	BR_INPUT_SKB_CB(skb)->brdev = br->dev;
+
 	/* The packet skb2 goes to the local host (NULL to skip). */
 	skb2 = NULL;
 
@@ -81,7 +83,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	}
 
 	if (skb2)
-		br_pass_frame_up(br, skb2);
+		return br_pass_frame_up(skb2);
 
 out:
 	return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2114e45..a38d738 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -132,6 +132,12 @@ struct net_bridge
 	struct kobject			*ifobj;
 };
 
+struct br_input_skb_cb {
+	struct net_device *brdev;
+};
+
+#define BR_INPUT_SKB_CB(__skb)	((struct br_input_skb_cb *)(__skb)->cb)
+
 extern struct notifier_block br_device_notifier;
 extern const u8 br_group_address[ETH_ALEN];
 

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 3/13] bridge: Avoid unnecessary clone on forward path
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
  2010-02-26 15:35 ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
  2010-02-26 15:35 ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Avoid unnecessary clone on forward path

When the packet is delivered to the local bridge device we may
end up cloning it unnecessarily if no bridge port can receive
the packet in br_flood.

This patch avoids this by moving the skb_clone into br_flood.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_forward.c |   33 ++++++++++++++++++++++-----------
 net/bridge/br_input.c   |    5 +----
 net/bridge/br_private.h |    3 ++-
 3 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index bc1704a..6cd50c6 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -105,8 +105,9 @@ void br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
 
 /* called under bridge lock */
 static void br_flood(struct net_bridge *br, struct sk_buff *skb,
-	void (*__packet_hook)(const struct net_bridge_port *p,
-			      struct sk_buff *skb))
+		     struct sk_buff *skb0,
+		     void (*__packet_hook)(const struct net_bridge_port *p,
+					   struct sk_buff *skb))
 {
 	struct net_bridge_port *p;
 	struct net_bridge_port *prev;
@@ -120,8 +121,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 
 				if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
 					br->dev->stats.tx_dropped++;
-					kfree_skb(skb);
-					return;
+					goto out;
 				}
 
 				__packet_hook(prev, skb2);
@@ -131,23 +131,34 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 		}
 	}
 
-	if (prev != NULL) {
-		__packet_hook(prev, skb);
-		return;
+	if (!prev)
+		goto out;
+
+	if (skb0) {
+		skb = skb_clone(skb, GFP_ATOMIC);
+		if (!skb) {
+			br->dev->stats.tx_dropped++;
+			goto out;
+		}
 	}
+	__packet_hook(prev, skb);
+	return;
 
-	kfree_skb(skb);
+out:
+	if (!skb0)
+		kfree_skb(skb);
 }
 
 
 /* called with rcu_read_lock */
 void br_flood_deliver(struct net_bridge *br, struct sk_buff *skb)
 {
-	br_flood(br, skb, __br_deliver);
+	br_flood(br, skb, NULL, __br_deliver);
 }
 
 /* called under bridge lock */
-void br_flood_forward(struct net_bridge *br, struct sk_buff *skb)
+void br_flood_forward(struct net_bridge *br, struct sk_buff *skb,
+		      struct sk_buff *skb2)
 {
-	br_flood(br, skb, __br_forward);
+	br_flood(br, skb, skb2, __br_forward);
 }
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index be5ab8d..edfdaef 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -72,14 +72,11 @@ int br_handle_frame_finish(struct sk_buff *skb)
 		skb = NULL;
 	}
 
-	if (skb2 == skb)
-		skb2 = skb_clone(skb, GFP_ATOMIC);
-
 	if (skb) {
 		if (dst)
 			br_forward(dst->dst, skb);
 		else
-			br_flood_forward(br, skb);
+			br_flood_forward(br, skb, skb2);
 	}
 
 	if (skb2)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index a38d738..7b0aed5 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -181,7 +181,8 @@ extern void br_forward(const struct net_bridge_port *to,
 		struct sk_buff *skb);
 extern int br_forward_finish(struct sk_buff *skb);
 extern void br_flood_deliver(struct net_bridge *br, struct sk_buff *skb);
-extern void br_flood_forward(struct net_bridge *br, struct sk_buff *skb);
+extern void br_flood_forward(struct net_bridge *br, struct sk_buff *skb,
+			     struct sk_buff *skb2);
 
 /* br_if.c */
 extern void br_port_carrier_check(struct net_bridge_port *p);

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (2 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Use BR_INPUT_SKB_CB on xmit path

this patch makes BR_INPUT_SKB_CB available on the xmit path so
that we could avoid passing the br pointer around for the purpose
of collecting device statistics.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c  |    2 ++
 net/bridge/br_forward.c |    5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 1a99c4e..be35629 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -26,6 +26,8 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	const unsigned char *dest = skb->data;
 	struct net_bridge_fdb_entry *dst;
 
+	BR_INPUT_SKB_CB(skb)->brdev = dev;
+
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += skb->len;
 
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 6cd50c6..2e1cb43 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -111,6 +111,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 {
 	struct net_bridge_port *p;
 	struct net_bridge_port *prev;
+	struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
 
 	prev = NULL;
 
@@ -120,7 +121,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 				struct sk_buff *skb2;
 
 				if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
-					br->dev->stats.tx_dropped++;
+					dev->stats.tx_dropped++;
 					goto out;
 				}
 
@@ -137,7 +138,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 	if (skb0) {
 		skb = skb_clone(skb, GFP_ATOMIC);
 		if (!skb) {
-			br->dev->stats.tx_dropped++;
+			dev->stats.tx_dropped++;
 			goto out;
 		}
 	}

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (3 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Split may_deliver/deliver_clone out of br_flood

This patch moves the main loop body in br_flood into the function
may_deliver.  The code that clones an skb and delivers it is moved
into the deliver_clone function.

This allows this to be reused by the future multicast forward
function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_forward.c |   69 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 46 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 2e1cb43..86cd071 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -11,6 +11,7 @@
  *	2 of the License, or (at your option) any later version.
  */
 
+#include <linux/err.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
@@ -103,6 +104,44 @@ void br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
 	kfree_skb(skb);
 }
 
+static int deliver_clone(struct net_bridge_port *prev, struct sk_buff *skb,
+			 void (*__packet_hook)(const struct net_bridge_port *p,
+					       struct sk_buff *skb))
+{
+	skb = skb_clone(skb, GFP_ATOMIC);
+	if (!skb) {
+		struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+
+		dev->stats.tx_dropped++;
+		return -ENOMEM;
+	}
+
+	__packet_hook(prev, skb);
+	return 0;
+}
+
+static struct net_bridge_port *maybe_deliver(
+	struct net_bridge_port *prev, struct net_bridge_port *p,
+	struct sk_buff *skb,
+	void (*__packet_hook)(const struct net_bridge_port *p,
+			      struct sk_buff *skb))
+{
+	int err;
+
+	if (!should_deliver(p, skb))
+		return prev;
+
+	if (!prev)
+		goto out;
+
+	err = deliver_clone(prev, skb, __packet_hook);
+	if (err)
+		return ERR_PTR(err);
+
+out:
+	return p;
+}
+
 /* called under bridge lock */
 static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 		     struct sk_buff *skb0,
@@ -111,38 +150,22 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 {
 	struct net_bridge_port *p;
 	struct net_bridge_port *prev;
-	struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
 
 	prev = NULL;
 
 	list_for_each_entry_rcu(p, &br->port_list, list) {
-		if (should_deliver(p, skb)) {
-			if (prev != NULL) {
-				struct sk_buff *skb2;
-
-				if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
-					dev->stats.tx_dropped++;
-					goto out;
-				}
-
-				__packet_hook(prev, skb2);
-			}
-
-			prev = p;
-		}
+		prev = maybe_deliver(prev, p, skb, __packet_hook);
+		if (IS_ERR(prev))
+			goto out;
 	}
 
 	if (!prev)
 		goto out;
 
-	if (skb0) {
-		skb = skb_clone(skb, GFP_ATOMIC);
-		if (!skb) {
-			dev->stats.tx_dropped++;
-			goto out;
-		}
-	}
-	__packet_hook(prev, skb);
+	if (skb0)
+		deliver_clone(prev, skb, __packet_hook);
+	else
+		__packet_hook(prev, skb);
 	return;
 
 out:

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (4 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add core IGMP snooping support

This patch adds the core functionality of IGMP snooping support
without actually hooking it up.  So this patch should be a no-op
as far as the bridge's external behaviour is concerned.

All the new code and data is controlled by the Kconfig option
BRIDGE_IGMP_SNOOPING.  A run-time toggle is also available.

The multicast switching is done using an hash table that is
lockless on the read-side through RCU.  On the write-side the
new multicast_lock is used for all operations.  The hash table
supports dynamic growth/rehashing.

The hash table will be rehashed if any chain length exceeds a
preset limit.  If rehashing does not reduce the maximum chain
length then snooping will be disabled.

These features may be added in future (in no particular order):

* IGMPv3 source support
* Non-querier router detection
* IPv6

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/Kconfig        |   12 
 net/bridge/Makefile       |    2 
 net/bridge/br_multicast.c | 1135 ++++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h   |  139 +++++
 4 files changed, 1288 insertions(+)

diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
index e143ca6..78dd549 100644
--- a/net/bridge/Kconfig
+++ b/net/bridge/Kconfig
@@ -31,3 +31,15 @@ config BRIDGE
 	  will be called bridge.
 
 	  If unsure, say N.
+
+config BRIDGE_IGMP_SNOOPING
+	bool "IGMP snooping"
+	default y
+	---help---
+	  If you say Y here, then the Ethernet bridge will be able selectively
+	  forward multicast traffic based on IGMP traffic received from each
+	  port.
+
+	  Say N to exclude this support and reduce the binary size.
+
+	  If unsure, say Y.
diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index f444c12..d0359ea 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -12,4 +12,6 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
 
 bridge-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
 
+bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o
+
 obj-$(CONFIG_BRIDGE_NF_EBTABLES) += netfilter/
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
new file mode 100644
index 0000000..746b5a6
--- /dev/null
+++ b/net/bridge/br_multicast.c
@@ -0,0 +1,1135 @@
+/*
+ * Bridge multicast support.
+ *
+ * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/if_ether.h>
+#include <linux/igmp.h>
+#include <linux/jhash.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/netfilter_bridge.h>
+#include <linux/random.h>
+#include <linux/rculist.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/timer.h>
+#include <net/ip.h>
+
+#include "br_private.h"
+
+static inline int br_ip_hash(struct net_bridge_mdb_htable *mdb, __be32 ip)
+{
+	return jhash_1word(mdb->secret, (u32)ip) & (mdb->max - 1);
+}
+
+static struct net_bridge_mdb_entry *__br_mdb_ip_get(
+	struct net_bridge_mdb_htable *mdb, __be32 dst, int hash)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p;
+
+	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
+		if (dst == mp->addr)
+			return mp;
+	}
+
+	return NULL;
+}
+
+static struct net_bridge_mdb_entry *br_mdb_ip_get(
+	struct net_bridge_mdb_htable *mdb, __be32 dst)
+{
+	return __br_mdb_ip_get(mdb, dst, br_ip_hash(mdb, dst));
+}
+
+struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
+					struct sk_buff *skb)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+
+	if (!mdb || br->multicast_disabled)
+		return NULL;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if (BR_INPUT_SKB_CB(skb)->igmp)
+			break;
+		return br_mdb_ip_get(mdb, ip_hdr(skb)->daddr);
+	}
+
+	return NULL;
+}
+
+static void br_mdb_free(struct rcu_head *head)
+{
+	struct net_bridge_mdb_htable *mdb =
+		container_of(head, struct net_bridge_mdb_htable, rcu);
+	struct net_bridge_mdb_htable *old = mdb->old;
+
+	mdb->old = NULL;
+	kfree(old->mhash);
+	kfree(old);
+}
+
+static int br_mdb_copy(struct net_bridge_mdb_htable *new,
+		       struct net_bridge_mdb_htable *old,
+		       int elasticity)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p;
+	int maxlen;
+	int len;
+	int i;
+
+	for (i = 0; i < old->max; i++)
+		hlist_for_each_entry(mp, p, &old->mhash[i], hlist[old->ver])
+			hlist_add_head(&mp->hlist[new->ver],
+				       &new->mhash[br_ip_hash(new, mp->addr)]);
+
+	if (!elasticity)
+		return 0;
+
+	maxlen = 0;
+	for (i = 0; i < new->max; i++) {
+		len = 0;
+		hlist_for_each_entry(mp, p, &new->mhash[i], hlist[new->ver])
+			len++;
+		if (len > maxlen)
+			maxlen = len;
+	}
+
+	return maxlen > elasticity ? -EINVAL : 0;
+}
+
+static void br_multicast_free_pg(struct rcu_head *head)
+{
+	struct net_bridge_port_group *p =
+		container_of(head, struct net_bridge_port_group, rcu);
+
+	kfree(p);
+}
+
+static void br_multicast_free_group(struct rcu_head *head)
+{
+	struct net_bridge_mdb_entry *mp =
+		container_of(head, struct net_bridge_mdb_entry, rcu);
+
+	kfree(mp);
+}
+
+static void br_multicast_group_expired(unsigned long data)
+{
+	struct net_bridge_mdb_entry *mp = (void *)data;
+	struct net_bridge *br = mp->br;
+	struct net_bridge_mdb_htable *mdb;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || timer_pending(&mp->timer))
+		goto out;
+
+	if (!hlist_unhashed(&mp->mglist))
+		hlist_del_init(&mp->mglist);
+
+	if (mp->ports)
+		goto out;
+
+	mdb = br->mdb;
+	hlist_del_rcu(&mp->hlist[mdb->ver]);
+	mdb->size--;
+
+	del_timer(&mp->query_timer);
+	call_rcu_bh(&mp->rcu, br_multicast_free_group);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static void br_multicast_del_pg(struct net_bridge *br,
+				struct net_bridge_port_group *pg)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group **pp;
+
+	mp = br_mdb_ip_get(mdb, pg->addr);
+	if (WARN_ON(!mp))
+		return;
+
+	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+		if (p != pg)
+			continue;
+
+		*pp = p->next;
+		hlist_del_init(&p->mglist);
+		del_timer(&p->timer);
+		del_timer(&p->query_timer);
+		call_rcu_bh(&p->rcu, br_multicast_free_pg);
+
+		if (!mp->ports && hlist_unhashed(&mp->mglist) &&
+		    netif_running(br->dev))
+			mod_timer(&mp->timer, jiffies);
+
+		return;
+	}
+
+	WARN_ON(1);
+}
+
+static void br_multicast_port_group_expired(unsigned long data)
+{
+	struct net_bridge_port_group *pg = (void *)data;
+	struct net_bridge *br = pg->port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || timer_pending(&pg->timer) ||
+	    hlist_unhashed(&pg->mglist))
+		goto out;
+
+	br_multicast_del_pg(br, pg);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static int br_mdb_rehash(struct net_bridge_mdb_htable **mdbp, int max,
+			 int elasticity)
+{
+	struct net_bridge_mdb_htable *old = *mdbp;
+	struct net_bridge_mdb_htable *mdb;
+	int err;
+
+	mdb = kmalloc(sizeof(*mdb), GFP_ATOMIC);
+	if (!mdb)
+		return -ENOMEM;
+
+	mdb->max = max;
+	mdb->old = old;
+
+	mdb->mhash = kzalloc(max * sizeof(*mdb->mhash), GFP_ATOMIC);
+	if (!mdb->mhash) {
+		kfree(mdb);
+		return -ENOMEM;
+	}
+
+	mdb->size = old ? old->size : 0;
+	mdb->ver = old ? old->ver ^ 1 : 0;
+
+	if (!old || elasticity)
+		get_random_bytes(&mdb->secret, sizeof(mdb->secret));
+	else
+		mdb->secret = old->secret;
+
+	if (!old)
+		goto out;
+
+	err = br_mdb_copy(mdb, old, elasticity);
+	if (err) {
+		kfree(mdb->mhash);
+		kfree(mdb);
+		return err;
+	}
+
+	call_rcu_bh(&mdb->rcu, br_mdb_free);
+
+out:
+	rcu_assign_pointer(*mdbp, mdb);
+
+	return 0;
+}
+
+static struct sk_buff *br_multicast_alloc_query(struct net_bridge *br,
+						__be32 group)
+{
+	struct sk_buff *skb;
+	struct igmphdr *ih;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+
+	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*iph) +
+						 sizeof(*ih) + 4);
+	if (!skb)
+		goto out;
+
+	skb->protocol = htons(ETH_P_IP);
+
+	skb_reset_mac_header(skb);
+	eth = eth_hdr(skb);
+
+	memcpy(eth->h_source, br->dev->dev_addr, 6);
+	eth->h_dest[0] = 1;
+	eth->h_dest[1] = 0;
+	eth->h_dest[2] = 0x5e;
+	eth->h_dest[3] = 0;
+	eth->h_dest[4] = 0;
+	eth->h_dest[5] = 1;
+	eth->h_proto = htons(ETH_P_IP);
+	skb_put(skb, sizeof(*eth));
+
+	skb_set_network_header(skb, skb->len);
+	iph = ip_hdr(skb);
+
+	iph->version = 4;
+	iph->ihl = 6;
+	iph->tos = 0xc0;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*ih) + 4);
+	iph->id = 0;
+	iph->frag_off = htons(IP_DF);
+	iph->ttl = 1;
+	iph->protocol = IPPROTO_IGMP;
+	iph->saddr = 0;
+	iph->daddr = htonl(INADDR_ALLHOSTS_GROUP);
+	((u8 *)&iph[1])[0] = IPOPT_RA;
+	((u8 *)&iph[1])[1] = 4;
+	((u8 *)&iph[1])[2] = 0;
+	((u8 *)&iph[1])[3] = 0;
+	ip_send_check(iph);
+	skb_put(skb, 24);
+
+	skb_set_transport_header(skb, skb->len);
+	ih = igmp_hdr(skb);
+	ih->type = IGMP_HOST_MEMBERSHIP_QUERY;
+	ih->code = (group ? br->multicast_last_member_interval :
+			    br->multicast_query_response_interval) /
+		   (HZ / IGMP_TIMER_SCALE);
+	ih->group = group;
+	ih->csum = 0;
+	ih->csum = ip_compute_csum((void *)ih, sizeof(struct igmphdr));
+	skb_put(skb, sizeof(*ih));
+
+	__skb_pull(skb, sizeof(*eth));
+
+out:
+	return skb;
+}
+
+static void br_multicast_send_group_query(struct net_bridge_mdb_entry *mp)
+{
+	struct net_bridge *br = mp->br;
+	struct sk_buff *skb;
+
+	skb = br_multicast_alloc_query(br, mp->addr);
+	if (!skb)
+		goto timer;
+
+	netif_rx(skb);
+
+timer:
+	if (++mp->queries_sent < br->multicast_last_member_count)
+		mod_timer(&mp->query_timer,
+			  jiffies + br->multicast_last_member_interval);
+}
+
+static void br_multicast_group_query_expired(unsigned long data)
+{
+	struct net_bridge_mdb_entry *mp = (void *)data;
+	struct net_bridge *br = mp->br;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || hlist_unhashed(&mp->mglist) ||
+	    mp->queries_sent >= br->multicast_last_member_count)
+		goto out;
+
+	br_multicast_send_group_query(mp);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static void br_multicast_send_port_group_query(struct net_bridge_port_group *pg)
+{
+	struct net_bridge_port *port = pg->port;
+	struct net_bridge *br = port->br;
+	struct sk_buff *skb;
+
+	skb = br_multicast_alloc_query(br, pg->addr);
+	if (!skb)
+		goto timer;
+
+	br_deliver(port, skb);
+
+timer:
+	if (++pg->queries_sent < br->multicast_last_member_count)
+		mod_timer(&pg->query_timer,
+			  jiffies + br->multicast_last_member_interval);
+}
+
+static void br_multicast_port_group_query_expired(unsigned long data)
+{
+	struct net_bridge_port_group *pg = (void *)data;
+	struct net_bridge_port *port = pg->port;
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || hlist_unhashed(&pg->mglist) ||
+	    pg->queries_sent >= br->multicast_last_member_count)
+		goto out;
+
+	br_multicast_send_port_group_query(pg);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static struct net_bridge_mdb_entry *br_multicast_get_group(
+	struct net_bridge *br, struct net_bridge_port *port, __be32 group,
+	int hash)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p;
+	unsigned count = 0;
+	unsigned max;
+	int elasticity;
+	int err;
+
+	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
+		count++;
+		if (unlikely(group == mp->addr)) {
+			return mp;
+		}
+	}
+
+	elasticity = 0;
+	max = mdb->max;
+
+	if (unlikely(count > br->hash_elasticity && count)) {
+		if (net_ratelimit())
+			printk(KERN_INFO "%s: Multicast hash table "
+			       "chain limit reached: %s\n",
+			       br->dev->name, port ? port->dev->name :
+						     br->dev->name);
+
+		elasticity = br->hash_elasticity;
+	}
+
+	if (mdb->size >= max) {
+		max *= 2;
+		if (unlikely(max >= br->hash_max)) {
+			printk(KERN_WARNING "%s: Multicast hash table maximum "
+			       "reached, disabling snooping: %s, %d\n",
+			       br->dev->name, port ? port->dev->name :
+						     br->dev->name,
+			       max);
+			err = -E2BIG;
+disable:
+			br->multicast_disabled = 1;
+			goto err;
+		}
+	}
+
+	if (max > mdb->max || elasticity) {
+		if (mdb->old) {
+			if (net_ratelimit())
+				printk(KERN_INFO "%s: Multicast hash table "
+				       "on fire: %s\n",
+				       br->dev->name, port ? port->dev->name :
+							     br->dev->name);
+			err = -EEXIST;
+			goto err;
+		}
+
+		err = br_mdb_rehash(&br->mdb, max, elasticity);
+		if (err) {
+			printk(KERN_WARNING "%s: Cannot rehash multicast "
+			       "hash table, disabling snooping: "
+			       "%s, %d, %d\n",
+			       br->dev->name, port ? port->dev->name :
+						     br->dev->name,
+			       mdb->size, err);
+			goto disable;
+		}
+
+		err = -EAGAIN;
+		goto err;
+	}
+
+	return NULL;
+
+err:
+	mp = ERR_PTR(err);
+	return mp;
+}
+
+static struct net_bridge_mdb_entry *br_multicast_new_group(
+	struct net_bridge *br, struct net_bridge_port *port, __be32 group)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_entry *mp;
+	int hash;
+
+	if (!mdb) {
+		if (br_mdb_rehash(&br->mdb, BR_HASH_SIZE, 0))
+			return NULL;
+		goto rehash;
+	}
+
+	hash = br_ip_hash(mdb, group);
+	mp = br_multicast_get_group(br, port, group, hash);
+	switch (PTR_ERR(mp)) {
+	case 0:
+		break;
+
+	case -EAGAIN:
+rehash:
+		mdb = br->mdb;
+		hash = br_ip_hash(mdb, group);
+		break;
+
+	default:
+		goto out;
+	}
+
+	mp = kzalloc(sizeof(*mp), GFP_ATOMIC);
+	if (unlikely(!mp))
+		goto out;
+
+	mp->br = br;
+	mp->addr = group;
+	setup_timer(&mp->timer, br_multicast_group_expired,
+		    (unsigned long)mp);
+	setup_timer(&mp->query_timer, br_multicast_group_query_expired,
+		    (unsigned long)mp);
+
+	hlist_add_head_rcu(&mp->hlist[mdb->ver], &mdb->mhash[hash]);
+	mdb->size++;
+
+out:
+	return mp;
+}
+
+static int br_multicast_add_group(struct net_bridge *br,
+				  struct net_bridge_port *port, __be32 group)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group **pp;
+	unsigned long now = jiffies;
+	int err;
+
+	if (ipv4_is_local_multicast(group))
+		return 0;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) ||
+	    (port && port->state == BR_STATE_DISABLED))
+		goto out;
+
+	mp = br_multicast_new_group(br, port, group);
+	err = PTR_ERR(mp);
+	if (unlikely(IS_ERR(mp) || !mp))
+		goto err;
+
+	if (!port) {
+		hlist_add_head(&mp->mglist, &br->mglist);
+		mod_timer(&mp->timer, now + br->multicast_membership_interval);
+		goto out;
+	}
+
+	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+		if (p->port == port)
+			goto found;
+		if ((unsigned long)p->port < (unsigned long)port)
+			break;
+	}
+
+	p = kzalloc(sizeof(*p), GFP_ATOMIC);
+	err = -ENOMEM;
+	if (unlikely(!p))
+		goto err;
+
+	p->addr = group;
+	p->port = port;
+	p->next = *pp;
+	hlist_add_head(&p->mglist, &port->mglist);
+	setup_timer(&p->timer, br_multicast_port_group_expired,
+		    (unsigned long)p);
+	setup_timer(&p->query_timer, br_multicast_port_group_query_expired,
+		    (unsigned long)p);
+
+	rcu_assign_pointer(*pp, p);
+
+found:
+	mod_timer(&p->timer, now + br->multicast_membership_interval);
+out:
+	err = 0;
+
+err:
+	spin_unlock(&br->multicast_lock);
+	return err;
+}
+
+static void br_multicast_router_expired(unsigned long data)
+{
+	struct net_bridge_port *port = (void *)data;
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (port->multicast_router != 1 ||
+	    timer_pending(&port->multicast_router_timer) ||
+	    hlist_unhashed(&port->rlist))
+		goto out;
+
+	hlist_del_init_rcu(&port->rlist);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static void br_multicast_local_router_expired(unsigned long data)
+{
+}
+
+static void br_multicast_send_query(struct net_bridge *br,
+				    struct net_bridge_port *port, u32 sent)
+{
+	unsigned long time;
+	struct sk_buff *skb;
+
+	if (!netif_running(br->dev) || br->multicast_disabled ||
+	    timer_pending(&br->multicast_querier_timer))
+		return;
+
+	skb = br_multicast_alloc_query(br, 0);
+	if (!skb)
+		goto timer;
+
+	if (port) {
+		__skb_push(skb, sizeof(struct ethhdr));
+		skb->dev = port->dev;
+		NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+			dev_queue_xmit);
+	} else
+		netif_rx(skb);
+
+timer:
+	time = jiffies;
+	time += sent < br->multicast_startup_query_count ?
+		br->multicast_startup_query_interval :
+		br->multicast_query_interval;
+	mod_timer(port ? &port->multicast_query_timer :
+			 &br->multicast_query_timer, time);
+}
+
+static void br_multicast_port_query_expired(unsigned long data)
+{
+	struct net_bridge_port *port = (void *)data;
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (port && (port->state == BR_STATE_DISABLED ||
+		     port->state == BR_STATE_BLOCKING))
+		goto out;
+
+	if (port->multicast_startup_queries_sent <
+	    br->multicast_startup_query_count)
+		port->multicast_startup_queries_sent++;
+
+	br_multicast_send_query(port->br, port,
+				port->multicast_startup_queries_sent);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+void br_multicast_add_port(struct net_bridge_port *port)
+{
+	port->multicast_router = 1;
+
+	setup_timer(&port->multicast_router_timer, br_multicast_router_expired,
+		    (unsigned long)port);
+	setup_timer(&port->multicast_query_timer,
+		    br_multicast_port_query_expired, (unsigned long)port);
+}
+
+void br_multicast_del_port(struct net_bridge_port *port)
+{
+	del_timer_sync(&port->multicast_router_timer);
+}
+
+void br_multicast_enable_port(struct net_bridge_port *port)
+{
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (br->multicast_disabled || !netif_running(br->dev))
+		goto out;
+
+	port->multicast_startup_queries_sent = 0;
+
+	if (try_to_del_timer_sync(&port->multicast_query_timer) >= 0 ||
+	    del_timer(&port->multicast_query_timer))
+		mod_timer(&port->multicast_query_timer, jiffies);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+void br_multicast_disable_port(struct net_bridge_port *port)
+{
+	struct net_bridge *br = port->br;
+	struct net_bridge_port_group *pg;
+	struct hlist_node *p, *n;
+
+	spin_lock(&br->multicast_lock);
+	hlist_for_each_entry_safe(pg, p, n, &port->mglist, mglist)
+		br_multicast_del_pg(br, pg);
+
+	if (!hlist_unhashed(&port->rlist))
+		hlist_del_init_rcu(&port->rlist);
+	del_timer(&port->multicast_router_timer);
+	del_timer(&port->multicast_query_timer);
+	spin_unlock(&br->multicast_lock);
+}
+
+static int br_multicast_igmp3_report(struct net_bridge *br,
+				     struct net_bridge_port *port,
+				     struct sk_buff *skb)
+{
+	struct igmpv3_report *ih;
+	struct igmpv3_grec *grec;
+	int i;
+	int len;
+	int num;
+	int type;
+	int err = 0;
+	__be32 group;
+
+	if (!pskb_may_pull(skb, sizeof(*ih)))
+		return -EINVAL;
+
+	ih = igmpv3_report_hdr(skb);
+	num = ntohs(ih->ngrec);
+	len = sizeof(*ih);
+
+	for (i = 0; i < num; i++) {
+		len += sizeof(*grec);
+		if (!pskb_may_pull(skb, len))
+			return -EINVAL;
+
+		grec = (void *)(skb->data + len);
+		group = grec->grec_mca;
+		type = grec->grec_type;
+
+		len += grec->grec_nsrcs * 4;
+		if (!pskb_may_pull(skb, len))
+			return -EINVAL;
+
+		/* We treat this as an IGMPv2 report for now. */
+		switch (type) {
+		case IGMPV3_MODE_IS_INCLUDE:
+		case IGMPV3_MODE_IS_EXCLUDE:
+		case IGMPV3_CHANGE_TO_INCLUDE:
+		case IGMPV3_CHANGE_TO_EXCLUDE:
+		case IGMPV3_ALLOW_NEW_SOURCES:
+		case IGMPV3_BLOCK_OLD_SOURCES:
+			break;
+
+		default:
+			continue;
+		}
+
+		err = br_multicast_add_group(br, port, group);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static void br_multicast_mark_router(struct net_bridge *br,
+				     struct net_bridge_port *port)
+{
+	unsigned long now = jiffies;
+	struct hlist_node *p;
+	struct hlist_node **h;
+
+	if (!port) {
+		if (br->multicast_router == 1)
+			mod_timer(&br->multicast_router_timer,
+				  now + br->multicast_querier_interval);
+		return;
+	}
+
+	if (port->multicast_router != 1)
+		return;
+
+	if (!hlist_unhashed(&port->rlist))
+		goto timer;
+
+	for (h = &br->router_list.first;
+	     (p = *h) &&
+	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
+	     (unsigned long)port;
+	     h = &p->next)
+		;
+
+	port->rlist.pprev = h;
+	port->rlist.next = p;
+	rcu_assign_pointer(*h, &port->rlist);
+	if (p)
+		p->pprev = &port->rlist.next;
+
+timer:
+	mod_timer(&port->multicast_router_timer,
+		  now + br->multicast_querier_interval);
+}
+
+static void br_multicast_query_received(struct net_bridge *br,
+					struct net_bridge_port *port,
+					__be32 saddr)
+{
+	if (saddr)
+		mod_timer(&br->multicast_querier_timer,
+			  jiffies + br->multicast_querier_interval);
+	else if (timer_pending(&br->multicast_querier_timer))
+		return;
+
+	br_multicast_mark_router(br, port);
+}
+
+static int br_multicast_query(struct net_bridge *br,
+			      struct net_bridge_port *port,
+			      struct sk_buff *skb)
+{
+	struct iphdr *iph = ip_hdr(skb);
+	struct igmphdr *ih = igmp_hdr(skb);
+	struct net_bridge_mdb_entry *mp;
+	struct igmpv3_query *ih3;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group **pp;
+	unsigned long max_delay;
+	unsigned long now = jiffies;
+	__be32 group;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) ||
+	    (port && port->state == BR_STATE_DISABLED))
+		goto out;
+
+	br_multicast_query_received(br, port, iph->saddr);
+
+	group = ih->group;
+
+	if (skb->len == sizeof(*ih)) {
+		max_delay = ih->code * (HZ / IGMP_TIMER_SCALE);
+
+		if (!max_delay) {
+			max_delay = 10 * HZ;
+			group = 0;
+		}
+	} else {
+		if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
+			return -EINVAL;
+
+		ih3 = igmpv3_query_hdr(skb);
+		if (ih3->nsrcs)
+			return 0;
+
+		max_delay = ih3->code ? 1 :
+			    IGMPV3_MRC(ih3->code) * (HZ / IGMP_TIMER_SCALE);
+	}
+
+	if (!group)
+		goto out;
+
+	mp = br_mdb_ip_get(br->mdb, group);
+	if (!mp)
+		goto out;
+
+	max_delay *= br->multicast_last_member_count;
+
+	if (!hlist_unhashed(&mp->mglist) &&
+	    (timer_pending(&mp->timer) ?
+	     time_after(mp->timer.expires, now + max_delay) :
+	     try_to_del_timer_sync(&mp->timer) >= 0))
+		mod_timer(&mp->timer, now + max_delay);
+
+	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+		if (timer_pending(&p->timer) ?
+		    time_after(p->timer.expires, now + max_delay) :
+		    try_to_del_timer_sync(&p->timer) >= 0)
+			mod_timer(&mp->timer, now + max_delay);
+	}
+
+out:
+	spin_unlock(&br->multicast_lock);
+	return 0;
+}
+
+static void br_multicast_leave_group(struct net_bridge *br,
+				     struct net_bridge_port *port,
+				     __be32 group)
+{
+	struct net_bridge_mdb_htable *mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	unsigned long now;
+	unsigned long time;
+
+	if (ipv4_is_local_multicast(group))
+		return;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) ||
+	    (port && port->state == BR_STATE_DISABLED) ||
+	    timer_pending(&br->multicast_querier_timer))
+		goto out;
+
+	mdb = br->mdb;
+	mp = br_mdb_ip_get(mdb, group);
+	if (!mp)
+		goto out;
+
+	now = jiffies;
+	time = now + br->multicast_last_member_count *
+		     br->multicast_last_member_interval;
+
+	if (!port) {
+		if (!hlist_unhashed(&mp->mglist) &&
+		    (timer_pending(&mp->timer) ?
+		     time_after(mp->timer.expires, time) :
+		     try_to_del_timer_sync(&mp->timer) >= 0)) {
+			mod_timer(&mp->timer, time);
+
+			mp->queries_sent = 0;
+			mod_timer(&mp->query_timer, now);
+		}
+
+		goto out;
+	}
+
+	for (p = mp->ports; p; p = p->next) {
+		if (p->port != port)
+			continue;
+
+		if (!hlist_unhashed(&p->mglist) &&
+		    (timer_pending(&p->timer) ?
+		     time_after(p->timer.expires, time) :
+		     try_to_del_timer_sync(&p->timer) >= 0)) {
+			mod_timer(&p->timer, time);
+
+			p->queries_sent = 0;
+			mod_timer(&p->query_timer, now);
+		}
+
+		break;
+	}
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static int br_multicast_ipv4_rcv(struct net_bridge *br,
+				 struct net_bridge_port *port,
+				 struct sk_buff *skb)
+{
+	struct sk_buff *skb2 = skb;
+	struct iphdr *iph;
+	struct igmphdr *ih;
+	unsigned len;
+	unsigned offset;
+	int err;
+
+	BR_INPUT_SKB_CB(skb)->igmp = 0;
+	BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
+
+	/* We treat OOM as packet loss for now. */
+	if (!pskb_may_pull(skb, sizeof(*iph)))
+		return -EINVAL;
+
+	iph = ip_hdr(skb);
+
+	if (iph->ihl < 5 || iph->version != 4)
+		return -EINVAL;
+
+	if (!pskb_may_pull(skb, ip_hdrlen(skb)))
+		return -EINVAL;
+
+	iph = ip_hdr(skb);
+
+	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
+		return -EINVAL;
+
+	if (iph->protocol != IPPROTO_IGMP)
+		return 0;
+
+	len = ntohs(iph->tot_len);
+	if (skb->len < len || len < ip_hdrlen(skb))
+		return -EINVAL;
+
+	if (skb->len > len) {
+		skb2 = skb_clone(skb, GFP_ATOMIC);
+		if (!skb2)
+			return -ENOMEM;
+
+		err = pskb_trim_rcsum(skb2, len);
+		if (err)
+			return err;
+	}
+
+	len -= ip_hdrlen(skb2);
+	offset = skb_network_offset(skb2) + ip_hdrlen(skb2);
+	__skb_pull(skb2, offset);
+	skb_reset_transport_header(skb2);
+
+	err = -EINVAL;
+	if (!pskb_may_pull(skb2, sizeof(*ih)))
+		goto out;
+
+	iph = ip_hdr(skb2);
+
+	switch (skb2->ip_summed) {
+	case CHECKSUM_COMPLETE:
+		if (!csum_fold(skb2->csum))
+			break;
+		/* fall through */
+	case CHECKSUM_NONE:
+		skb2->csum = 0;
+		if (skb_checksum_complete(skb2))
+			return -EINVAL;
+	}
+
+	err = 0;
+
+	BR_INPUT_SKB_CB(skb)->igmp = 1;
+	ih = igmp_hdr(skb2);
+
+	switch (ih->type) {
+	case IGMP_HOST_MEMBERSHIP_REPORT:
+	case IGMPV2_HOST_MEMBERSHIP_REPORT:
+		BR_INPUT_SKB_CB(skb2)->mrouters_only = 1;
+		err = br_multicast_add_group(br, port, ih->group);
+		break;
+	case IGMPV3_HOST_MEMBERSHIP_REPORT:
+		err = br_multicast_igmp3_report(br, port, skb2);
+		break;
+	case IGMP_HOST_MEMBERSHIP_QUERY:
+		err = br_multicast_query(br, port, skb2);
+		break;
+	case IGMP_HOST_LEAVE_MESSAGE:
+		br_multicast_leave_group(br, port, ih->group);
+		break;
+	}
+
+out:
+	__skb_push(skb2, offset);
+	if (skb2 != skb)
+		kfree_skb(skb2);
+	return err;
+}
+
+int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
+		     struct sk_buff *skb)
+{
+	if (br->multicast_disabled)
+		return 0;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		return br_multicast_ipv4_rcv(br, port, skb);
+	}
+
+	return 0;
+}
+
+static void br_multicast_query_expired(unsigned long data)
+{
+	struct net_bridge *br = (void *)data;
+
+	spin_lock(&br->multicast_lock);
+	if (br->multicast_startup_queries_sent <
+	    br->multicast_startup_query_count)
+		br->multicast_startup_queries_sent++;
+
+	br_multicast_send_query(br, NULL, br->multicast_startup_queries_sent);
+
+	spin_unlock(&br->multicast_lock);
+}
+
+void br_multicast_init(struct net_bridge *br)
+{
+	br->hash_elasticity = 4;
+	br->hash_max = 512;
+
+	br->multicast_router = 1;
+	br->multicast_last_member_count = 2;
+	br->multicast_startup_query_count = 2;
+
+	br->multicast_last_member_interval = HZ;
+	br->multicast_query_response_interval = 10 * HZ;
+	br->multicast_startup_query_interval = 125 * HZ / 4;
+	br->multicast_query_interval = 125 * HZ;
+	br->multicast_querier_interval = 255 * HZ;
+	br->multicast_membership_interval = 260 * HZ;
+
+	spin_lock_init(&br->multicast_lock);
+	setup_timer(&br->multicast_router_timer,
+		    br_multicast_local_router_expired, 0);
+	setup_timer(&br->multicast_querier_timer,
+		    br_multicast_local_router_expired, 0);
+	setup_timer(&br->multicast_query_timer, br_multicast_query_expired,
+		    (unsigned long)br);
+}
+
+void br_multicast_open(struct net_bridge *br)
+{
+	br->multicast_startup_queries_sent = 0;
+
+	if (br->multicast_disabled)
+		return;
+
+	mod_timer(&br->multicast_query_timer, jiffies);
+}
+
+void br_multicast_stop(struct net_bridge *br)
+{
+	struct net_bridge_mdb_htable *mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p, *n;
+	u32 ver;
+	int i;
+
+	del_timer_sync(&br->multicast_router_timer);
+	del_timer_sync(&br->multicast_querier_timer);
+	del_timer_sync(&br->multicast_query_timer);
+
+	spin_lock_bh(&br->multicast_lock);
+	mdb = br->mdb;
+	if (!mdb)
+		goto out;
+
+	br->mdb = NULL;
+
+	ver = mdb->ver;
+	for (i = 0; i < mdb->max; i++) {
+		hlist_for_each_entry_safe(mp, p, n, &mdb->mhash[i],
+					  hlist[ver]) {
+			del_timer(&mp->timer);
+			del_timer(&mp->query_timer);
+			call_rcu_bh(&mp->rcu, br_multicast_free_group);
+		}
+	}
+
+	if (mdb->old) {
+		spin_unlock_bh(&br->multicast_lock);
+		synchronize_rcu_bh();
+		spin_lock_bh(&br->multicast_lock);
+		WARN_ON(mdb->old);
+	}
+
+	mdb->old = mdb;
+	call_rcu_bh(&mdb->rcu, br_mdb_free);
+
+out:
+	spin_unlock_bh(&br->multicast_lock);
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 7b0aed5..0871775 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -57,6 +57,41 @@ struct net_bridge_fdb_entry
 	unsigned char			is_static;
 };
 
+struct net_bridge_port_group {
+	struct net_bridge_port		*port;
+	struct net_bridge_port_group	*next;
+	struct hlist_node		mglist;
+	struct rcu_head			rcu;
+	struct timer_list		timer;
+	struct timer_list		query_timer;
+	__be32				addr;
+	u32				queries_sent;
+};
+
+struct net_bridge_mdb_entry
+{
+	struct hlist_node		hlist[2];
+	struct hlist_node		mglist;
+	struct net_bridge		*br;
+	struct net_bridge_port_group	*ports;
+	struct rcu_head			rcu;
+	struct timer_list		timer;
+	struct timer_list		query_timer;
+	__be32				addr;
+	u32				queries_sent;
+};
+
+struct net_bridge_mdb_htable
+{
+	struct hlist_head		*mhash;
+	struct rcu_head			rcu;
+	struct net_bridge_mdb_htable	*old;
+	u32				size;
+	u32				max;
+	u32				secret;
+	u32				ver;
+};
+
 struct net_bridge_port
 {
 	struct net_bridge		*br;
@@ -84,6 +119,15 @@ struct net_bridge_port
 
 	unsigned long 			flags;
 #define BR_HAIRPIN_MODE		0x00000001
+
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	u32				multicast_startup_queries_sent;
+	unsigned char			multicast_router;
+	struct timer_list		multicast_router_timer;
+	struct timer_list		multicast_query_timer;
+	struct hlist_head		mglist;
+	struct hlist_node		rlist;
+#endif
 };
 
 struct net_bridge
@@ -125,6 +169,35 @@ struct net_bridge
 	unsigned char			topology_change;
 	unsigned char			topology_change_detected;
 
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	unsigned char			multicast_router;
+
+	u8				multicast_disabled:1;
+
+	u32				hash_elasticity;
+	u32				hash_max;
+
+	u32				multicast_last_member_count;
+	u32				multicast_startup_queries_sent;
+	u32				multicast_startup_query_count;
+
+	unsigned long			multicast_last_member_interval;
+	unsigned long			multicast_membership_interval;
+	unsigned long			multicast_querier_interval;
+	unsigned long			multicast_query_interval;
+	unsigned long			multicast_query_response_interval;
+	unsigned long			multicast_startup_query_interval;
+
+	spinlock_t			multicast_lock;
+	struct net_bridge_mdb_htable	*mdb;
+	struct hlist_head		router_list;
+	struct hlist_head		mglist;
+
+	struct timer_list		multicast_router_timer;
+	struct timer_list		multicast_querier_timer;
+	struct timer_list		multicast_query_timer;
+#endif
+
 	struct timer_list		hello_timer;
 	struct timer_list		tcn_timer;
 	struct timer_list		topology_change_timer;
@@ -134,6 +207,8 @@ struct net_bridge
 
 struct br_input_skb_cb {
 	struct net_device *brdev;
+	int igmp;
+	int mrouters_only;
 };
 
 #define BR_INPUT_SKB_CB(__skb)	((struct br_input_skb_cb *)(__skb)->cb)
@@ -205,6 +280,70 @@ extern struct sk_buff *br_handle_frame(struct net_bridge_port *p,
 extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
 extern int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *arg);
 
+/* br_multicast.c */
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+extern int br_multicast_rcv(struct net_bridge *br,
+			    struct net_bridge_port *port,
+			    struct sk_buff *skb);
+extern struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
+					       struct sk_buff *skb);
+extern void br_multicast_add_port(struct net_bridge_port *port);
+extern void br_multicast_del_port(struct net_bridge_port *port);
+extern void br_multicast_enable_port(struct net_bridge_port *port);
+extern void br_multicast_disable_port(struct net_bridge_port *port);
+extern void br_multicast_init(struct net_bridge *br);
+extern void br_multicast_open(struct net_bridge *br);
+extern void br_multicast_stop(struct net_bridge *br);
+#else
+static inline int br_multicast_rcv(struct net_bridge *br,
+				   struct net_bridge_port *port,
+				   struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
+						      struct sk_buff *skb)
+{
+	return NULL;
+}
+
+static inline void br_multicast_add_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_del_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_enable_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_disable_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_init(struct net_bridge *br)
+{
+}
+
+static inline void br_multicast_open(struct net_bridge *br)
+{
+}
+
+static inline void br_multicast_stop(struct net_bridge *br)
+{
+}
+#endif
+
+static inline bool br_multicast_is_router(struct net_bridge *br)
+{
+	return br->multicast_router == 2 ||
+	       (br->multicast_router == 1 &&
+		timer_pending(&br->multicast_router_timer));
+}
+
 /* br_netfilter.c */
 #ifdef CONFIG_BRIDGE_NETFILTER
 extern int br_netfilter_init(void);

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 7/13] bridge: Add multicast forwarding functions
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (5 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast forwarding functions

This patch adds code to perform selective multicast forwarding.

We forward multicast traffic to a set of ports plus all multicast
router ports.  In order to avoid duplications among these two
sets of ports, we order all ports by the numeric value of their
pointers.  The two lists are then walked in lock-step to eliminate
duplicates.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_forward.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h |   15 ++++++++++
 2 files changed, 82 insertions(+)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 86cd071..d61e6f7 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -186,3 +186,70 @@ void br_flood_forward(struct net_bridge *br, struct sk_buff *skb,
 {
 	br_flood(br, skb, skb2, __br_forward);
 }
+
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+/* called with rcu_read_lock */
+static void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
+			       struct sk_buff *skb, struct sk_buff *skb0,
+			       void (*__packet_hook)(
+					const struct net_bridge_port *p,
+					struct sk_buff *skb))
+{
+	struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_port *port;
+	struct net_bridge_port *lport, *rport;
+	struct net_bridge_port *prev;
+	struct net_bridge_port_group *p;
+	struct hlist_node *rp;
+
+	prev = NULL;
+
+	rp = br->router_list.first;
+	p = mdst ? mdst->ports : NULL;
+	while (p || rp) {
+		lport = p ? p->port : NULL;
+		rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
+			     NULL;
+
+		port = (unsigned long)lport > (unsigned long)rport ?
+		       lport : rport;
+
+		prev = maybe_deliver(prev, port, skb, __packet_hook);
+		if (IS_ERR(prev))
+			goto out;
+
+		if ((unsigned long)lport >= (unsigned long)port)
+			p = p->next;
+		if ((unsigned long)rport >= (unsigned long)port)
+			rp = rp->next;
+	}
+
+	if (!prev)
+		goto out;
+
+	if (skb0)
+		deliver_clone(prev, skb, __packet_hook);
+	else
+		__packet_hook(prev, skb);
+	return;
+
+out:
+	if (!skb0)
+		kfree_skb(skb);
+}
+
+/* called with rcu_read_lock */
+void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
+			  struct sk_buff *skb)
+{
+	br_multicast_flood(mdst, skb, NULL, __br_deliver);
+}
+
+/* called with rcu_read_lock */
+void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
+			  struct sk_buff *skb, struct sk_buff *skb2)
+{
+	br_multicast_flood(mdst, skb, skb2, __br_forward);
+}
+#endif
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 0871775..f2dd411 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -294,6 +294,10 @@ extern void br_multicast_disable_port(struct net_bridge_port *port);
 extern void br_multicast_init(struct net_bridge *br);
 extern void br_multicast_open(struct net_bridge *br);
 extern void br_multicast_stop(struct net_bridge *br);
+extern void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
+				 struct sk_buff *skb);
+extern void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
+				 struct sk_buff *skb, struct sk_buff *skb2);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
@@ -335,6 +339,17 @@ static inline void br_multicast_open(struct net_bridge *br)
 static inline void br_multicast_stop(struct net_bridge *br)
 {
 }
+
+static inline void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
+					struct sk_buff *skb)
+{
+}
+
+static inline void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
+					struct sk_buff *skb,
+					struct sk_buff *skb2)
+{
+}
 #endif
 
 static inline bool br_multicast_is_router(struct net_bridge *br)

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 8/13] bridge: Add multicast start/stop hooks
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (6 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast start/stop hooks

This patch hooks up the bridge start/stop and add/delete/disable
port functions to the new multicast module.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c |    6 +++++-
 net/bridge/br_if.c     |    4 ++++
 net/bridge/br_stp.c    |    2 ++
 net/bridge/br_stp_if.c |    1 +
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index be35629..91dffe7 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -51,6 +51,7 @@ static int br_dev_open(struct net_device *dev)
 	br_features_recompute(br);
 	netif_start_queue(dev);
 	br_stp_enable_bridge(br);
+	br_multicast_open(br);
 
 	return 0;
 }
@@ -61,7 +62,10 @@ static void br_dev_set_multicast_list(struct net_device *dev)
 
 static int br_dev_stop(struct net_device *dev)
 {
-	br_stp_disable_bridge(netdev_priv(dev));
+	struct net_bridge *br = netdev_priv(dev);
+
+	br_stp_disable_bridge(br);
+	br_multicast_stop(br);
 
 	netif_stop_queue(dev);
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index a2cbe61..cc3cdfd 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -147,6 +147,8 @@ static void del_nbp(struct net_bridge_port *p)
 
 	rcu_assign_pointer(dev->br_port, NULL);
 
+	br_multicast_del_port(p);
+
 	kobject_uevent(&p->kobj, KOBJ_REMOVE);
 	kobject_del(&p->kobj);
 
@@ -209,6 +211,7 @@ static struct net_device *new_bridge_dev(struct net *net, const char *name)
 	INIT_LIST_HEAD(&br->age_list);
 
 	br_stp_timer_init(br);
+	br_multicast_init(br);
 
 	return dev;
 }
@@ -260,6 +263,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
 	br_init_port(p);
 	p->state = BR_STATE_DISABLED;
 	br_stp_port_timer_init(p);
+	br_multicast_add_port(p);
 
 	return p;
 }
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index fd3f8d6..edcf14b 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -386,6 +386,8 @@ static void br_make_forwarding(struct net_bridge_port *p)
 	else
 		p->state = BR_STATE_LEARNING;
 
+	br_multicast_enable_port(p);
+
 	br_log_state(p);
 
 	if (br->forward_delay != 0)
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 9a52ac5..d527119 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -108,6 +108,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
 	del_timer(&p->hold_timer);
 
 	br_fdb_delete_by_port(br, p, 0);
+	br_multicast_disable_port(p);
 
 	br_configuration_update(br);
 

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 9/13] bridge: Add multicast data-path hooks
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (7 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast data-path hooks

This patch finally hooks up the multicast snooping module to the
data path.  In particular, all multicast packets passing through
the bridge are fed into the module and switched by it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c |   15 ++++++++++++---
 net/bridge/br_input.c  |   18 +++++++++++++++++-
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 91dffe7..eb7062d 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -25,6 +25,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct net_bridge *br = netdev_priv(dev);
 	const unsigned char *dest = skb->data;
 	struct net_bridge_fdb_entry *dst;
+	struct net_bridge_mdb_entry *mdst;
 
 	BR_INPUT_SKB_CB(skb)->brdev = dev;
 
@@ -34,13 +35,21 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_mac_header(skb);
 	skb_pull(skb, ETH_HLEN);
 
-	if (dest[0] & 1)
-		br_flood_deliver(br, skb);
-	else if ((dst = __br_fdb_get(br, dest)) != NULL)
+	if (dest[0] & 1) {
+		if (br_multicast_rcv(br, NULL, skb))
+			goto out;
+
+		mdst = br_mdb_get(br, skb);
+		if (mdst || BR_INPUT_SKB_CB(skb)->mrouters_only)
+			br_multicast_deliver(mdst, skb);
+		else
+			br_flood_deliver(br, skb);
+	} else if ((dst = __br_fdb_get(br, dest)) != NULL)
 		br_deliver(dst->dst, skb);
 	else
 		br_flood_deliver(br, skb);
 
+out:
 	return NETDEV_TX_OK;
 }
 
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index edfdaef..53b3985 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -41,6 +41,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	struct net_bridge_port *p = rcu_dereference(skb->dev->br_port);
 	struct net_bridge *br;
 	struct net_bridge_fdb_entry *dst;
+	struct net_bridge_mdb_entry *mdst;
 	struct sk_buff *skb2;
 
 	if (!p || p->state == BR_STATE_DISABLED)
@@ -50,6 +51,10 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	br = p->br;
 	br_fdb_update(br, p, eth_hdr(skb)->h_source);
 
+	if (is_multicast_ether_addr(dest) &&
+	    br_multicast_rcv(br, p, skb))
+		goto drop;
+
 	if (p->state == BR_STATE_LEARNING)
 		goto drop;
 
@@ -64,8 +69,19 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	dst = NULL;
 
 	if (is_multicast_ether_addr(dest)) {
+		mdst = br_mdb_get(br, skb);
+		if (mdst || BR_INPUT_SKB_CB(skb)->mrouters_only) {
+			if ((mdst && !hlist_unhashed(&mdst->mglist)) ||
+			    br_multicast_is_router(br))
+				skb2 = skb;
+			br_multicast_forward(mdst, skb, skb2);
+			skb = NULL;
+			if (!skb2)
+				goto out;
+		} else
+			skb2 = skb;
+
 		br->dev->stats.multicast++;
-		skb2 = skb;
 	} else if ((dst = __br_fdb_get(br, dest)) && dst->is_local) {
 		skb2 = skb;
 		/* Do not forward the packet since it's local. */

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 10/13] bridge: Add multicast_router sysfs entries
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (8 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-27  0:42   ` Stephen Hemminger
  2010-02-26 15:35 ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast_router sysfs entries

This patch allows the user to forcibly enable/disable ports as
having multicast routers attached.  A port with a multicast router
will receive all multicast traffic.

The value 0 disables it completely.  The default is 1 which lets
the system automatically detect the presence of routers (currently
this is limited to picking up queries), and 2 means that the port
will always receive all multicast traffic.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_multicast.c |  105 +++++++++++++++++++++++++++++++++++++++-------
 net/bridge/br_private.h   |    3 +
 net/bridge/br_sysfs_br.c  |   21 +++++++++
 net/bridge/br_sysfs_if.c  |   18 +++++++
 4 files changed, 133 insertions(+), 14 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 746b5a6..674224b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -746,12 +746,30 @@ static int br_multicast_igmp3_report(struct net_bridge *br,
 	return err;
 }
 
+static void br_multicast_add_router(struct net_bridge *br,
+				    struct net_bridge_port *port)
+{
+	struct hlist_node *p;
+	struct hlist_node **h;
+
+	for (h = &br->router_list.first;
+	     (p = *h) &&
+	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
+	     (unsigned long)port;
+	     h = &p->next)
+		;
+
+	port->rlist.pprev = h;
+	port->rlist.next = p;
+	rcu_assign_pointer(*h, &port->rlist);
+	if (p)
+		p->pprev = &port->rlist.next;
+}
+
 static void br_multicast_mark_router(struct net_bridge *br,
 				     struct net_bridge_port *port)
 {
 	unsigned long now = jiffies;
-	struct hlist_node *p;
-	struct hlist_node **h;
 
 	if (!port) {
 		if (br->multicast_router == 1)
@@ -766,18 +784,7 @@ static void br_multicast_mark_router(struct net_bridge *br,
 	if (!hlist_unhashed(&port->rlist))
 		goto timer;
 
-	for (h = &br->router_list.first;
-	     (p = *h) &&
-	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
-	     (unsigned long)port;
-	     h = &p->next)
-		;
-
-	port->rlist.pprev = h;
-	port->rlist.next = p;
-	rcu_assign_pointer(*h, &port->rlist);
-	if (p)
-		p->pprev = &port->rlist.next;
+	br_multicast_add_router(br, port);
 
 timer:
 	mod_timer(&port->multicast_router_timer,
@@ -1133,3 +1140,73 @@ void br_multicast_stop(struct net_bridge *br)
 out:
 	spin_unlock_bh(&br->multicast_lock);
 }
+
+int br_multicast_set_router(struct net_bridge *br, unsigned long val)
+{
+	int err = -ENOENT;
+
+	spin_lock_bh(&br->multicast_lock);
+	if (!netif_running(br->dev))
+		goto unlock;
+
+	switch (val) {
+	case 0:
+	case 2:
+		del_timer(&br->multicast_router_timer);
+		/* fall through */
+	case 1:
+		br->multicast_router = val;
+		err = 0;
+		break;
+
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+unlock:
+	spin_unlock_bh(&br->multicast_lock);
+
+	return err;
+}
+
+int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
+{
+	struct net_bridge *br = p->br;
+	int err = -ENOENT;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || p->state == BR_STATE_DISABLED)
+		goto unlock;
+
+	switch (val) {
+	case 0:
+	case 1:
+	case 2:
+		p->multicast_router = val;
+		err = 0;
+
+		if (val < 2 && !hlist_unhashed(&p->rlist))
+			hlist_del_init_rcu(&p->rlist);
+
+		if (val == 1)
+			break;
+
+		del_timer(&p->multicast_router_timer);
+
+		if (val == 0)
+			break;
+
+		br_multicast_add_router(br, p);
+		break;
+
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+unlock:
+	spin_unlock(&br->multicast_lock);
+
+	return err;
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f2dd411..2d3df82 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -298,6 +298,9 @@ extern void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
 				 struct sk_buff *skb);
 extern void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
 				 struct sk_buff *skb, struct sk_buff *skb2);
+extern int br_multicast_set_router(struct net_bridge *br, unsigned long val);
+extern int br_multicast_set_port_router(struct net_bridge_port *p,
+					unsigned long val);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index bee4f30..cb74201 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -345,6 +345,24 @@ static ssize_t store_flush(struct device *d,
 }
 static DEVICE_ATTR(flush, S_IWUSR, NULL, store_flush);
 
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+static ssize_t show_multicast_router(struct device *d,
+				     struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%d\n", br->multicast_router);
+}
+
+static ssize_t store_multicast_router(struct device *d,
+				      struct device_attribute *attr,
+				      const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_set_router);
+}
+static DEVICE_ATTR(multicast_router, S_IRUGO | S_IWUSR, show_multicast_router,
+		   store_multicast_router);
+#endif
+
 static struct attribute *bridge_attrs[] = {
 	&dev_attr_forward_delay.attr,
 	&dev_attr_hello_time.attr,
@@ -364,6 +382,9 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_gc_timer.attr,
 	&dev_attr_group_addr.attr,
 	&dev_attr_flush.attr,
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	&dev_attr_multicast_router.attr,
+#endif
 	NULL
 };
 
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 820643a..696596c 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -159,6 +159,21 @@ static ssize_t store_hairpin_mode(struct net_bridge_port *p, unsigned long v)
 static BRPORT_ATTR(hairpin_mode, S_IRUGO | S_IWUSR,
 		   show_hairpin_mode, store_hairpin_mode);
 
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
+{
+	return sprintf(buf, "%d\n", p->multicast_router);
+}
+
+static ssize_t store_multicast_router(struct net_bridge_port *p,
+				      unsigned long v)
+{
+	return br_multicast_set_port_router(p, v);
+}
+static BRPORT_ATTR(multicast_router, S_IRUGO | S_IWUSR, show_multicast_router,
+		   store_multicast_router);
+#endif
+
 static struct brport_attribute *brport_attrs[] = {
 	&brport_attr_path_cost,
 	&brport_attr_priority,
@@ -176,6 +191,9 @@ static struct brport_attribute *brport_attrs[] = {
 	&brport_attr_hold_timer,
 	&brport_attr_flush,
 	&brport_attr_hairpin_mode,
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	&brport_attr_multicast_router,
+#endif
 	NULL
 };
 

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (9 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast_snooping sysfs toggle

This patch allows the user to disable IGMP snooping completely
through a sysfs toggle.  It also allows the user to reenable
snooping when it has been automatically disabled due to hash
collisions.  If the collisions have not been resolved however
the system will refuse to reenable snooping.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_multicast.c |   61 ++++++++++++++++++++++++++++++++++++++++++----
 net/bridge/br_private.h   |    1 
 net/bridge/br_sysfs_br.c  |   18 +++++++++++++
 3 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 674224b..c7a1095 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -656,6 +656,15 @@ void br_multicast_del_port(struct net_bridge_port *port)
 	del_timer_sync(&port->multicast_router_timer);
 }
 
+static void __br_multicast_enable_port(struct net_bridge_port *port)
+{
+	port->multicast_startup_queries_sent = 0;
+
+	if (try_to_del_timer_sync(&port->multicast_query_timer) >= 0 ||
+	    del_timer(&port->multicast_query_timer))
+		mod_timer(&port->multicast_query_timer, jiffies);
+}
+
 void br_multicast_enable_port(struct net_bridge_port *port)
 {
 	struct net_bridge *br = port->br;
@@ -664,11 +673,7 @@ void br_multicast_enable_port(struct net_bridge_port *port)
 	if (br->multicast_disabled || !netif_running(br->dev))
 		goto out;
 
-	port->multicast_startup_queries_sent = 0;
-
-	if (try_to_del_timer_sync(&port->multicast_query_timer) >= 0 ||
-	    del_timer(&port->multicast_query_timer))
-		mod_timer(&port->multicast_query_timer, jiffies);
+	__br_multicast_enable_port(port);
 
 out:
 	spin_unlock(&br->multicast_lock);
@@ -1210,3 +1215,49 @@ unlock:
 
 	return err;
 }
+
+int br_multicast_toggle(struct net_bridge *br, unsigned long val)
+{
+	struct net_bridge_port *port;
+	int err = -ENOENT;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev))
+		goto unlock;
+
+	err = 0;
+	if (br->multicast_disabled == !val)
+		goto unlock;
+
+	br->multicast_disabled = !val;
+	if (br->multicast_disabled)
+		goto unlock;
+
+	if (br->mdb) {
+		if (br->mdb->old) {
+			err = -EEXIST;
+rollback:
+			br->multicast_disabled = !!val;
+			goto unlock;
+		}
+
+		err = br_mdb_rehash(&br->mdb, br->mdb->max,
+				    br->hash_elasticity);
+		if (err)
+			goto rollback;
+	}
+
+	br_multicast_open(br);
+	list_for_each_entry(port, &br->port_list, list) {
+		if (port->state == BR_STATE_DISABLED ||
+		    port->state == BR_STATE_BLOCKING)
+			continue;
+
+		__br_multicast_enable_port(port);
+	}
+
+unlock:
+	spin_unlock(&br->multicast_lock);
+
+	return err;
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2d3df82..4467904 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -301,6 +301,7 @@ extern void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
 extern int br_multicast_set_router(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_port_router(struct net_bridge_port *p,
 					unsigned long val);
+extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index cb74201..0ab2883 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -361,6 +361,23 @@ static ssize_t store_multicast_router(struct device *d,
 }
 static DEVICE_ATTR(multicast_router, S_IRUGO | S_IWUSR, show_multicast_router,
 		   store_multicast_router);
+
+static ssize_t show_multicast_snooping(struct device *d,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%d\n", !br->multicast_disabled);
+}
+
+static ssize_t store_multicast_snooping(struct device *d,
+					struct device_attribute *attr,
+					const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_toggle);
+}
+static DEVICE_ATTR(multicast_snooping, S_IRUGO | S_IWUSR,
+		   show_multicast_snooping, store_multicast_snooping);
 #endif
 
 static struct attribute *bridge_attrs[] = {
@@ -384,6 +401,7 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_flush.attr,
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	&dev_attr_multicast_router.attr,
+	&dev_attr_multicast_snooping.attr,
 #endif
 	NULL
 };

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (10 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-26 15:35 ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add hash elasticity/max sysfs entries

This patch allows the user to control the hash elasticity/max
parameters.  The elasticity setting does not take effect until
the next new multicast group is added.  At which point it is
checked and if after rehashing it still can't be satisfied then
snooping will be disabled.

The max setting on the other hand takes effect immediately.  It
must be a power of two and cannot be set to a value less than the
current number of multicast group entries.  This is the only way
to shrink the multicast hash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_multicast.c |   41 +++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h   |    1 +
 net/bridge/br_sysfs_br.c  |   39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index c7a1095..2559fb5 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -15,6 +15,7 @@
 #include <linux/igmp.h>
 #include <linux/jhash.h>
 #include <linux/kernel.h>
+#include <linux/log2.h>
 #include <linux/netdevice.h>
 #include <linux/netfilter_bridge.h>
 #include <linux/random.h>
@@ -1261,3 +1262,43 @@ unlock:
 
 	return err;
 }
+
+int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
+{
+	int err = -ENOENT;
+	u32 old;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev))
+		goto unlock;
+
+	err = -EINVAL;
+	if (!is_power_of_2(val))
+		goto unlock;
+	if (br->mdb && val < br->mdb->size)
+		goto unlock;
+
+	err = 0;
+
+	old = br->hash_max;
+	br->hash_max = val;
+
+	if (br->mdb) {
+		if (br->mdb->old) {
+			err = -EEXIST;
+rollback:
+			br->hash_max = old;
+			goto unlock;
+		}
+
+		err = br_mdb_rehash(&br->mdb, br->hash_max,
+				    br->hash_elasticity);
+		if (err)
+			goto rollback;
+	}
+
+unlock:
+	spin_unlock(&br->multicast_lock);
+
+	return err;
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 4467904..0f12a8f 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -302,6 +302,7 @@ extern int br_multicast_set_router(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_port_router(struct net_bridge_port *p,
 					unsigned long val);
 extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
+extern int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0ab2883..d2ee53b 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -378,6 +378,43 @@ static ssize_t store_multicast_snooping(struct device *d,
 }
 static DEVICE_ATTR(multicast_snooping, S_IRUGO | S_IWUSR,
 		   show_multicast_snooping, store_multicast_snooping);
+
+static ssize_t show_hash_elasticity(struct device *d,
+				    struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->hash_elasticity);
+}
+
+static int set_elasticity(struct net_bridge *br, unsigned long val)
+{
+	br->hash_elasticity = val;
+	return 0;
+}
+
+static ssize_t store_hash_elasticity(struct device *d,
+				     struct device_attribute *attr,
+				     const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_elasticity);
+}
+static DEVICE_ATTR(hash_elasticity, S_IRUGO | S_IWUSR, show_hash_elasticity,
+		   store_hash_elasticity);
+
+static ssize_t show_hash_max(struct device *d, struct device_attribute *attr,
+			     char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->hash_max);
+}
+
+static ssize_t store_hash_max(struct device *d, struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_set_hash_max);
+}
+static DEVICE_ATTR(hash_max, S_IRUGO | S_IWUSR, show_hash_max,
+		   store_hash_max);
 #endif
 
 static struct attribute *bridge_attrs[] = {
@@ -402,6 +439,8 @@ static struct attribute *bridge_attrs[] = {
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	&dev_attr_multicast_router.attr,
 	&dev_attr_multicast_snooping.attr,
+	&dev_attr_hash_elasticity.attr,
+	&dev_attr_hash_max.attr,
 #endif
 	NULL
 };

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 13/13] bridge: Add multicast count/interval sysfs entries
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (11 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
@ 2010-02-26 15:35 ` Herbert Xu
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-26 15:35 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast count/interval sysfs entries

This patch allows the user to the IGMP parameters related to the
snooping function of the bridge.  This includes various time
values and retransmission limits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_sysfs_br.c |  203 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 203 insertions(+)

diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index d2ee53b..dd321e3 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -415,6 +415,201 @@ static ssize_t store_hash_max(struct device *d, struct device_attribute *attr,
 }
 static DEVICE_ATTR(hash_max, S_IRUGO | S_IWUSR, show_hash_max,
 		   store_hash_max);
+
+static ssize_t show_multicast_last_member_count(struct device *d,
+						struct device_attribute *attr,
+						char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->multicast_last_member_count);
+}
+
+static int set_last_member_count(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_last_member_count = val;
+	return 0;
+}
+
+static ssize_t store_multicast_last_member_count(struct device *d,
+						 struct device_attribute *attr,
+						 const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_last_member_count);
+}
+static DEVICE_ATTR(multicast_last_member_count, S_IRUGO | S_IWUSR,
+		   show_multicast_last_member_count,
+		   store_multicast_last_member_count);
+
+static ssize_t show_multicast_startup_query_count(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->multicast_startup_query_count);
+}
+
+static int set_startup_query_count(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_startup_query_count = val;
+	return 0;
+}
+
+static ssize_t store_multicast_startup_query_count(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_startup_query_count);
+}
+static DEVICE_ATTR(multicast_startup_query_count, S_IRUGO | S_IWUSR,
+		   show_multicast_startup_query_count,
+		   store_multicast_startup_query_count);
+
+static ssize_t show_multicast_last_member_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_last_member_interval));
+}
+
+static int set_last_member_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_last_member_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_last_member_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_last_member_interval);
+}
+static DEVICE_ATTR(multicast_last_member_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_last_member_interval,
+		   store_multicast_last_member_interval);
+
+static ssize_t show_multicast_membership_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_membership_interval));
+}
+
+static int set_membership_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_membership_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_membership_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_membership_interval);
+}
+static DEVICE_ATTR(multicast_membership_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_membership_interval,
+		   store_multicast_membership_interval);
+
+static ssize_t show_multicast_querier_interval(struct device *d,
+					       struct device_attribute *attr,
+					       char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_querier_interval));
+}
+
+static int set_querier_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_querier_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_querier_interval(struct device *d,
+						struct device_attribute *attr,
+						const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_querier_interval);
+}
+static DEVICE_ATTR(multicast_querier_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_querier_interval,
+		   store_multicast_querier_interval);
+
+static ssize_t show_multicast_query_interval(struct device *d,
+					     struct device_attribute *attr,
+					     char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_query_interval));
+}
+
+static int set_query_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_query_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_query_interval(struct device *d,
+					      struct device_attribute *attr,
+					      const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_query_interval);
+}
+static DEVICE_ATTR(multicast_query_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_query_interval,
+		   store_multicast_query_interval);
+
+static ssize_t show_multicast_query_response_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(
+		buf, "%lu\n",
+		jiffies_to_clock_t(br->multicast_query_response_interval));
+}
+
+static int set_query_response_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_query_response_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_query_response_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_query_response_interval);
+}
+static DEVICE_ATTR(multicast_query_response_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_query_response_interval,
+		   store_multicast_query_response_interval);
+
+static ssize_t show_multicast_startup_query_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(
+		buf, "%lu\n",
+		jiffies_to_clock_t(br->multicast_startup_query_interval));
+}
+
+static int set_startup_query_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_startup_query_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_startup_query_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_startup_query_interval);
+}
+static DEVICE_ATTR(multicast_startup_query_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_startup_query_interval,
+		   store_multicast_startup_query_interval);
 #endif
 
 static struct attribute *bridge_attrs[] = {
@@ -441,6 +636,14 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_multicast_snooping.attr,
 	&dev_attr_hash_elasticity.attr,
 	&dev_attr_hash_max.attr,
+	&dev_attr_multicast_last_member_count.attr,
+	&dev_attr_multicast_startup_query_count.attr,
+	&dev_attr_multicast_last_member_interval.attr,
+	&dev_attr_multicast_membership_interval.attr,
+	&dev_attr_multicast_querier_interval.attr,
+	&dev_attr_multicast_query_interval.attr,
+	&dev_attr_multicast_query_response_interval.attr,
+	&dev_attr_multicast_startup_query_interval.attr,
 #endif
 	NULL
 };

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/13] bridge: Add multicast_router sysfs entries
  2010-02-26 15:35 ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
@ 2010-02-27  0:42   ` Stephen Hemminger
  2010-02-27 11:29     ` David Miller
  2010-03-09 12:25     ` Herbert Xu
  0 siblings, 2 replies; 81+ messages in thread
From: Stephen Hemminger @ 2010-02-27  0:42 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

On Fri, 26 Feb 2010 23:35:15 +0800
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> bridge: Add multicast_router sysfs entries
> 
> This patch allows the user to forcibly enable/disable ports as
> having multicast routers attached.  A port with a multicast router
> will receive all multicast traffic.
> 
> The value 0 disables it completely.  The default is 1 which lets
> the system automatically detect the presence of routers (currently
> this is limited to picking up queries), and 2 means that the port
> will always receive all multicast traffic.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


I like the functionality, but don't like users whacking on sysfs
directly. Could you send patches to integrate a user API into
bridge-utils; the utils are at: 
  git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/bridge-utils.git

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up
  2010-02-26 15:35 ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
@ 2010-02-27 11:14   ` David Miller
  2010-02-27 15:36     ` Herbert Xu
  0 siblings, 1 reply; 81+ messages in thread
From: David Miller @ 2010-02-27 11:14 UTC (permalink / raw)
  To: herbert; +Cc: netdev, shemminger

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 26 Feb 2010 23:35:07 +0800

> @@ -20,9 +20,9 @@
>  /* Bridge group multicast address 802.1d (pg 51). */
>  const u8 br_group_address[ETH_ALEN] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
>  
> -static void br_pass_frame_up(struct net_bridge *br, struct sk_buff *skb)
> +static int br_pass_frame_up(struct sk_buff *skb)
>  {
> -	struct net_device *indev, *brdev = br->dev;
> +	struct net_device *indev, *brdev = BR_INPUT_SKB_CB(skb)->brdev;
>  

You use this new BR_INPUT_SKB_CB() here in patch #2, but you only
start setting ->brdev it in patch #4.

This breaks things and makes your patch series non-bisectable.

Please fix, thanks.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/13] bridge: Add multicast_router sysfs entries
  2010-02-27  0:42   ` Stephen Hemminger
@ 2010-02-27 11:29     ` David Miller
  2010-02-27 15:53       ` Herbert Xu
  2010-03-09 12:25     ` Herbert Xu
  1 sibling, 1 reply; 81+ messages in thread
From: David Miller @ 2010-02-27 11:29 UTC (permalink / raw)
  To: shemminger; +Cc: herbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Fri, 26 Feb 2010 16:42:11 -0800

> I like the functionality, but don't like users whacking on sysfs
> directly. Could you send patches to integrate a user API into
> bridge-utils; the utils are at: 
>   git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/bridge-utils.git

Sounds reasonable to me.

Herbert, please do this after we've resolved the issues in your
patch set and integrated it.

Thanks!

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up
  2010-02-27 11:14   ` David Miller
@ 2010-02-27 15:36     ` Herbert Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-27 15:36 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, shemminger

On Sat, Feb 27, 2010 at 03:14:51AM -0800, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Fri, 26 Feb 2010 23:35:07 +0800
> 
> > @@ -20,9 +20,9 @@
> >  /* Bridge group multicast address 802.1d (pg 51). */
> >  const u8 br_group_address[ETH_ALEN] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
> >  
> > -static void br_pass_frame_up(struct net_bridge *br, struct sk_buff *skb)
> > +static int br_pass_frame_up(struct sk_buff *skb)
> >  {
> > -	struct net_device *indev, *brdev = br->dev;
> > +	struct net_device *indev, *brdev = BR_INPUT_SKB_CB(skb)->brdev;
> >  
> 
> You use this new BR_INPUT_SKB_CB() here in patch #2, but you only
> start setting ->brdev it in patch #4.

Actually this patch does work as is.  The brdev setting in #4 is
for the bridge device's local xmit path.  br_pass_frame_up is not
used on the local xmit path (since that would create a packet loop).

It's only used for packets originating from bridge ports, where
patch #2 ensures that BR_INPUT_SKB_CB is set correctly.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/13] bridge: Add multicast_router sysfs entries
  2010-02-27 11:29     ` David Miller
@ 2010-02-27 15:53       ` Herbert Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-27 15:53 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, netdev

On Sat, Feb 27, 2010 at 03:29:15AM -0800, David Miller wrote:
> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Fri, 26 Feb 2010 16:42:11 -0800
> 
> > I like the functionality, but don't like users whacking on sysfs
> > directly. Could you send patches to integrate a user API into
> > bridge-utils; the utils are at: 
> >   git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/bridge-utils.git
> 
> Sounds reasonable to me.
> 
> Herbert, please do this after we've resolved the issues in your
> patch set and integrated it.

Will do.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [1/13] bridge: Add IGMP snooping support
  2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
                   ` (12 preceding siblings ...)
  2010-02-26 15:35 ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
@ 2010-02-28  5:40 ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
                     ` (13 more replies)
  13 siblings, 14 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:40 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

Hi Dave:

This is a repost of exactly the same series in order to get them
back into patchworks.  I hope I have resolved your concerns about
patch number 2.  Let me know if you still have any further questions.

This series of patches adds basic IGMP support to the bridge
device.  First of all the following is not currently supported
but may be added in future:

* IGMPv3 source support (so really just IGMPv2 for now)
* Non-querier router detection
* IPv6

The series is divided into two portions:

1-5 lays the ground work and can be merged without any of the
other patches.

6-13 are the actual IGMP-specific patches.

This is a kernel-only implementation.  In future we could move
parts of this into user-space just like RTP.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH 1/13] bridge: Do br_pass_frame_up after other ports
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Do br_pass_frame_up after other ports

At the moment we deliver to the local bridge port via the function
br_pass_frame_up before all other ports.  There is no requirement
for this.

For the purpose of IGMP snooping, it would be more convenient if
we did the local port last.  Therefore this patch rearranges the
bridge input processing so that the local bridge port gets to see
the packet last (if at all).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_input.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 5ee1a36..9589937 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -73,9 +73,6 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	if (skb2 == skb)
 		skb2 = skb_clone(skb, GFP_ATOMIC);
 
-	if (skb2)
-		br_pass_frame_up(br, skb2);
-
 	if (skb) {
 		if (dst)
 			br_forward(dst->dst, skb);
@@ -83,6 +80,9 @@ int br_handle_frame_finish(struct sk_buff *skb)
 			br_flood_forward(br, skb);
 	}
 
+	if (skb2)
+		br_pass_frame_up(br, skb2);
+
 out:
 	return 0;
 drop:

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
  2010-02-28  5:41   ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Allow tail-call on br_pass_frame_up

This patch allows tail-call on the call to br_pass_frame_up
in br_handle_frame_finish.  This is now possible because of the
previous patch to call br_pass_frame_up last.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_input.c   |   12 +++++++-----
 net/bridge/br_private.h |    6 ++++++
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 9589937..be5ab8d 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -20,9 +20,9 @@
 /* Bridge group multicast address 802.1d (pg 51). */
 const u8 br_group_address[ETH_ALEN] = { 0x01, 0x80, 0xc2, 0x00, 0x00, 0x00 };
 
-static void br_pass_frame_up(struct net_bridge *br, struct sk_buff *skb)
+static int br_pass_frame_up(struct sk_buff *skb)
 {
-	struct net_device *indev, *brdev = br->dev;
+	struct net_device *indev, *brdev = BR_INPUT_SKB_CB(skb)->brdev;
 
 	brdev->stats.rx_packets++;
 	brdev->stats.rx_bytes += skb->len;
@@ -30,8 +30,8 @@ static void br_pass_frame_up(struct net_bridge *br, struct sk_buff *skb)
 	indev = skb->dev;
 	skb->dev = brdev;
 
-	NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
-		netif_receive_skb);
+	return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, indev, NULL,
+		       netif_receive_skb);
 }
 
 /* note: already called with rcu_read_lock (preempt_disabled) */
@@ -53,6 +53,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	if (p->state == BR_STATE_LEARNING)
 		goto drop;
 
+	BR_INPUT_SKB_CB(skb)->brdev = br->dev;
+
 	/* The packet skb2 goes to the local host (NULL to skip). */
 	skb2 = NULL;
 
@@ -81,7 +83,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	}
 
 	if (skb2)
-		br_pass_frame_up(br, skb2);
+		return br_pass_frame_up(skb2);
 
 out:
 	return 0;
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2114e45..a38d738 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -132,6 +132,12 @@ struct net_bridge
 	struct kobject			*ifobj;
 };
 
+struct br_input_skb_cb {
+	struct net_device *brdev;
+};
+
+#define BR_INPUT_SKB_CB(__skb)	((struct br_input_skb_cb *)(__skb)->cb)
+
 extern struct notifier_block br_device_notifier;
 extern const u8 br_group_address[ETH_ALEN];
 

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 3/13] bridge: Avoid unnecessary clone on forward path
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
  2010-02-28  5:41   ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
  2010-02-28  5:41   ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Avoid unnecessary clone on forward path

When the packet is delivered to the local bridge device we may
end up cloning it unnecessarily if no bridge port can receive
the packet in br_flood.

This patch avoids this by moving the skb_clone into br_flood.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_forward.c |   33 ++++++++++++++++++++++-----------
 net/bridge/br_input.c   |    5 +----
 net/bridge/br_private.h |    3 ++-
 3 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index bc1704a..6cd50c6 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -105,8 +105,9 @@ void br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
 
 /* called under bridge lock */
 static void br_flood(struct net_bridge *br, struct sk_buff *skb,
-	void (*__packet_hook)(const struct net_bridge_port *p,
-			      struct sk_buff *skb))
+		     struct sk_buff *skb0,
+		     void (*__packet_hook)(const struct net_bridge_port *p,
+					   struct sk_buff *skb))
 {
 	struct net_bridge_port *p;
 	struct net_bridge_port *prev;
@@ -120,8 +121,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 
 				if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
 					br->dev->stats.tx_dropped++;
-					kfree_skb(skb);
-					return;
+					goto out;
 				}
 
 				__packet_hook(prev, skb2);
@@ -131,23 +131,34 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 		}
 	}
 
-	if (prev != NULL) {
-		__packet_hook(prev, skb);
-		return;
+	if (!prev)
+		goto out;
+
+	if (skb0) {
+		skb = skb_clone(skb, GFP_ATOMIC);
+		if (!skb) {
+			br->dev->stats.tx_dropped++;
+			goto out;
+		}
 	}
+	__packet_hook(prev, skb);
+	return;
 
-	kfree_skb(skb);
+out:
+	if (!skb0)
+		kfree_skb(skb);
 }
 
 
 /* called with rcu_read_lock */
 void br_flood_deliver(struct net_bridge *br, struct sk_buff *skb)
 {
-	br_flood(br, skb, __br_deliver);
+	br_flood(br, skb, NULL, __br_deliver);
 }
 
 /* called under bridge lock */
-void br_flood_forward(struct net_bridge *br, struct sk_buff *skb)
+void br_flood_forward(struct net_bridge *br, struct sk_buff *skb,
+		      struct sk_buff *skb2)
 {
-	br_flood(br, skb, __br_forward);
+	br_flood(br, skb, skb2, __br_forward);
 }
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index be5ab8d..edfdaef 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -72,14 +72,11 @@ int br_handle_frame_finish(struct sk_buff *skb)
 		skb = NULL;
 	}
 
-	if (skb2 == skb)
-		skb2 = skb_clone(skb, GFP_ATOMIC);
-
 	if (skb) {
 		if (dst)
 			br_forward(dst->dst, skb);
 		else
-			br_flood_forward(br, skb);
+			br_flood_forward(br, skb, skb2);
 	}
 
 	if (skb2)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index a38d738..7b0aed5 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -181,7 +181,8 @@ extern void br_forward(const struct net_bridge_port *to,
 		struct sk_buff *skb);
 extern int br_forward_finish(struct sk_buff *skb);
 extern void br_flood_deliver(struct net_bridge *br, struct sk_buff *skb);
-extern void br_flood_forward(struct net_bridge *br, struct sk_buff *skb);
+extern void br_flood_forward(struct net_bridge *br, struct sk_buff *skb,
+			     struct sk_buff *skb2);
 
 /* br_if.c */
 extern void br_port_carrier_check(struct net_bridge_port *p);

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (2 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Use BR_INPUT_SKB_CB on xmit path

this patch makes BR_INPUT_SKB_CB available on the xmit path so
that we could avoid passing the br pointer around for the purpose
of collecting device statistics.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c  |    2 ++
 net/bridge/br_forward.c |    5 +++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 1a99c4e..be35629 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -26,6 +26,8 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	const unsigned char *dest = skb->data;
 	struct net_bridge_fdb_entry *dst;
 
+	BR_INPUT_SKB_CB(skb)->brdev = dev;
+
 	dev->stats.tx_packets++;
 	dev->stats.tx_bytes += skb->len;
 
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 6cd50c6..2e1cb43 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -111,6 +111,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 {
 	struct net_bridge_port *p;
 	struct net_bridge_port *prev;
+	struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
 
 	prev = NULL;
 
@@ -120,7 +121,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 				struct sk_buff *skb2;
 
 				if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
-					br->dev->stats.tx_dropped++;
+					dev->stats.tx_dropped++;
 					goto out;
 				}
 
@@ -137,7 +138,7 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 	if (skb0) {
 		skb = skb_clone(skb, GFP_ATOMIC);
 		if (!skb) {
-			br->dev->stats.tx_dropped++;
+			dev->stats.tx_dropped++;
 			goto out;
 		}
 	}

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (3 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Split may_deliver/deliver_clone out of br_flood

This patch moves the main loop body in br_flood into the function
may_deliver.  The code that clones an skb and delivers it is moved
into the deliver_clone function.

This allows this to be reused by the future multicast forward
function.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_forward.c |   69 ++++++++++++++++++++++++++++++++----------------
 1 file changed, 46 insertions(+), 23 deletions(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 2e1cb43..86cd071 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -11,6 +11,7 @@
  *	2 of the License, or (at your option) any later version.
  */
 
+#include <linux/err.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
@@ -103,6 +104,44 @@ void br_forward(const struct net_bridge_port *to, struct sk_buff *skb)
 	kfree_skb(skb);
 }
 
+static int deliver_clone(struct net_bridge_port *prev, struct sk_buff *skb,
+			 void (*__packet_hook)(const struct net_bridge_port *p,
+					       struct sk_buff *skb))
+{
+	skb = skb_clone(skb, GFP_ATOMIC);
+	if (!skb) {
+		struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+
+		dev->stats.tx_dropped++;
+		return -ENOMEM;
+	}
+
+	__packet_hook(prev, skb);
+	return 0;
+}
+
+static struct net_bridge_port *maybe_deliver(
+	struct net_bridge_port *prev, struct net_bridge_port *p,
+	struct sk_buff *skb,
+	void (*__packet_hook)(const struct net_bridge_port *p,
+			      struct sk_buff *skb))
+{
+	int err;
+
+	if (!should_deliver(p, skb))
+		return prev;
+
+	if (!prev)
+		goto out;
+
+	err = deliver_clone(prev, skb, __packet_hook);
+	if (err)
+		return ERR_PTR(err);
+
+out:
+	return p;
+}
+
 /* called under bridge lock */
 static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 		     struct sk_buff *skb0,
@@ -111,38 +150,22 @@ static void br_flood(struct net_bridge *br, struct sk_buff *skb,
 {
 	struct net_bridge_port *p;
 	struct net_bridge_port *prev;
-	struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
 
 	prev = NULL;
 
 	list_for_each_entry_rcu(p, &br->port_list, list) {
-		if (should_deliver(p, skb)) {
-			if (prev != NULL) {
-				struct sk_buff *skb2;
-
-				if ((skb2 = skb_clone(skb, GFP_ATOMIC)) == NULL) {
-					dev->stats.tx_dropped++;
-					goto out;
-				}
-
-				__packet_hook(prev, skb2);
-			}
-
-			prev = p;
-		}
+		prev = maybe_deliver(prev, p, skb, __packet_hook);
+		if (IS_ERR(prev))
+			goto out;
 	}
 
 	if (!prev)
 		goto out;
 
-	if (skb0) {
-		skb = skb_clone(skb, GFP_ATOMIC);
-		if (!skb) {
-			dev->stats.tx_dropped++;
-			goto out;
-		}
-	}
-	__packet_hook(prev, skb);
+	if (skb0)
+		deliver_clone(prev, skb, __packet_hook);
+	else
+		__packet_hook(prev, skb);
 	return;
 
 out:

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (4 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-03-05 23:43     ` Paul E. McKenney
  2010-02-28  5:41   ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
                     ` (7 subsequent siblings)
  13 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add core IGMP snooping support

This patch adds the core functionality of IGMP snooping support
without actually hooking it up.  So this patch should be a no-op
as far as the bridge's external behaviour is concerned.

All the new code and data is controlled by the Kconfig option
BRIDGE_IGMP_SNOOPING.  A run-time toggle is also available.

The multicast switching is done using an hash table that is
lockless on the read-side through RCU.  On the write-side the
new multicast_lock is used for all operations.  The hash table
supports dynamic growth/rehashing.

The hash table will be rehashed if any chain length exceeds a
preset limit.  If rehashing does not reduce the maximum chain
length then snooping will be disabled.

These features may be added in future (in no particular order):

* IGMPv3 source support
* Non-querier router detection
* IPv6

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/Kconfig        |   12 
 net/bridge/Makefile       |    2 
 net/bridge/br_multicast.c | 1135 ++++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h   |  139 +++++
 4 files changed, 1288 insertions(+)

diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
index e143ca6..78dd549 100644
--- a/net/bridge/Kconfig
+++ b/net/bridge/Kconfig
@@ -31,3 +31,15 @@ config BRIDGE
 	  will be called bridge.
 
 	  If unsure, say N.
+
+config BRIDGE_IGMP_SNOOPING
+	bool "IGMP snooping"
+	default y
+	---help---
+	  If you say Y here, then the Ethernet bridge will be able selectively
+	  forward multicast traffic based on IGMP traffic received from each
+	  port.
+
+	  Say N to exclude this support and reduce the binary size.
+
+	  If unsure, say Y.
diff --git a/net/bridge/Makefile b/net/bridge/Makefile
index f444c12..d0359ea 100644
--- a/net/bridge/Makefile
+++ b/net/bridge/Makefile
@@ -12,4 +12,6 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
 
 bridge-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
 
+bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o
+
 obj-$(CONFIG_BRIDGE_NF_EBTABLES) += netfilter/
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
new file mode 100644
index 0000000..746b5a6
--- /dev/null
+++ b/net/bridge/br_multicast.c
@@ -0,0 +1,1135 @@
+/*
+ * Bridge multicast support.
+ *
+ * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <linux/err.h>
+#include <linux/if_ether.h>
+#include <linux/igmp.h>
+#include <linux/jhash.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/netfilter_bridge.h>
+#include <linux/random.h>
+#include <linux/rculist.h>
+#include <linux/skbuff.h>
+#include <linux/slab.h>
+#include <linux/timer.h>
+#include <net/ip.h>
+
+#include "br_private.h"
+
+static inline int br_ip_hash(struct net_bridge_mdb_htable *mdb, __be32 ip)
+{
+	return jhash_1word(mdb->secret, (u32)ip) & (mdb->max - 1);
+}
+
+static struct net_bridge_mdb_entry *__br_mdb_ip_get(
+	struct net_bridge_mdb_htable *mdb, __be32 dst, int hash)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p;
+
+	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
+		if (dst == mp->addr)
+			return mp;
+	}
+
+	return NULL;
+}
+
+static struct net_bridge_mdb_entry *br_mdb_ip_get(
+	struct net_bridge_mdb_htable *mdb, __be32 dst)
+{
+	return __br_mdb_ip_get(mdb, dst, br_ip_hash(mdb, dst));
+}
+
+struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
+					struct sk_buff *skb)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+
+	if (!mdb || br->multicast_disabled)
+		return NULL;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if (BR_INPUT_SKB_CB(skb)->igmp)
+			break;
+		return br_mdb_ip_get(mdb, ip_hdr(skb)->daddr);
+	}
+
+	return NULL;
+}
+
+static void br_mdb_free(struct rcu_head *head)
+{
+	struct net_bridge_mdb_htable *mdb =
+		container_of(head, struct net_bridge_mdb_htable, rcu);
+	struct net_bridge_mdb_htable *old = mdb->old;
+
+	mdb->old = NULL;
+	kfree(old->mhash);
+	kfree(old);
+}
+
+static int br_mdb_copy(struct net_bridge_mdb_htable *new,
+		       struct net_bridge_mdb_htable *old,
+		       int elasticity)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p;
+	int maxlen;
+	int len;
+	int i;
+
+	for (i = 0; i < old->max; i++)
+		hlist_for_each_entry(mp, p, &old->mhash[i], hlist[old->ver])
+			hlist_add_head(&mp->hlist[new->ver],
+				       &new->mhash[br_ip_hash(new, mp->addr)]);
+
+	if (!elasticity)
+		return 0;
+
+	maxlen = 0;
+	for (i = 0; i < new->max; i++) {
+		len = 0;
+		hlist_for_each_entry(mp, p, &new->mhash[i], hlist[new->ver])
+			len++;
+		if (len > maxlen)
+			maxlen = len;
+	}
+
+	return maxlen > elasticity ? -EINVAL : 0;
+}
+
+static void br_multicast_free_pg(struct rcu_head *head)
+{
+	struct net_bridge_port_group *p =
+		container_of(head, struct net_bridge_port_group, rcu);
+
+	kfree(p);
+}
+
+static void br_multicast_free_group(struct rcu_head *head)
+{
+	struct net_bridge_mdb_entry *mp =
+		container_of(head, struct net_bridge_mdb_entry, rcu);
+
+	kfree(mp);
+}
+
+static void br_multicast_group_expired(unsigned long data)
+{
+	struct net_bridge_mdb_entry *mp = (void *)data;
+	struct net_bridge *br = mp->br;
+	struct net_bridge_mdb_htable *mdb;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || timer_pending(&mp->timer))
+		goto out;
+
+	if (!hlist_unhashed(&mp->mglist))
+		hlist_del_init(&mp->mglist);
+
+	if (mp->ports)
+		goto out;
+
+	mdb = br->mdb;
+	hlist_del_rcu(&mp->hlist[mdb->ver]);
+	mdb->size--;
+
+	del_timer(&mp->query_timer);
+	call_rcu_bh(&mp->rcu, br_multicast_free_group);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static void br_multicast_del_pg(struct net_bridge *br,
+				struct net_bridge_port_group *pg)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group **pp;
+
+	mp = br_mdb_ip_get(mdb, pg->addr);
+	if (WARN_ON(!mp))
+		return;
+
+	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+		if (p != pg)
+			continue;
+
+		*pp = p->next;
+		hlist_del_init(&p->mglist);
+		del_timer(&p->timer);
+		del_timer(&p->query_timer);
+		call_rcu_bh(&p->rcu, br_multicast_free_pg);
+
+		if (!mp->ports && hlist_unhashed(&mp->mglist) &&
+		    netif_running(br->dev))
+			mod_timer(&mp->timer, jiffies);
+
+		return;
+	}
+
+	WARN_ON(1);
+}
+
+static void br_multicast_port_group_expired(unsigned long data)
+{
+	struct net_bridge_port_group *pg = (void *)data;
+	struct net_bridge *br = pg->port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || timer_pending(&pg->timer) ||
+	    hlist_unhashed(&pg->mglist))
+		goto out;
+
+	br_multicast_del_pg(br, pg);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static int br_mdb_rehash(struct net_bridge_mdb_htable **mdbp, int max,
+			 int elasticity)
+{
+	struct net_bridge_mdb_htable *old = *mdbp;
+	struct net_bridge_mdb_htable *mdb;
+	int err;
+
+	mdb = kmalloc(sizeof(*mdb), GFP_ATOMIC);
+	if (!mdb)
+		return -ENOMEM;
+
+	mdb->max = max;
+	mdb->old = old;
+
+	mdb->mhash = kzalloc(max * sizeof(*mdb->mhash), GFP_ATOMIC);
+	if (!mdb->mhash) {
+		kfree(mdb);
+		return -ENOMEM;
+	}
+
+	mdb->size = old ? old->size : 0;
+	mdb->ver = old ? old->ver ^ 1 : 0;
+
+	if (!old || elasticity)
+		get_random_bytes(&mdb->secret, sizeof(mdb->secret));
+	else
+		mdb->secret = old->secret;
+
+	if (!old)
+		goto out;
+
+	err = br_mdb_copy(mdb, old, elasticity);
+	if (err) {
+		kfree(mdb->mhash);
+		kfree(mdb);
+		return err;
+	}
+
+	call_rcu_bh(&mdb->rcu, br_mdb_free);
+
+out:
+	rcu_assign_pointer(*mdbp, mdb);
+
+	return 0;
+}
+
+static struct sk_buff *br_multicast_alloc_query(struct net_bridge *br,
+						__be32 group)
+{
+	struct sk_buff *skb;
+	struct igmphdr *ih;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+
+	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*iph) +
+						 sizeof(*ih) + 4);
+	if (!skb)
+		goto out;
+
+	skb->protocol = htons(ETH_P_IP);
+
+	skb_reset_mac_header(skb);
+	eth = eth_hdr(skb);
+
+	memcpy(eth->h_source, br->dev->dev_addr, 6);
+	eth->h_dest[0] = 1;
+	eth->h_dest[1] = 0;
+	eth->h_dest[2] = 0x5e;
+	eth->h_dest[3] = 0;
+	eth->h_dest[4] = 0;
+	eth->h_dest[5] = 1;
+	eth->h_proto = htons(ETH_P_IP);
+	skb_put(skb, sizeof(*eth));
+
+	skb_set_network_header(skb, skb->len);
+	iph = ip_hdr(skb);
+
+	iph->version = 4;
+	iph->ihl = 6;
+	iph->tos = 0xc0;
+	iph->tot_len = htons(sizeof(*iph) + sizeof(*ih) + 4);
+	iph->id = 0;
+	iph->frag_off = htons(IP_DF);
+	iph->ttl = 1;
+	iph->protocol = IPPROTO_IGMP;
+	iph->saddr = 0;
+	iph->daddr = htonl(INADDR_ALLHOSTS_GROUP);
+	((u8 *)&iph[1])[0] = IPOPT_RA;
+	((u8 *)&iph[1])[1] = 4;
+	((u8 *)&iph[1])[2] = 0;
+	((u8 *)&iph[1])[3] = 0;
+	ip_send_check(iph);
+	skb_put(skb, 24);
+
+	skb_set_transport_header(skb, skb->len);
+	ih = igmp_hdr(skb);
+	ih->type = IGMP_HOST_MEMBERSHIP_QUERY;
+	ih->code = (group ? br->multicast_last_member_interval :
+			    br->multicast_query_response_interval) /
+		   (HZ / IGMP_TIMER_SCALE);
+	ih->group = group;
+	ih->csum = 0;
+	ih->csum = ip_compute_csum((void *)ih, sizeof(struct igmphdr));
+	skb_put(skb, sizeof(*ih));
+
+	__skb_pull(skb, sizeof(*eth));
+
+out:
+	return skb;
+}
+
+static void br_multicast_send_group_query(struct net_bridge_mdb_entry *mp)
+{
+	struct net_bridge *br = mp->br;
+	struct sk_buff *skb;
+
+	skb = br_multicast_alloc_query(br, mp->addr);
+	if (!skb)
+		goto timer;
+
+	netif_rx(skb);
+
+timer:
+	if (++mp->queries_sent < br->multicast_last_member_count)
+		mod_timer(&mp->query_timer,
+			  jiffies + br->multicast_last_member_interval);
+}
+
+static void br_multicast_group_query_expired(unsigned long data)
+{
+	struct net_bridge_mdb_entry *mp = (void *)data;
+	struct net_bridge *br = mp->br;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || hlist_unhashed(&mp->mglist) ||
+	    mp->queries_sent >= br->multicast_last_member_count)
+		goto out;
+
+	br_multicast_send_group_query(mp);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static void br_multicast_send_port_group_query(struct net_bridge_port_group *pg)
+{
+	struct net_bridge_port *port = pg->port;
+	struct net_bridge *br = port->br;
+	struct sk_buff *skb;
+
+	skb = br_multicast_alloc_query(br, pg->addr);
+	if (!skb)
+		goto timer;
+
+	br_deliver(port, skb);
+
+timer:
+	if (++pg->queries_sent < br->multicast_last_member_count)
+		mod_timer(&pg->query_timer,
+			  jiffies + br->multicast_last_member_interval);
+}
+
+static void br_multicast_port_group_query_expired(unsigned long data)
+{
+	struct net_bridge_port_group *pg = (void *)data;
+	struct net_bridge_port *port = pg->port;
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || hlist_unhashed(&pg->mglist) ||
+	    pg->queries_sent >= br->multicast_last_member_count)
+		goto out;
+
+	br_multicast_send_port_group_query(pg);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static struct net_bridge_mdb_entry *br_multicast_get_group(
+	struct net_bridge *br, struct net_bridge_port *port, __be32 group,
+	int hash)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p;
+	unsigned count = 0;
+	unsigned max;
+	int elasticity;
+	int err;
+
+	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
+		count++;
+		if (unlikely(group == mp->addr)) {
+			return mp;
+		}
+	}
+
+	elasticity = 0;
+	max = mdb->max;
+
+	if (unlikely(count > br->hash_elasticity && count)) {
+		if (net_ratelimit())
+			printk(KERN_INFO "%s: Multicast hash table "
+			       "chain limit reached: %s\n",
+			       br->dev->name, port ? port->dev->name :
+						     br->dev->name);
+
+		elasticity = br->hash_elasticity;
+	}
+
+	if (mdb->size >= max) {
+		max *= 2;
+		if (unlikely(max >= br->hash_max)) {
+			printk(KERN_WARNING "%s: Multicast hash table maximum "
+			       "reached, disabling snooping: %s, %d\n",
+			       br->dev->name, port ? port->dev->name :
+						     br->dev->name,
+			       max);
+			err = -E2BIG;
+disable:
+			br->multicast_disabled = 1;
+			goto err;
+		}
+	}
+
+	if (max > mdb->max || elasticity) {
+		if (mdb->old) {
+			if (net_ratelimit())
+				printk(KERN_INFO "%s: Multicast hash table "
+				       "on fire: %s\n",
+				       br->dev->name, port ? port->dev->name :
+							     br->dev->name);
+			err = -EEXIST;
+			goto err;
+		}
+
+		err = br_mdb_rehash(&br->mdb, max, elasticity);
+		if (err) {
+			printk(KERN_WARNING "%s: Cannot rehash multicast "
+			       "hash table, disabling snooping: "
+			       "%s, %d, %d\n",
+			       br->dev->name, port ? port->dev->name :
+						     br->dev->name,
+			       mdb->size, err);
+			goto disable;
+		}
+
+		err = -EAGAIN;
+		goto err;
+	}
+
+	return NULL;
+
+err:
+	mp = ERR_PTR(err);
+	return mp;
+}
+
+static struct net_bridge_mdb_entry *br_multicast_new_group(
+	struct net_bridge *br, struct net_bridge_port *port, __be32 group)
+{
+	struct net_bridge_mdb_htable *mdb = br->mdb;
+	struct net_bridge_mdb_entry *mp;
+	int hash;
+
+	if (!mdb) {
+		if (br_mdb_rehash(&br->mdb, BR_HASH_SIZE, 0))
+			return NULL;
+		goto rehash;
+	}
+
+	hash = br_ip_hash(mdb, group);
+	mp = br_multicast_get_group(br, port, group, hash);
+	switch (PTR_ERR(mp)) {
+	case 0:
+		break;
+
+	case -EAGAIN:
+rehash:
+		mdb = br->mdb;
+		hash = br_ip_hash(mdb, group);
+		break;
+
+	default:
+		goto out;
+	}
+
+	mp = kzalloc(sizeof(*mp), GFP_ATOMIC);
+	if (unlikely(!mp))
+		goto out;
+
+	mp->br = br;
+	mp->addr = group;
+	setup_timer(&mp->timer, br_multicast_group_expired,
+		    (unsigned long)mp);
+	setup_timer(&mp->query_timer, br_multicast_group_query_expired,
+		    (unsigned long)mp);
+
+	hlist_add_head_rcu(&mp->hlist[mdb->ver], &mdb->mhash[hash]);
+	mdb->size++;
+
+out:
+	return mp;
+}
+
+static int br_multicast_add_group(struct net_bridge *br,
+				  struct net_bridge_port *port, __be32 group)
+{
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group **pp;
+	unsigned long now = jiffies;
+	int err;
+
+	if (ipv4_is_local_multicast(group))
+		return 0;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) ||
+	    (port && port->state == BR_STATE_DISABLED))
+		goto out;
+
+	mp = br_multicast_new_group(br, port, group);
+	err = PTR_ERR(mp);
+	if (unlikely(IS_ERR(mp) || !mp))
+		goto err;
+
+	if (!port) {
+		hlist_add_head(&mp->mglist, &br->mglist);
+		mod_timer(&mp->timer, now + br->multicast_membership_interval);
+		goto out;
+	}
+
+	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+		if (p->port == port)
+			goto found;
+		if ((unsigned long)p->port < (unsigned long)port)
+			break;
+	}
+
+	p = kzalloc(sizeof(*p), GFP_ATOMIC);
+	err = -ENOMEM;
+	if (unlikely(!p))
+		goto err;
+
+	p->addr = group;
+	p->port = port;
+	p->next = *pp;
+	hlist_add_head(&p->mglist, &port->mglist);
+	setup_timer(&p->timer, br_multicast_port_group_expired,
+		    (unsigned long)p);
+	setup_timer(&p->query_timer, br_multicast_port_group_query_expired,
+		    (unsigned long)p);
+
+	rcu_assign_pointer(*pp, p);
+
+found:
+	mod_timer(&p->timer, now + br->multicast_membership_interval);
+out:
+	err = 0;
+
+err:
+	spin_unlock(&br->multicast_lock);
+	return err;
+}
+
+static void br_multicast_router_expired(unsigned long data)
+{
+	struct net_bridge_port *port = (void *)data;
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (port->multicast_router != 1 ||
+	    timer_pending(&port->multicast_router_timer) ||
+	    hlist_unhashed(&port->rlist))
+		goto out;
+
+	hlist_del_init_rcu(&port->rlist);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static void br_multicast_local_router_expired(unsigned long data)
+{
+}
+
+static void br_multicast_send_query(struct net_bridge *br,
+				    struct net_bridge_port *port, u32 sent)
+{
+	unsigned long time;
+	struct sk_buff *skb;
+
+	if (!netif_running(br->dev) || br->multicast_disabled ||
+	    timer_pending(&br->multicast_querier_timer))
+		return;
+
+	skb = br_multicast_alloc_query(br, 0);
+	if (!skb)
+		goto timer;
+
+	if (port) {
+		__skb_push(skb, sizeof(struct ethhdr));
+		skb->dev = port->dev;
+		NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
+			dev_queue_xmit);
+	} else
+		netif_rx(skb);
+
+timer:
+	time = jiffies;
+	time += sent < br->multicast_startup_query_count ?
+		br->multicast_startup_query_interval :
+		br->multicast_query_interval;
+	mod_timer(port ? &port->multicast_query_timer :
+			 &br->multicast_query_timer, time);
+}
+
+static void br_multicast_port_query_expired(unsigned long data)
+{
+	struct net_bridge_port *port = (void *)data;
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (port && (port->state == BR_STATE_DISABLED ||
+		     port->state == BR_STATE_BLOCKING))
+		goto out;
+
+	if (port->multicast_startup_queries_sent <
+	    br->multicast_startup_query_count)
+		port->multicast_startup_queries_sent++;
+
+	br_multicast_send_query(port->br, port,
+				port->multicast_startup_queries_sent);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+void br_multicast_add_port(struct net_bridge_port *port)
+{
+	port->multicast_router = 1;
+
+	setup_timer(&port->multicast_router_timer, br_multicast_router_expired,
+		    (unsigned long)port);
+	setup_timer(&port->multicast_query_timer,
+		    br_multicast_port_query_expired, (unsigned long)port);
+}
+
+void br_multicast_del_port(struct net_bridge_port *port)
+{
+	del_timer_sync(&port->multicast_router_timer);
+}
+
+void br_multicast_enable_port(struct net_bridge_port *port)
+{
+	struct net_bridge *br = port->br;
+
+	spin_lock(&br->multicast_lock);
+	if (br->multicast_disabled || !netif_running(br->dev))
+		goto out;
+
+	port->multicast_startup_queries_sent = 0;
+
+	if (try_to_del_timer_sync(&port->multicast_query_timer) >= 0 ||
+	    del_timer(&port->multicast_query_timer))
+		mod_timer(&port->multicast_query_timer, jiffies);
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+void br_multicast_disable_port(struct net_bridge_port *port)
+{
+	struct net_bridge *br = port->br;
+	struct net_bridge_port_group *pg;
+	struct hlist_node *p, *n;
+
+	spin_lock(&br->multicast_lock);
+	hlist_for_each_entry_safe(pg, p, n, &port->mglist, mglist)
+		br_multicast_del_pg(br, pg);
+
+	if (!hlist_unhashed(&port->rlist))
+		hlist_del_init_rcu(&port->rlist);
+	del_timer(&port->multicast_router_timer);
+	del_timer(&port->multicast_query_timer);
+	spin_unlock(&br->multicast_lock);
+}
+
+static int br_multicast_igmp3_report(struct net_bridge *br,
+				     struct net_bridge_port *port,
+				     struct sk_buff *skb)
+{
+	struct igmpv3_report *ih;
+	struct igmpv3_grec *grec;
+	int i;
+	int len;
+	int num;
+	int type;
+	int err = 0;
+	__be32 group;
+
+	if (!pskb_may_pull(skb, sizeof(*ih)))
+		return -EINVAL;
+
+	ih = igmpv3_report_hdr(skb);
+	num = ntohs(ih->ngrec);
+	len = sizeof(*ih);
+
+	for (i = 0; i < num; i++) {
+		len += sizeof(*grec);
+		if (!pskb_may_pull(skb, len))
+			return -EINVAL;
+
+		grec = (void *)(skb->data + len);
+		group = grec->grec_mca;
+		type = grec->grec_type;
+
+		len += grec->grec_nsrcs * 4;
+		if (!pskb_may_pull(skb, len))
+			return -EINVAL;
+
+		/* We treat this as an IGMPv2 report for now. */
+		switch (type) {
+		case IGMPV3_MODE_IS_INCLUDE:
+		case IGMPV3_MODE_IS_EXCLUDE:
+		case IGMPV3_CHANGE_TO_INCLUDE:
+		case IGMPV3_CHANGE_TO_EXCLUDE:
+		case IGMPV3_ALLOW_NEW_SOURCES:
+		case IGMPV3_BLOCK_OLD_SOURCES:
+			break;
+
+		default:
+			continue;
+		}
+
+		err = br_multicast_add_group(br, port, group);
+		if (err)
+			break;
+	}
+
+	return err;
+}
+
+static void br_multicast_mark_router(struct net_bridge *br,
+				     struct net_bridge_port *port)
+{
+	unsigned long now = jiffies;
+	struct hlist_node *p;
+	struct hlist_node **h;
+
+	if (!port) {
+		if (br->multicast_router == 1)
+			mod_timer(&br->multicast_router_timer,
+				  now + br->multicast_querier_interval);
+		return;
+	}
+
+	if (port->multicast_router != 1)
+		return;
+
+	if (!hlist_unhashed(&port->rlist))
+		goto timer;
+
+	for (h = &br->router_list.first;
+	     (p = *h) &&
+	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
+	     (unsigned long)port;
+	     h = &p->next)
+		;
+
+	port->rlist.pprev = h;
+	port->rlist.next = p;
+	rcu_assign_pointer(*h, &port->rlist);
+	if (p)
+		p->pprev = &port->rlist.next;
+
+timer:
+	mod_timer(&port->multicast_router_timer,
+		  now + br->multicast_querier_interval);
+}
+
+static void br_multicast_query_received(struct net_bridge *br,
+					struct net_bridge_port *port,
+					__be32 saddr)
+{
+	if (saddr)
+		mod_timer(&br->multicast_querier_timer,
+			  jiffies + br->multicast_querier_interval);
+	else if (timer_pending(&br->multicast_querier_timer))
+		return;
+
+	br_multicast_mark_router(br, port);
+}
+
+static int br_multicast_query(struct net_bridge *br,
+			      struct net_bridge_port *port,
+			      struct sk_buff *skb)
+{
+	struct iphdr *iph = ip_hdr(skb);
+	struct igmphdr *ih = igmp_hdr(skb);
+	struct net_bridge_mdb_entry *mp;
+	struct igmpv3_query *ih3;
+	struct net_bridge_port_group *p;
+	struct net_bridge_port_group **pp;
+	unsigned long max_delay;
+	unsigned long now = jiffies;
+	__be32 group;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) ||
+	    (port && port->state == BR_STATE_DISABLED))
+		goto out;
+
+	br_multicast_query_received(br, port, iph->saddr);
+
+	group = ih->group;
+
+	if (skb->len == sizeof(*ih)) {
+		max_delay = ih->code * (HZ / IGMP_TIMER_SCALE);
+
+		if (!max_delay) {
+			max_delay = 10 * HZ;
+			group = 0;
+		}
+	} else {
+		if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
+			return -EINVAL;
+
+		ih3 = igmpv3_query_hdr(skb);
+		if (ih3->nsrcs)
+			return 0;
+
+		max_delay = ih3->code ? 1 :
+			    IGMPV3_MRC(ih3->code) * (HZ / IGMP_TIMER_SCALE);
+	}
+
+	if (!group)
+		goto out;
+
+	mp = br_mdb_ip_get(br->mdb, group);
+	if (!mp)
+		goto out;
+
+	max_delay *= br->multicast_last_member_count;
+
+	if (!hlist_unhashed(&mp->mglist) &&
+	    (timer_pending(&mp->timer) ?
+	     time_after(mp->timer.expires, now + max_delay) :
+	     try_to_del_timer_sync(&mp->timer) >= 0))
+		mod_timer(&mp->timer, now + max_delay);
+
+	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
+		if (timer_pending(&p->timer) ?
+		    time_after(p->timer.expires, now + max_delay) :
+		    try_to_del_timer_sync(&p->timer) >= 0)
+			mod_timer(&mp->timer, now + max_delay);
+	}
+
+out:
+	spin_unlock(&br->multicast_lock);
+	return 0;
+}
+
+static void br_multicast_leave_group(struct net_bridge *br,
+				     struct net_bridge_port *port,
+				     __be32 group)
+{
+	struct net_bridge_mdb_htable *mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct net_bridge_port_group *p;
+	unsigned long now;
+	unsigned long time;
+
+	if (ipv4_is_local_multicast(group))
+		return;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) ||
+	    (port && port->state == BR_STATE_DISABLED) ||
+	    timer_pending(&br->multicast_querier_timer))
+		goto out;
+
+	mdb = br->mdb;
+	mp = br_mdb_ip_get(mdb, group);
+	if (!mp)
+		goto out;
+
+	now = jiffies;
+	time = now + br->multicast_last_member_count *
+		     br->multicast_last_member_interval;
+
+	if (!port) {
+		if (!hlist_unhashed(&mp->mglist) &&
+		    (timer_pending(&mp->timer) ?
+		     time_after(mp->timer.expires, time) :
+		     try_to_del_timer_sync(&mp->timer) >= 0)) {
+			mod_timer(&mp->timer, time);
+
+			mp->queries_sent = 0;
+			mod_timer(&mp->query_timer, now);
+		}
+
+		goto out;
+	}
+
+	for (p = mp->ports; p; p = p->next) {
+		if (p->port != port)
+			continue;
+
+		if (!hlist_unhashed(&p->mglist) &&
+		    (timer_pending(&p->timer) ?
+		     time_after(p->timer.expires, time) :
+		     try_to_del_timer_sync(&p->timer) >= 0)) {
+			mod_timer(&p->timer, time);
+
+			p->queries_sent = 0;
+			mod_timer(&p->query_timer, now);
+		}
+
+		break;
+	}
+
+out:
+	spin_unlock(&br->multicast_lock);
+}
+
+static int br_multicast_ipv4_rcv(struct net_bridge *br,
+				 struct net_bridge_port *port,
+				 struct sk_buff *skb)
+{
+	struct sk_buff *skb2 = skb;
+	struct iphdr *iph;
+	struct igmphdr *ih;
+	unsigned len;
+	unsigned offset;
+	int err;
+
+	BR_INPUT_SKB_CB(skb)->igmp = 0;
+	BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
+
+	/* We treat OOM as packet loss for now. */
+	if (!pskb_may_pull(skb, sizeof(*iph)))
+		return -EINVAL;
+
+	iph = ip_hdr(skb);
+
+	if (iph->ihl < 5 || iph->version != 4)
+		return -EINVAL;
+
+	if (!pskb_may_pull(skb, ip_hdrlen(skb)))
+		return -EINVAL;
+
+	iph = ip_hdr(skb);
+
+	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
+		return -EINVAL;
+
+	if (iph->protocol != IPPROTO_IGMP)
+		return 0;
+
+	len = ntohs(iph->tot_len);
+	if (skb->len < len || len < ip_hdrlen(skb))
+		return -EINVAL;
+
+	if (skb->len > len) {
+		skb2 = skb_clone(skb, GFP_ATOMIC);
+		if (!skb2)
+			return -ENOMEM;
+
+		err = pskb_trim_rcsum(skb2, len);
+		if (err)
+			return err;
+	}
+
+	len -= ip_hdrlen(skb2);
+	offset = skb_network_offset(skb2) + ip_hdrlen(skb2);
+	__skb_pull(skb2, offset);
+	skb_reset_transport_header(skb2);
+
+	err = -EINVAL;
+	if (!pskb_may_pull(skb2, sizeof(*ih)))
+		goto out;
+
+	iph = ip_hdr(skb2);
+
+	switch (skb2->ip_summed) {
+	case CHECKSUM_COMPLETE:
+		if (!csum_fold(skb2->csum))
+			break;
+		/* fall through */
+	case CHECKSUM_NONE:
+		skb2->csum = 0;
+		if (skb_checksum_complete(skb2))
+			return -EINVAL;
+	}
+
+	err = 0;
+
+	BR_INPUT_SKB_CB(skb)->igmp = 1;
+	ih = igmp_hdr(skb2);
+
+	switch (ih->type) {
+	case IGMP_HOST_MEMBERSHIP_REPORT:
+	case IGMPV2_HOST_MEMBERSHIP_REPORT:
+		BR_INPUT_SKB_CB(skb2)->mrouters_only = 1;
+		err = br_multicast_add_group(br, port, ih->group);
+		break;
+	case IGMPV3_HOST_MEMBERSHIP_REPORT:
+		err = br_multicast_igmp3_report(br, port, skb2);
+		break;
+	case IGMP_HOST_MEMBERSHIP_QUERY:
+		err = br_multicast_query(br, port, skb2);
+		break;
+	case IGMP_HOST_LEAVE_MESSAGE:
+		br_multicast_leave_group(br, port, ih->group);
+		break;
+	}
+
+out:
+	__skb_push(skb2, offset);
+	if (skb2 != skb)
+		kfree_skb(skb2);
+	return err;
+}
+
+int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
+		     struct sk_buff *skb)
+{
+	if (br->multicast_disabled)
+		return 0;
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		return br_multicast_ipv4_rcv(br, port, skb);
+	}
+
+	return 0;
+}
+
+static void br_multicast_query_expired(unsigned long data)
+{
+	struct net_bridge *br = (void *)data;
+
+	spin_lock(&br->multicast_lock);
+	if (br->multicast_startup_queries_sent <
+	    br->multicast_startup_query_count)
+		br->multicast_startup_queries_sent++;
+
+	br_multicast_send_query(br, NULL, br->multicast_startup_queries_sent);
+
+	spin_unlock(&br->multicast_lock);
+}
+
+void br_multicast_init(struct net_bridge *br)
+{
+	br->hash_elasticity = 4;
+	br->hash_max = 512;
+
+	br->multicast_router = 1;
+	br->multicast_last_member_count = 2;
+	br->multicast_startup_query_count = 2;
+
+	br->multicast_last_member_interval = HZ;
+	br->multicast_query_response_interval = 10 * HZ;
+	br->multicast_startup_query_interval = 125 * HZ / 4;
+	br->multicast_query_interval = 125 * HZ;
+	br->multicast_querier_interval = 255 * HZ;
+	br->multicast_membership_interval = 260 * HZ;
+
+	spin_lock_init(&br->multicast_lock);
+	setup_timer(&br->multicast_router_timer,
+		    br_multicast_local_router_expired, 0);
+	setup_timer(&br->multicast_querier_timer,
+		    br_multicast_local_router_expired, 0);
+	setup_timer(&br->multicast_query_timer, br_multicast_query_expired,
+		    (unsigned long)br);
+}
+
+void br_multicast_open(struct net_bridge *br)
+{
+	br->multicast_startup_queries_sent = 0;
+
+	if (br->multicast_disabled)
+		return;
+
+	mod_timer(&br->multicast_query_timer, jiffies);
+}
+
+void br_multicast_stop(struct net_bridge *br)
+{
+	struct net_bridge_mdb_htable *mdb;
+	struct net_bridge_mdb_entry *mp;
+	struct hlist_node *p, *n;
+	u32 ver;
+	int i;
+
+	del_timer_sync(&br->multicast_router_timer);
+	del_timer_sync(&br->multicast_querier_timer);
+	del_timer_sync(&br->multicast_query_timer);
+
+	spin_lock_bh(&br->multicast_lock);
+	mdb = br->mdb;
+	if (!mdb)
+		goto out;
+
+	br->mdb = NULL;
+
+	ver = mdb->ver;
+	for (i = 0; i < mdb->max; i++) {
+		hlist_for_each_entry_safe(mp, p, n, &mdb->mhash[i],
+					  hlist[ver]) {
+			del_timer(&mp->timer);
+			del_timer(&mp->query_timer);
+			call_rcu_bh(&mp->rcu, br_multicast_free_group);
+		}
+	}
+
+	if (mdb->old) {
+		spin_unlock_bh(&br->multicast_lock);
+		synchronize_rcu_bh();
+		spin_lock_bh(&br->multicast_lock);
+		WARN_ON(mdb->old);
+	}
+
+	mdb->old = mdb;
+	call_rcu_bh(&mdb->rcu, br_mdb_free);
+
+out:
+	spin_unlock_bh(&br->multicast_lock);
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 7b0aed5..0871775 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -57,6 +57,41 @@ struct net_bridge_fdb_entry
 	unsigned char			is_static;
 };
 
+struct net_bridge_port_group {
+	struct net_bridge_port		*port;
+	struct net_bridge_port_group	*next;
+	struct hlist_node		mglist;
+	struct rcu_head			rcu;
+	struct timer_list		timer;
+	struct timer_list		query_timer;
+	__be32				addr;
+	u32				queries_sent;
+};
+
+struct net_bridge_mdb_entry
+{
+	struct hlist_node		hlist[2];
+	struct hlist_node		mglist;
+	struct net_bridge		*br;
+	struct net_bridge_port_group	*ports;
+	struct rcu_head			rcu;
+	struct timer_list		timer;
+	struct timer_list		query_timer;
+	__be32				addr;
+	u32				queries_sent;
+};
+
+struct net_bridge_mdb_htable
+{
+	struct hlist_head		*mhash;
+	struct rcu_head			rcu;
+	struct net_bridge_mdb_htable	*old;
+	u32				size;
+	u32				max;
+	u32				secret;
+	u32				ver;
+};
+
 struct net_bridge_port
 {
 	struct net_bridge		*br;
@@ -84,6 +119,15 @@ struct net_bridge_port
 
 	unsigned long 			flags;
 #define BR_HAIRPIN_MODE		0x00000001
+
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	u32				multicast_startup_queries_sent;
+	unsigned char			multicast_router;
+	struct timer_list		multicast_router_timer;
+	struct timer_list		multicast_query_timer;
+	struct hlist_head		mglist;
+	struct hlist_node		rlist;
+#endif
 };
 
 struct net_bridge
@@ -125,6 +169,35 @@ struct net_bridge
 	unsigned char			topology_change;
 	unsigned char			topology_change_detected;
 
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	unsigned char			multicast_router;
+
+	u8				multicast_disabled:1;
+
+	u32				hash_elasticity;
+	u32				hash_max;
+
+	u32				multicast_last_member_count;
+	u32				multicast_startup_queries_sent;
+	u32				multicast_startup_query_count;
+
+	unsigned long			multicast_last_member_interval;
+	unsigned long			multicast_membership_interval;
+	unsigned long			multicast_querier_interval;
+	unsigned long			multicast_query_interval;
+	unsigned long			multicast_query_response_interval;
+	unsigned long			multicast_startup_query_interval;
+
+	spinlock_t			multicast_lock;
+	struct net_bridge_mdb_htable	*mdb;
+	struct hlist_head		router_list;
+	struct hlist_head		mglist;
+
+	struct timer_list		multicast_router_timer;
+	struct timer_list		multicast_querier_timer;
+	struct timer_list		multicast_query_timer;
+#endif
+
 	struct timer_list		hello_timer;
 	struct timer_list		tcn_timer;
 	struct timer_list		topology_change_timer;
@@ -134,6 +207,8 @@ struct net_bridge
 
 struct br_input_skb_cb {
 	struct net_device *brdev;
+	int igmp;
+	int mrouters_only;
 };
 
 #define BR_INPUT_SKB_CB(__skb)	((struct br_input_skb_cb *)(__skb)->cb)
@@ -205,6 +280,70 @@ extern struct sk_buff *br_handle_frame(struct net_bridge_port *p,
 extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
 extern int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *arg);
 
+/* br_multicast.c */
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+extern int br_multicast_rcv(struct net_bridge *br,
+			    struct net_bridge_port *port,
+			    struct sk_buff *skb);
+extern struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
+					       struct sk_buff *skb);
+extern void br_multicast_add_port(struct net_bridge_port *port);
+extern void br_multicast_del_port(struct net_bridge_port *port);
+extern void br_multicast_enable_port(struct net_bridge_port *port);
+extern void br_multicast_disable_port(struct net_bridge_port *port);
+extern void br_multicast_init(struct net_bridge *br);
+extern void br_multicast_open(struct net_bridge *br);
+extern void br_multicast_stop(struct net_bridge *br);
+#else
+static inline int br_multicast_rcv(struct net_bridge *br,
+				   struct net_bridge_port *port,
+				   struct sk_buff *skb)
+{
+	return 0;
+}
+
+static inline struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
+						      struct sk_buff *skb)
+{
+	return NULL;
+}
+
+static inline void br_multicast_add_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_del_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_enable_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_disable_port(struct net_bridge_port *port)
+{
+}
+
+static inline void br_multicast_init(struct net_bridge *br)
+{
+}
+
+static inline void br_multicast_open(struct net_bridge *br)
+{
+}
+
+static inline void br_multicast_stop(struct net_bridge *br)
+{
+}
+#endif
+
+static inline bool br_multicast_is_router(struct net_bridge *br)
+{
+	return br->multicast_router == 2 ||
+	       (br->multicast_router == 1 &&
+		timer_pending(&br->multicast_router_timer));
+}
+
 /* br_netfilter.c */
 #ifdef CONFIG_BRIDGE_NETFILTER
 extern int br_netfilter_init(void);

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 7/13] bridge: Add multicast forwarding functions
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (5 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast forwarding functions

This patch adds code to perform selective multicast forwarding.

We forward multicast traffic to a set of ports plus all multicast
router ports.  In order to avoid duplications among these two
sets of ports, we order all ports by the numeric value of their
pointers.  The two lists are then walked in lock-step to eliminate
duplicates.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_forward.c |   67 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h |   15 ++++++++++
 2 files changed, 82 insertions(+)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 86cd071..d61e6f7 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -186,3 +186,70 @@ void br_flood_forward(struct net_bridge *br, struct sk_buff *skb,
 {
 	br_flood(br, skb, skb2, __br_forward);
 }
+
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+/* called with rcu_read_lock */
+static void br_multicast_flood(struct net_bridge_mdb_entry *mdst,
+			       struct sk_buff *skb, struct sk_buff *skb0,
+			       void (*__packet_hook)(
+					const struct net_bridge_port *p,
+					struct sk_buff *skb))
+{
+	struct net_device *dev = BR_INPUT_SKB_CB(skb)->brdev;
+	struct net_bridge *br = netdev_priv(dev);
+	struct net_bridge_port *port;
+	struct net_bridge_port *lport, *rport;
+	struct net_bridge_port *prev;
+	struct net_bridge_port_group *p;
+	struct hlist_node *rp;
+
+	prev = NULL;
+
+	rp = br->router_list.first;
+	p = mdst ? mdst->ports : NULL;
+	while (p || rp) {
+		lport = p ? p->port : NULL;
+		rport = rp ? hlist_entry(rp, struct net_bridge_port, rlist) :
+			     NULL;
+
+		port = (unsigned long)lport > (unsigned long)rport ?
+		       lport : rport;
+
+		prev = maybe_deliver(prev, port, skb, __packet_hook);
+		if (IS_ERR(prev))
+			goto out;
+
+		if ((unsigned long)lport >= (unsigned long)port)
+			p = p->next;
+		if ((unsigned long)rport >= (unsigned long)port)
+			rp = rp->next;
+	}
+
+	if (!prev)
+		goto out;
+
+	if (skb0)
+		deliver_clone(prev, skb, __packet_hook);
+	else
+		__packet_hook(prev, skb);
+	return;
+
+out:
+	if (!skb0)
+		kfree_skb(skb);
+}
+
+/* called with rcu_read_lock */
+void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
+			  struct sk_buff *skb)
+{
+	br_multicast_flood(mdst, skb, NULL, __br_deliver);
+}
+
+/* called with rcu_read_lock */
+void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
+			  struct sk_buff *skb, struct sk_buff *skb2)
+{
+	br_multicast_flood(mdst, skb, skb2, __br_forward);
+}
+#endif
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 0871775..f2dd411 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -294,6 +294,10 @@ extern void br_multicast_disable_port(struct net_bridge_port *port);
 extern void br_multicast_init(struct net_bridge *br);
 extern void br_multicast_open(struct net_bridge *br);
 extern void br_multicast_stop(struct net_bridge *br);
+extern void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
+				 struct sk_buff *skb);
+extern void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
+				 struct sk_buff *skb, struct sk_buff *skb2);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
@@ -335,6 +339,17 @@ static inline void br_multicast_open(struct net_bridge *br)
 static inline void br_multicast_stop(struct net_bridge *br)
 {
 }
+
+static inline void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
+					struct sk_buff *skb)
+{
+}
+
+static inline void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
+					struct sk_buff *skb,
+					struct sk_buff *skb2)
+{
+}
 #endif
 
 static inline bool br_multicast_is_router(struct net_bridge *br)

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 8/13] bridge: Add multicast start/stop hooks
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (6 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast start/stop hooks

This patch hooks up the bridge start/stop and add/delete/disable
port functions to the new multicast module.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c |    6 +++++-
 net/bridge/br_if.c     |    4 ++++
 net/bridge/br_stp.c    |    2 ++
 net/bridge/br_stp_if.c |    1 +
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index be35629..91dffe7 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -51,6 +51,7 @@ static int br_dev_open(struct net_device *dev)
 	br_features_recompute(br);
 	netif_start_queue(dev);
 	br_stp_enable_bridge(br);
+	br_multicast_open(br);
 
 	return 0;
 }
@@ -61,7 +62,10 @@ static void br_dev_set_multicast_list(struct net_device *dev)
 
 static int br_dev_stop(struct net_device *dev)
 {
-	br_stp_disable_bridge(netdev_priv(dev));
+	struct net_bridge *br = netdev_priv(dev);
+
+	br_stp_disable_bridge(br);
+	br_multicast_stop(br);
 
 	netif_stop_queue(dev);
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index a2cbe61..cc3cdfd 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -147,6 +147,8 @@ static void del_nbp(struct net_bridge_port *p)
 
 	rcu_assign_pointer(dev->br_port, NULL);
 
+	br_multicast_del_port(p);
+
 	kobject_uevent(&p->kobj, KOBJ_REMOVE);
 	kobject_del(&p->kobj);
 
@@ -209,6 +211,7 @@ static struct net_device *new_bridge_dev(struct net *net, const char *name)
 	INIT_LIST_HEAD(&br->age_list);
 
 	br_stp_timer_init(br);
+	br_multicast_init(br);
 
 	return dev;
 }
@@ -260,6 +263,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
 	br_init_port(p);
 	p->state = BR_STATE_DISABLED;
 	br_stp_port_timer_init(p);
+	br_multicast_add_port(p);
 
 	return p;
 }
diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
index fd3f8d6..edcf14b 100644
--- a/net/bridge/br_stp.c
+++ b/net/bridge/br_stp.c
@@ -386,6 +386,8 @@ static void br_make_forwarding(struct net_bridge_port *p)
 	else
 		p->state = BR_STATE_LEARNING;
 
+	br_multicast_enable_port(p);
+
 	br_log_state(p);
 
 	if (br->forward_delay != 0)
diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c
index 9a52ac5..d527119 100644
--- a/net/bridge/br_stp_if.c
+++ b/net/bridge/br_stp_if.c
@@ -108,6 +108,7 @@ void br_stp_disable_port(struct net_bridge_port *p)
 	del_timer(&p->hold_timer);
 
 	br_fdb_delete_by_port(br, p, 0);
+	br_multicast_disable_port(p);
 
 	br_configuration_update(br);
 

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 9/13] bridge: Add multicast data-path hooks
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (7 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-04-27 17:13     ` [PATCH net-next] bridge: use is_multicast_ether_addr Stephen Hemminger
  2010-02-28  5:41   ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
                     ` (4 subsequent siblings)
  13 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast data-path hooks

This patch finally hooks up the multicast snooping module to the
data path.  In particular, all multicast packets passing through
the bridge are fed into the module and switched by it.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_device.c |   15 ++++++++++++---
 net/bridge/br_input.c  |   18 +++++++++++++++++-
 2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/net/bridge/br_device.c b/net/bridge/br_device.c
index 91dffe7..eb7062d 100644
--- a/net/bridge/br_device.c
+++ b/net/bridge/br_device.c
@@ -25,6 +25,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct net_bridge *br = netdev_priv(dev);
 	const unsigned char *dest = skb->data;
 	struct net_bridge_fdb_entry *dst;
+	struct net_bridge_mdb_entry *mdst;
 
 	BR_INPUT_SKB_CB(skb)->brdev = dev;
 
@@ -34,13 +35,21 @@ netdev_tx_t br_dev_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_mac_header(skb);
 	skb_pull(skb, ETH_HLEN);
 
-	if (dest[0] & 1)
-		br_flood_deliver(br, skb);
-	else if ((dst = __br_fdb_get(br, dest)) != NULL)
+	if (dest[0] & 1) {
+		if (br_multicast_rcv(br, NULL, skb))
+			goto out;
+
+		mdst = br_mdb_get(br, skb);
+		if (mdst || BR_INPUT_SKB_CB(skb)->mrouters_only)
+			br_multicast_deliver(mdst, skb);
+		else
+			br_flood_deliver(br, skb);
+	} else if ((dst = __br_fdb_get(br, dest)) != NULL)
 		br_deliver(dst->dst, skb);
 	else
 		br_flood_deliver(br, skb);
 
+out:
 	return NETDEV_TX_OK;
 }
 
diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index edfdaef..53b3985 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -41,6 +41,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	struct net_bridge_port *p = rcu_dereference(skb->dev->br_port);
 	struct net_bridge *br;
 	struct net_bridge_fdb_entry *dst;
+	struct net_bridge_mdb_entry *mdst;
 	struct sk_buff *skb2;
 
 	if (!p || p->state == BR_STATE_DISABLED)
@@ -50,6 +51,10 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	br = p->br;
 	br_fdb_update(br, p, eth_hdr(skb)->h_source);
 
+	if (is_multicast_ether_addr(dest) &&
+	    br_multicast_rcv(br, p, skb))
+		goto drop;
+
 	if (p->state == BR_STATE_LEARNING)
 		goto drop;
 
@@ -64,8 +69,19 @@ int br_handle_frame_finish(struct sk_buff *skb)
 	dst = NULL;
 
 	if (is_multicast_ether_addr(dest)) {
+		mdst = br_mdb_get(br, skb);
+		if (mdst || BR_INPUT_SKB_CB(skb)->mrouters_only) {
+			if ((mdst && !hlist_unhashed(&mdst->mglist)) ||
+			    br_multicast_is_router(br))
+				skb2 = skb;
+			br_multicast_forward(mdst, skb, skb2);
+			skb = NULL;
+			if (!skb2)
+				goto out;
+		} else
+			skb2 = skb;
+
 		br->dev->stats.multicast++;
-		skb2 = skb;
 	} else if ((dst = __br_fdb_get(br, dest)) && dst->is_local) {
 		skb2 = skb;
 		/* Do not forward the packet since it's local. */

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 10/13] bridge: Add multicast_router sysfs entries
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (8 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-04-27 17:13     ` [PATCH net-next] bridge: multicast router list manipulation Stephen Hemminger
  2010-02-28  5:41   ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
                     ` (3 subsequent siblings)
  13 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast_router sysfs entries

This patch allows the user to forcibly enable/disable ports as
having multicast routers attached.  A port with a multicast router
will receive all multicast traffic.

The value 0 disables it completely.  The default is 1 which lets
the system automatically detect the presence of routers (currently
this is limited to picking up queries), and 2 means that the port
will always receive all multicast traffic.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_multicast.c |  105 +++++++++++++++++++++++++++++++++++++++-------
 net/bridge/br_private.h   |    3 +
 net/bridge/br_sysfs_br.c  |   21 +++++++++
 net/bridge/br_sysfs_if.c  |   18 +++++++
 4 files changed, 133 insertions(+), 14 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 746b5a6..674224b 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -746,12 +746,30 @@ static int br_multicast_igmp3_report(struct net_bridge *br,
 	return err;
 }
 
+static void br_multicast_add_router(struct net_bridge *br,
+				    struct net_bridge_port *port)
+{
+	struct hlist_node *p;
+	struct hlist_node **h;
+
+	for (h = &br->router_list.first;
+	     (p = *h) &&
+	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
+	     (unsigned long)port;
+	     h = &p->next)
+		;
+
+	port->rlist.pprev = h;
+	port->rlist.next = p;
+	rcu_assign_pointer(*h, &port->rlist);
+	if (p)
+		p->pprev = &port->rlist.next;
+}
+
 static void br_multicast_mark_router(struct net_bridge *br,
 				     struct net_bridge_port *port)
 {
 	unsigned long now = jiffies;
-	struct hlist_node *p;
-	struct hlist_node **h;
 
 	if (!port) {
 		if (br->multicast_router == 1)
@@ -766,18 +784,7 @@ static void br_multicast_mark_router(struct net_bridge *br,
 	if (!hlist_unhashed(&port->rlist))
 		goto timer;
 
-	for (h = &br->router_list.first;
-	     (p = *h) &&
-	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
-	     (unsigned long)port;
-	     h = &p->next)
-		;
-
-	port->rlist.pprev = h;
-	port->rlist.next = p;
-	rcu_assign_pointer(*h, &port->rlist);
-	if (p)
-		p->pprev = &port->rlist.next;
+	br_multicast_add_router(br, port);
 
 timer:
 	mod_timer(&port->multicast_router_timer,
@@ -1133,3 +1140,73 @@ void br_multicast_stop(struct net_bridge *br)
 out:
 	spin_unlock_bh(&br->multicast_lock);
 }
+
+int br_multicast_set_router(struct net_bridge *br, unsigned long val)
+{
+	int err = -ENOENT;
+
+	spin_lock_bh(&br->multicast_lock);
+	if (!netif_running(br->dev))
+		goto unlock;
+
+	switch (val) {
+	case 0:
+	case 2:
+		del_timer(&br->multicast_router_timer);
+		/* fall through */
+	case 1:
+		br->multicast_router = val;
+		err = 0;
+		break;
+
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+unlock:
+	spin_unlock_bh(&br->multicast_lock);
+
+	return err;
+}
+
+int br_multicast_set_port_router(struct net_bridge_port *p, unsigned long val)
+{
+	struct net_bridge *br = p->br;
+	int err = -ENOENT;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev) || p->state == BR_STATE_DISABLED)
+		goto unlock;
+
+	switch (val) {
+	case 0:
+	case 1:
+	case 2:
+		p->multicast_router = val;
+		err = 0;
+
+		if (val < 2 && !hlist_unhashed(&p->rlist))
+			hlist_del_init_rcu(&p->rlist);
+
+		if (val == 1)
+			break;
+
+		del_timer(&p->multicast_router_timer);
+
+		if (val == 0)
+			break;
+
+		br_multicast_add_router(br, p);
+		break;
+
+	default:
+		err = -EINVAL;
+		break;
+	}
+
+unlock:
+	spin_unlock(&br->multicast_lock);
+
+	return err;
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index f2dd411..2d3df82 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -298,6 +298,9 @@ extern void br_multicast_deliver(struct net_bridge_mdb_entry *mdst,
 				 struct sk_buff *skb);
 extern void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
 				 struct sk_buff *skb, struct sk_buff *skb2);
+extern int br_multicast_set_router(struct net_bridge *br, unsigned long val);
+extern int br_multicast_set_port_router(struct net_bridge_port *p,
+					unsigned long val);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index bee4f30..cb74201 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -345,6 +345,24 @@ static ssize_t store_flush(struct device *d,
 }
 static DEVICE_ATTR(flush, S_IWUSR, NULL, store_flush);
 
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+static ssize_t show_multicast_router(struct device *d,
+				     struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%d\n", br->multicast_router);
+}
+
+static ssize_t store_multicast_router(struct device *d,
+				      struct device_attribute *attr,
+				      const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_set_router);
+}
+static DEVICE_ATTR(multicast_router, S_IRUGO | S_IWUSR, show_multicast_router,
+		   store_multicast_router);
+#endif
+
 static struct attribute *bridge_attrs[] = {
 	&dev_attr_forward_delay.attr,
 	&dev_attr_hello_time.attr,
@@ -364,6 +382,9 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_gc_timer.attr,
 	&dev_attr_group_addr.attr,
 	&dev_attr_flush.attr,
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	&dev_attr_multicast_router.attr,
+#endif
 	NULL
 };
 
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 820643a..696596c 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -159,6 +159,21 @@ static ssize_t store_hairpin_mode(struct net_bridge_port *p, unsigned long v)
 static BRPORT_ATTR(hairpin_mode, S_IRUGO | S_IWUSR,
 		   show_hairpin_mode, store_hairpin_mode);
 
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+static ssize_t show_multicast_router(struct net_bridge_port *p, char *buf)
+{
+	return sprintf(buf, "%d\n", p->multicast_router);
+}
+
+static ssize_t store_multicast_router(struct net_bridge_port *p,
+				      unsigned long v)
+{
+	return br_multicast_set_port_router(p, v);
+}
+static BRPORT_ATTR(multicast_router, S_IRUGO | S_IWUSR, show_multicast_router,
+		   store_multicast_router);
+#endif
+
 static struct brport_attribute *brport_attrs[] = {
 	&brport_attr_path_cost,
 	&brport_attr_priority,
@@ -176,6 +191,9 @@ static struct brport_attribute *brport_attrs[] = {
 	&brport_attr_hold_timer,
 	&brport_attr_flush,
 	&brport_attr_hairpin_mode,
+#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
+	&brport_attr_multicast_router,
+#endif
 	NULL
 };
 

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (9 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast_snooping sysfs toggle

This patch allows the user to disable IGMP snooping completely
through a sysfs toggle.  It also allows the user to reenable
snooping when it has been automatically disabled due to hash
collisions.  If the collisions have not been resolved however
the system will refuse to reenable snooping.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_multicast.c |   61 ++++++++++++++++++++++++++++++++++++++++++----
 net/bridge/br_private.h   |    1 
 net/bridge/br_sysfs_br.c  |   18 +++++++++++++
 3 files changed, 75 insertions(+), 5 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 674224b..c7a1095 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -656,6 +656,15 @@ void br_multicast_del_port(struct net_bridge_port *port)
 	del_timer_sync(&port->multicast_router_timer);
 }
 
+static void __br_multicast_enable_port(struct net_bridge_port *port)
+{
+	port->multicast_startup_queries_sent = 0;
+
+	if (try_to_del_timer_sync(&port->multicast_query_timer) >= 0 ||
+	    del_timer(&port->multicast_query_timer))
+		mod_timer(&port->multicast_query_timer, jiffies);
+}
+
 void br_multicast_enable_port(struct net_bridge_port *port)
 {
 	struct net_bridge *br = port->br;
@@ -664,11 +673,7 @@ void br_multicast_enable_port(struct net_bridge_port *port)
 	if (br->multicast_disabled || !netif_running(br->dev))
 		goto out;
 
-	port->multicast_startup_queries_sent = 0;
-
-	if (try_to_del_timer_sync(&port->multicast_query_timer) >= 0 ||
-	    del_timer(&port->multicast_query_timer))
-		mod_timer(&port->multicast_query_timer, jiffies);
+	__br_multicast_enable_port(port);
 
 out:
 	spin_unlock(&br->multicast_lock);
@@ -1210,3 +1215,49 @@ unlock:
 
 	return err;
 }
+
+int br_multicast_toggle(struct net_bridge *br, unsigned long val)
+{
+	struct net_bridge_port *port;
+	int err = -ENOENT;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev))
+		goto unlock;
+
+	err = 0;
+	if (br->multicast_disabled == !val)
+		goto unlock;
+
+	br->multicast_disabled = !val;
+	if (br->multicast_disabled)
+		goto unlock;
+
+	if (br->mdb) {
+		if (br->mdb->old) {
+			err = -EEXIST;
+rollback:
+			br->multicast_disabled = !!val;
+			goto unlock;
+		}
+
+		err = br_mdb_rehash(&br->mdb, br->mdb->max,
+				    br->hash_elasticity);
+		if (err)
+			goto rollback;
+	}
+
+	br_multicast_open(br);
+	list_for_each_entry(port, &br->port_list, list) {
+		if (port->state == BR_STATE_DISABLED ||
+		    port->state == BR_STATE_BLOCKING)
+			continue;
+
+		__br_multicast_enable_port(port);
+	}
+
+unlock:
+	spin_unlock(&br->multicast_lock);
+
+	return err;
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2d3df82..4467904 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -301,6 +301,7 @@ extern void br_multicast_forward(struct net_bridge_mdb_entry *mdst,
 extern int br_multicast_set_router(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_port_router(struct net_bridge_port *p,
 					unsigned long val);
+extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index cb74201..0ab2883 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -361,6 +361,23 @@ static ssize_t store_multicast_router(struct device *d,
 }
 static DEVICE_ATTR(multicast_router, S_IRUGO | S_IWUSR, show_multicast_router,
 		   store_multicast_router);
+
+static ssize_t show_multicast_snooping(struct device *d,
+				       struct device_attribute *attr,
+				       char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%d\n", !br->multicast_disabled);
+}
+
+static ssize_t store_multicast_snooping(struct device *d,
+					struct device_attribute *attr,
+					const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_toggle);
+}
+static DEVICE_ATTR(multicast_snooping, S_IRUGO | S_IWUSR,
+		   show_multicast_snooping, store_multicast_snooping);
 #endif
 
 static struct attribute *bridge_attrs[] = {
@@ -384,6 +401,7 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_flush.attr,
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	&dev_attr_multicast_router.attr,
+	&dev_attr_multicast_snooping.attr,
 #endif
 	NULL
 };

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (10 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  5:41   ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
  2010-02-28  8:52   ` [1/13] bridge: Add IGMP snooping support David Miller
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add hash elasticity/max sysfs entries

This patch allows the user to control the hash elasticity/max
parameters.  The elasticity setting does not take effect until
the next new multicast group is added.  At which point it is
checked and if after rehashing it still can't be satisfied then
snooping will be disabled.

The max setting on the other hand takes effect immediately.  It
must be a power of two and cannot be set to a value less than the
current number of multicast group entries.  This is the only way
to shrink the multicast hash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_multicast.c |   41 +++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h   |    1 +
 net/bridge/br_sysfs_br.c  |   39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index c7a1095..2559fb5 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -15,6 +15,7 @@
 #include <linux/igmp.h>
 #include <linux/jhash.h>
 #include <linux/kernel.h>
+#include <linux/log2.h>
 #include <linux/netdevice.h>
 #include <linux/netfilter_bridge.h>
 #include <linux/random.h>
@@ -1261,3 +1262,43 @@ unlock:
 
 	return err;
 }
+
+int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val)
+{
+	int err = -ENOENT;
+	u32 old;
+
+	spin_lock(&br->multicast_lock);
+	if (!netif_running(br->dev))
+		goto unlock;
+
+	err = -EINVAL;
+	if (!is_power_of_2(val))
+		goto unlock;
+	if (br->mdb && val < br->mdb->size)
+		goto unlock;
+
+	err = 0;
+
+	old = br->hash_max;
+	br->hash_max = val;
+
+	if (br->mdb) {
+		if (br->mdb->old) {
+			err = -EEXIST;
+rollback:
+			br->hash_max = old;
+			goto unlock;
+		}
+
+		err = br_mdb_rehash(&br->mdb, br->hash_max,
+				    br->hash_elasticity);
+		if (err)
+			goto rollback;
+	}
+
+unlock:
+	spin_unlock(&br->multicast_lock);
+
+	return err;
+}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 4467904..0f12a8f 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -302,6 +302,7 @@ extern int br_multicast_set_router(struct net_bridge *br, unsigned long val);
 extern int br_multicast_set_port_router(struct net_bridge_port *p,
 					unsigned long val);
 extern int br_multicast_toggle(struct net_bridge *br, unsigned long val);
+extern int br_multicast_set_hash_max(struct net_bridge *br, unsigned long val);
 #else
 static inline int br_multicast_rcv(struct net_bridge *br,
 				   struct net_bridge_port *port,
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0ab2883..d2ee53b 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -378,6 +378,43 @@ static ssize_t store_multicast_snooping(struct device *d,
 }
 static DEVICE_ATTR(multicast_snooping, S_IRUGO | S_IWUSR,
 		   show_multicast_snooping, store_multicast_snooping);
+
+static ssize_t show_hash_elasticity(struct device *d,
+				    struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->hash_elasticity);
+}
+
+static int set_elasticity(struct net_bridge *br, unsigned long val)
+{
+	br->hash_elasticity = val;
+	return 0;
+}
+
+static ssize_t store_hash_elasticity(struct device *d,
+				     struct device_attribute *attr,
+				     const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_elasticity);
+}
+static DEVICE_ATTR(hash_elasticity, S_IRUGO | S_IWUSR, show_hash_elasticity,
+		   store_hash_elasticity);
+
+static ssize_t show_hash_max(struct device *d, struct device_attribute *attr,
+			     char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->hash_max);
+}
+
+static ssize_t store_hash_max(struct device *d, struct device_attribute *attr,
+			      const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, br_multicast_set_hash_max);
+}
+static DEVICE_ATTR(hash_max, S_IRUGO | S_IWUSR, show_hash_max,
+		   store_hash_max);
 #endif
 
 static struct attribute *bridge_attrs[] = {
@@ -402,6 +439,8 @@ static struct attribute *bridge_attrs[] = {
 #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
 	&dev_attr_multicast_router.attr,
 	&dev_attr_multicast_snooping.attr,
+	&dev_attr_hash_elasticity.attr,
+	&dev_attr_hash_max.attr,
 #endif
 	NULL
 };

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* [PATCH 13/13] bridge: Add multicast count/interval sysfs entries
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (11 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
@ 2010-02-28  5:41   ` Herbert Xu
  2010-02-28  8:52   ` [1/13] bridge: Add IGMP snooping support David Miller
  13 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-02-28  5:41 UTC (permalink / raw)
  To: David S. Miller, netdev, Stephen Hemminger

bridge: Add multicast count/interval sysfs entries

This patch allows the user to the IGMP parameters related to the
snooping function of the bridge.  This includes various time
values and retransmission limits.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/bridge/br_sysfs_br.c |  203 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 203 insertions(+)

diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index d2ee53b..dd321e3 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -415,6 +415,201 @@ static ssize_t store_hash_max(struct device *d, struct device_attribute *attr,
 }
 static DEVICE_ATTR(hash_max, S_IRUGO | S_IWUSR, show_hash_max,
 		   store_hash_max);
+
+static ssize_t show_multicast_last_member_count(struct device *d,
+						struct device_attribute *attr,
+						char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->multicast_last_member_count);
+}
+
+static int set_last_member_count(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_last_member_count = val;
+	return 0;
+}
+
+static ssize_t store_multicast_last_member_count(struct device *d,
+						 struct device_attribute *attr,
+						 const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_last_member_count);
+}
+static DEVICE_ATTR(multicast_last_member_count, S_IRUGO | S_IWUSR,
+		   show_multicast_last_member_count,
+		   store_multicast_last_member_count);
+
+static ssize_t show_multicast_startup_query_count(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%u\n", br->multicast_startup_query_count);
+}
+
+static int set_startup_query_count(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_startup_query_count = val;
+	return 0;
+}
+
+static ssize_t store_multicast_startup_query_count(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_startup_query_count);
+}
+static DEVICE_ATTR(multicast_startup_query_count, S_IRUGO | S_IWUSR,
+		   show_multicast_startup_query_count,
+		   store_multicast_startup_query_count);
+
+static ssize_t show_multicast_last_member_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_last_member_interval));
+}
+
+static int set_last_member_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_last_member_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_last_member_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_last_member_interval);
+}
+static DEVICE_ATTR(multicast_last_member_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_last_member_interval,
+		   store_multicast_last_member_interval);
+
+static ssize_t show_multicast_membership_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_membership_interval));
+}
+
+static int set_membership_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_membership_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_membership_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_membership_interval);
+}
+static DEVICE_ATTR(multicast_membership_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_membership_interval,
+		   store_multicast_membership_interval);
+
+static ssize_t show_multicast_querier_interval(struct device *d,
+					       struct device_attribute *attr,
+					       char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_querier_interval));
+}
+
+static int set_querier_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_querier_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_querier_interval(struct device *d,
+						struct device_attribute *attr,
+						const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_querier_interval);
+}
+static DEVICE_ATTR(multicast_querier_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_querier_interval,
+		   store_multicast_querier_interval);
+
+static ssize_t show_multicast_query_interval(struct device *d,
+					     struct device_attribute *attr,
+					     char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(buf, "%lu\n",
+		       jiffies_to_clock_t(br->multicast_query_interval));
+}
+
+static int set_query_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_query_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_query_interval(struct device *d,
+					      struct device_attribute *attr,
+					      const char *buf, size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_query_interval);
+}
+static DEVICE_ATTR(multicast_query_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_query_interval,
+		   store_multicast_query_interval);
+
+static ssize_t show_multicast_query_response_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(
+		buf, "%lu\n",
+		jiffies_to_clock_t(br->multicast_query_response_interval));
+}
+
+static int set_query_response_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_query_response_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_query_response_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_query_response_interval);
+}
+static DEVICE_ATTR(multicast_query_response_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_query_response_interval,
+		   store_multicast_query_response_interval);
+
+static ssize_t show_multicast_startup_query_interval(
+	struct device *d, struct device_attribute *attr, char *buf)
+{
+	struct net_bridge *br = to_bridge(d);
+	return sprintf(
+		buf, "%lu\n",
+		jiffies_to_clock_t(br->multicast_startup_query_interval));
+}
+
+static int set_startup_query_interval(struct net_bridge *br, unsigned long val)
+{
+	br->multicast_startup_query_interval = clock_t_to_jiffies(val);
+	return 0;
+}
+
+static ssize_t store_multicast_startup_query_interval(
+	struct device *d, struct device_attribute *attr, const char *buf,
+	size_t len)
+{
+	return store_bridge_parm(d, buf, len, set_startup_query_interval);
+}
+static DEVICE_ATTR(multicast_startup_query_interval, S_IRUGO | S_IWUSR,
+		   show_multicast_startup_query_interval,
+		   store_multicast_startup_query_interval);
 #endif
 
 static struct attribute *bridge_attrs[] = {
@@ -441,6 +636,14 @@ static struct attribute *bridge_attrs[] = {
 	&dev_attr_multicast_snooping.attr,
 	&dev_attr_hash_elasticity.attr,
 	&dev_attr_hash_max.attr,
+	&dev_attr_multicast_last_member_count.attr,
+	&dev_attr_multicast_startup_query_count.attr,
+	&dev_attr_multicast_last_member_interval.attr,
+	&dev_attr_multicast_membership_interval.attr,
+	&dev_attr_multicast_querier_interval.attr,
+	&dev_attr_multicast_query_interval.attr,
+	&dev_attr_multicast_query_response_interval.attr,
+	&dev_attr_multicast_startup_query_interval.attr,
 #endif
 	NULL
 };

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [1/13] bridge: Add IGMP snooping support
  2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
                     ` (12 preceding siblings ...)
  2010-02-28  5:41   ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
@ 2010-02-28  8:52   ` David Miller
  2010-03-01  2:08     ` Herbert Xu
  13 siblings, 1 reply; 81+ messages in thread
From: David Miller @ 2010-02-28  8:52 UTC (permalink / raw)
  To: herbert; +Cc: netdev, shemminger

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 28 Feb 2010 13:40:12 +0800

> This is a repost of exactly the same series in order to get them
> back into patchworks.  I hope I have resolved your concerns about
> patch number 2.  Let me know if you still have any further questions.

Looks good applied.  I had to add the following patch to fix some
Kconfig issues.

Thanks.

bridge: Make IGMP snooping depend upon BRIDGE.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/bridge/Kconfig |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
index 78dd549..19a6b96 100644
--- a/net/bridge/Kconfig
+++ b/net/bridge/Kconfig
@@ -34,6 +34,7 @@ config BRIDGE
 
 config BRIDGE_IGMP_SNOOPING
 	bool "IGMP snooping"
+	depends on BRIDGE
 	default y
 	---help---
 	  If you say Y here, then the Ethernet bridge will be able selectively
-- 
1.6.6.1


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [1/13] bridge: Add IGMP snooping support
  2010-02-28  8:52   ` [1/13] bridge: Add IGMP snooping support David Miller
@ 2010-03-01  2:08     ` Herbert Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-03-01  2:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, shemminger

On Sun, Feb 28, 2010 at 12:52:58AM -0800, David Miller wrote:
>
> bridge: Make IGMP snooping depend upon BRIDGE.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Thanks a lot!

I'll get onto the bridge-utils stuff now.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-02-28  5:41   ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
@ 2010-03-05 23:43     ` Paul E. McKenney
  2010-03-06  1:17       ` Herbert Xu
  0 siblings, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-05 23:43 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sun, Feb 28, 2010 at 01:41:45PM +0800, Herbert Xu wrote:
> bridge: Add core IGMP snooping support
> 
> This patch adds the core functionality of IGMP snooping support
> without actually hooking it up.  So this patch should be a no-op
> as far as the bridge's external behaviour is concerned.
> 
> All the new code and data is controlled by the Kconfig option
> BRIDGE_IGMP_SNOOPING.  A run-time toggle is also available.
> 
> The multicast switching is done using an hash table that is
> lockless on the read-side through RCU.  On the write-side the
> new multicast_lock is used for all operations.  The hash table
> supports dynamic growth/rehashing.

Cool!!!  You use a pair of list_head structures, so that a given
element can be in both the old and the new hash table simultaneously.
Of course, an RCU grace period must elapse between consecutive resizings.
Which appears to be addressed.

The teardown needs an rcu_barrier_bh() rather than the current
synchronize_rcu_bh(), please see below.

Also, I don't see how the teardown code is preventing new readers from
finding the data structures before they are being passed to call_rcu_bh().
You can't safely start the RCU grace period until -after- all new readers
have been excluded.  (But I could easily be missing something here.)

The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
scheme correctly.

It looks like all updates are protected by an appropriate lock, although
I don't claim to fully understand how the data structures are linked
together.  The key requirement is of course that you normally don't
get to protect insertion and deletion of a given data structure using a
lock located within that same data structure.  It looks to me that this
requirement is satisfied -- all the lists that you manipulate seem to
hang off of the net_bridge, which contains the lock protecting all of
those lists.

Hmmm...  Where is the read-side code?  Wherever it is, it cannot safely
dereference the ->old pointer.

> The hash table will be rehashed if any chain length exceeds a
> preset limit.  If rehashing does not reduce the maximum chain
> length then snooping will be disabled.
> 
> These features may be added in future (in no particular order):
> 
> * IGMPv3 source support
> * Non-querier router detection
> * IPv6

But the bugs all look fixable to me (assuming that they are in fact
bugs).  Very cool to see an RCU-protected resizeable hash table!!!  ;-)

						Thanx, Paul

> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
> 
>  net/bridge/Kconfig        |   12 
>  net/bridge/Makefile       |    2 
>  net/bridge/br_multicast.c | 1135 ++++++++++++++++++++++++++++++++++++++++++++++
>  net/bridge/br_private.h   |  139 +++++
>  4 files changed, 1288 insertions(+)
> 
> diff --git a/net/bridge/Kconfig b/net/bridge/Kconfig
> index e143ca6..78dd549 100644
> --- a/net/bridge/Kconfig
> +++ b/net/bridge/Kconfig
> @@ -31,3 +31,15 @@ config BRIDGE
>  	  will be called bridge.
> 
>  	  If unsure, say N.
> +
> +config BRIDGE_IGMP_SNOOPING
> +	bool "IGMP snooping"
> +	default y
> +	---help---
> +	  If you say Y here, then the Ethernet bridge will be able selectively
> +	  forward multicast traffic based on IGMP traffic received from each
> +	  port.
> +
> +	  Say N to exclude this support and reduce the binary size.
> +
> +	  If unsure, say Y.
> diff --git a/net/bridge/Makefile b/net/bridge/Makefile
> index f444c12..d0359ea 100644
> --- a/net/bridge/Makefile
> +++ b/net/bridge/Makefile
> @@ -12,4 +12,6 @@ bridge-$(CONFIG_SYSFS) += br_sysfs_if.o br_sysfs_br.o
> 
>  bridge-$(CONFIG_BRIDGE_NETFILTER) += br_netfilter.o
> 
> +bridge-$(CONFIG_BRIDGE_IGMP_SNOOPING) += br_multicast.o
> +
>  obj-$(CONFIG_BRIDGE_NF_EBTABLES) += netfilter/
> diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> new file mode 100644
> index 0000000..746b5a6
> --- /dev/null
> +++ b/net/bridge/br_multicast.c
> @@ -0,0 +1,1135 @@
> +/*
> + * Bridge multicast support.
> + *
> + * Copyright (c) 2010 Herbert Xu <herbert@gondor.apana.org.au>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License as published by the Free
> + * Software Foundation; either version 2 of the License, or (at your option)
> + * any later version.
> + *
> + */
> +
> +#include <linux/err.h>
> +#include <linux/if_ether.h>
> +#include <linux/igmp.h>
> +#include <linux/jhash.h>
> +#include <linux/kernel.h>
> +#include <linux/netdevice.h>
> +#include <linux/netfilter_bridge.h>
> +#include <linux/random.h>
> +#include <linux/rculist.h>
> +#include <linux/skbuff.h>
> +#include <linux/slab.h>
> +#include <linux/timer.h>
> +#include <net/ip.h>
> +
> +#include "br_private.h"
> +
> +static inline int br_ip_hash(struct net_bridge_mdb_htable *mdb, __be32 ip)
> +{
> +	return jhash_1word(mdb->secret, (u32)ip) & (mdb->max - 1);
> +}
> +
> +static struct net_bridge_mdb_entry *__br_mdb_ip_get(
> +	struct net_bridge_mdb_htable *mdb, __be32 dst, int hash)
> +{
> +	struct net_bridge_mdb_entry *mp;
> +	struct hlist_node *p;
> +
> +	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
> +		if (dst == mp->addr)
> +			return mp;
> +	}
> +
> +	return NULL;
> +}
> +
> +static struct net_bridge_mdb_entry *br_mdb_ip_get(
> +	struct net_bridge_mdb_htable *mdb, __be32 dst)
> +{
> +	return __br_mdb_ip_get(mdb, dst, br_ip_hash(mdb, dst));
> +}
> +
> +struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
> +					struct sk_buff *skb)
> +{
> +	struct net_bridge_mdb_htable *mdb = br->mdb;
> +
> +	if (!mdb || br->multicast_disabled)
> +		return NULL;
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		if (BR_INPUT_SKB_CB(skb)->igmp)
> +			break;
> +		return br_mdb_ip_get(mdb, ip_hdr(skb)->daddr);
> +	}
> +
> +	return NULL;
> +}
> +
> +static void br_mdb_free(struct rcu_head *head)
> +{
> +	struct net_bridge_mdb_htable *mdb =
> +		container_of(head, struct net_bridge_mdb_htable, rcu);
> +	struct net_bridge_mdb_htable *old = mdb->old;
> +
> +	mdb->old = NULL;

So one way to figure out when it is safe to do another resize is when
the ->old pointer is NULLed out here.

> +	kfree(old->mhash);
> +	kfree(old);
> +}
> +
> +static int br_mdb_copy(struct net_bridge_mdb_htable *new,
> +		       struct net_bridge_mdb_htable *old,
> +		       int elasticity)
> +{
> +	struct net_bridge_mdb_entry *mp;
> +	struct hlist_node *p;
> +	int maxlen;
> +	int len;
> +	int i;
> +
> +	for (i = 0; i < old->max; i++)
> +		hlist_for_each_entry(mp, p, &old->mhash[i], hlist[old->ver])
> +			hlist_add_head(&mp->hlist[new->ver],
> +				       &new->mhash[br_ip_hash(new, mp->addr)]);
> +
> +	if (!elasticity)
> +		return 0;
> +
> +	maxlen = 0;
> +	for (i = 0; i < new->max; i++) {
> +		len = 0;
> +		hlist_for_each_entry(mp, p, &new->mhash[i], hlist[new->ver])
> +			len++;
> +		if (len > maxlen)
> +			maxlen = len;
> +	}
> +
> +	return maxlen > elasticity ? -EINVAL : 0;
> +}
> +
> +static void br_multicast_free_pg(struct rcu_head *head)
> +{
> +	struct net_bridge_port_group *p =
> +		container_of(head, struct net_bridge_port_group, rcu);
> +
> +	kfree(p);
> +}
> +
> +static void br_multicast_free_group(struct rcu_head *head)
> +{
> +	struct net_bridge_mdb_entry *mp =
> +		container_of(head, struct net_bridge_mdb_entry, rcu);
> +
> +	kfree(mp);
> +}
> +
> +static void br_multicast_group_expired(unsigned long data)
> +{
> +	struct net_bridge_mdb_entry *mp = (void *)data;
> +	struct net_bridge *br = mp->br;
> +	struct net_bridge_mdb_htable *mdb;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (!netif_running(br->dev) || timer_pending(&mp->timer))
> +		goto out;
> +
> +	if (!hlist_unhashed(&mp->mglist))
> +		hlist_del_init(&mp->mglist);
> +
> +	if (mp->ports)
> +		goto out;
> +
> +	mdb = br->mdb;
> +	hlist_del_rcu(&mp->hlist[mdb->ver]);

Also protected by br->multicast_lock.

> +	mdb->size--;
> +
> +	del_timer(&mp->query_timer);
> +	call_rcu_bh(&mp->rcu, br_multicast_free_group);
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +static void br_multicast_del_pg(struct net_bridge *br,
> +				struct net_bridge_port_group *pg)
> +{
> +	struct net_bridge_mdb_htable *mdb = br->mdb;
> +	struct net_bridge_mdb_entry *mp;
> +	struct net_bridge_port_group *p;
> +	struct net_bridge_port_group **pp;
> +
> +	mp = br_mdb_ip_get(mdb, pg->addr);
> +	if (WARN_ON(!mp))
> +		return;
> +
> +	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
> +		if (p != pg)
> +			continue;
> +
> +		*pp = p->next;

If I understand this code correctly, we are relying on the RCU-bh grace
period not completing until we process the next element.  But we don't
appear to be in an RCU-bh read-side critical section.

I believe that you need an rcu_read_lock_bh() and an rcu_read_unlock_bh()
around this loop or in all callers.  @@@

> +		hlist_del_init(&p->mglist);
> +		del_timer(&p->timer);
> +		del_timer(&p->query_timer);
> +		call_rcu_bh(&p->rcu, br_multicast_free_pg);
> +
> +		if (!mp->ports && hlist_unhashed(&mp->mglist) &&
> +		    netif_running(br->dev))
> +			mod_timer(&mp->timer, jiffies);
> +
> +		return;
> +	}
> +
> +	WARN_ON(1);
> +}
> +
> +static void br_multicast_port_group_expired(unsigned long data)
> +{
> +	struct net_bridge_port_group *pg = (void *)data;
> +	struct net_bridge *br = pg->port->br;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (!netif_running(br->dev) || timer_pending(&pg->timer) ||
> +	    hlist_unhashed(&pg->mglist))
> +		goto out;
> +
> +	br_multicast_del_pg(br, pg);
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +static int br_mdb_rehash(struct net_bridge_mdb_htable **mdbp, int max,
> +			 int elasticity)
> +{
> +	struct net_bridge_mdb_htable *old = *mdbp;
> +	struct net_bridge_mdb_htable *mdb;
> +	int err;
> +
> +	mdb = kmalloc(sizeof(*mdb), GFP_ATOMIC);
> +	if (!mdb)
> +		return -ENOMEM;
> +
> +	mdb->max = max;
> +	mdb->old = old;

OK, so the current hash table points to the old one.

The way this is set up, it looks to be illegal for RCU readers to
traverse the ->old pointer, since it is NULLed after but one RCU
grace period.

> +	mdb->mhash = kzalloc(max * sizeof(*mdb->mhash), GFP_ATOMIC);
> +	if (!mdb->mhash) {
> +		kfree(mdb);
> +		return -ENOMEM;
> +	}
> +
> +	mdb->size = old ? old->size : 0;
> +	mdb->ver = old ? old->ver ^ 1 : 0;
> +
> +	if (!old || elasticity)
> +		get_random_bytes(&mdb->secret, sizeof(mdb->secret));
> +	else
> +		mdb->secret = old->secret;
> +
> +	if (!old)
> +		goto out;
> +
> +	err = br_mdb_copy(mdb, old, elasticity);
> +	if (err) {
> +		kfree(mdb->mhash);
> +		kfree(mdb);
> +		return err;
> +	}
> +
> +	call_rcu_bh(&mdb->rcu, br_mdb_free);

And we are using RCU-bh.  OK.

> +
> +out:
> +	rcu_assign_pointer(*mdbp, mdb);

Also protected by br->multicast_lock.

> +
> +	return 0;
> +}
> +
> +static struct sk_buff *br_multicast_alloc_query(struct net_bridge *br,
> +						__be32 group)
> +{
> +	struct sk_buff *skb;
> +	struct igmphdr *ih;
> +	struct ethhdr *eth;
> +	struct iphdr *iph;
> +
> +	skb = netdev_alloc_skb_ip_align(br->dev, sizeof(*eth) + sizeof(*iph) +
> +						 sizeof(*ih) + 4);
> +	if (!skb)
> +		goto out;
> +
> +	skb->protocol = htons(ETH_P_IP);
> +
> +	skb_reset_mac_header(skb);
> +	eth = eth_hdr(skb);
> +
> +	memcpy(eth->h_source, br->dev->dev_addr, 6);
> +	eth->h_dest[0] = 1;
> +	eth->h_dest[1] = 0;
> +	eth->h_dest[2] = 0x5e;
> +	eth->h_dest[3] = 0;
> +	eth->h_dest[4] = 0;
> +	eth->h_dest[5] = 1;
> +	eth->h_proto = htons(ETH_P_IP);
> +	skb_put(skb, sizeof(*eth));
> +
> +	skb_set_network_header(skb, skb->len);
> +	iph = ip_hdr(skb);
> +
> +	iph->version = 4;
> +	iph->ihl = 6;
> +	iph->tos = 0xc0;
> +	iph->tot_len = htons(sizeof(*iph) + sizeof(*ih) + 4);
> +	iph->id = 0;
> +	iph->frag_off = htons(IP_DF);
> +	iph->ttl = 1;
> +	iph->protocol = IPPROTO_IGMP;
> +	iph->saddr = 0;
> +	iph->daddr = htonl(INADDR_ALLHOSTS_GROUP);
> +	((u8 *)&iph[1])[0] = IPOPT_RA;
> +	((u8 *)&iph[1])[1] = 4;
> +	((u8 *)&iph[1])[2] = 0;
> +	((u8 *)&iph[1])[3] = 0;
> +	ip_send_check(iph);
> +	skb_put(skb, 24);
> +
> +	skb_set_transport_header(skb, skb->len);
> +	ih = igmp_hdr(skb);
> +	ih->type = IGMP_HOST_MEMBERSHIP_QUERY;
> +	ih->code = (group ? br->multicast_last_member_interval :
> +			    br->multicast_query_response_interval) /
> +		   (HZ / IGMP_TIMER_SCALE);
> +	ih->group = group;
> +	ih->csum = 0;
> +	ih->csum = ip_compute_csum((void *)ih, sizeof(struct igmphdr));
> +	skb_put(skb, sizeof(*ih));
> +
> +	__skb_pull(skb, sizeof(*eth));
> +
> +out:
> +	return skb;
> +}
> +
> +static void br_multicast_send_group_query(struct net_bridge_mdb_entry *mp)
> +{
> +	struct net_bridge *br = mp->br;
> +	struct sk_buff *skb;
> +
> +	skb = br_multicast_alloc_query(br, mp->addr);
> +	if (!skb)
> +		goto timer;
> +
> +	netif_rx(skb);
> +
> +timer:
> +	if (++mp->queries_sent < br->multicast_last_member_count)
> +		mod_timer(&mp->query_timer,
> +			  jiffies + br->multicast_last_member_interval);
> +}
> +
> +static void br_multicast_group_query_expired(unsigned long data)
> +{
> +	struct net_bridge_mdb_entry *mp = (void *)data;
> +	struct net_bridge *br = mp->br;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (!netif_running(br->dev) || hlist_unhashed(&mp->mglist) ||
> +	    mp->queries_sent >= br->multicast_last_member_count)
> +		goto out;
> +
> +	br_multicast_send_group_query(mp);
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +static void br_multicast_send_port_group_query(struct net_bridge_port_group *pg)
> +{
> +	struct net_bridge_port *port = pg->port;
> +	struct net_bridge *br = port->br;
> +	struct sk_buff *skb;
> +
> +	skb = br_multicast_alloc_query(br, pg->addr);
> +	if (!skb)
> +		goto timer;
> +
> +	br_deliver(port, skb);
> +
> +timer:
> +	if (++pg->queries_sent < br->multicast_last_member_count)
> +		mod_timer(&pg->query_timer,
> +			  jiffies + br->multicast_last_member_interval);
> +}
> +
> +static void br_multicast_port_group_query_expired(unsigned long data)
> +{
> +	struct net_bridge_port_group *pg = (void *)data;
> +	struct net_bridge_port *port = pg->port;
> +	struct net_bridge *br = port->br;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (!netif_running(br->dev) || hlist_unhashed(&pg->mglist) ||
> +	    pg->queries_sent >= br->multicast_last_member_count)
> +		goto out;
> +
> +	br_multicast_send_port_group_query(pg);
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +static struct net_bridge_mdb_entry *br_multicast_get_group(
> +	struct net_bridge *br, struct net_bridge_port *port, __be32 group,
> +	int hash)
> +{
> +	struct net_bridge_mdb_htable *mdb = br->mdb;
> +	struct net_bridge_mdb_entry *mp;
> +	struct hlist_node *p;
> +	unsigned count = 0;
> +	unsigned max;
> +	int elasticity;
> +	int err;
> +
> +	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
> +		count++;
> +		if (unlikely(group == mp->addr)) {
> +			return mp;
> +		}
> +	}
> +
> +	elasticity = 0;
> +	max = mdb->max;
> +
> +	if (unlikely(count > br->hash_elasticity && count)) {
> +		if (net_ratelimit())
> +			printk(KERN_INFO "%s: Multicast hash table "
> +			       "chain limit reached: %s\n",
> +			       br->dev->name, port ? port->dev->name :
> +						     br->dev->name);
> +
> +		elasticity = br->hash_elasticity;
> +	}
> +
> +	if (mdb->size >= max) {
> +		max *= 2;
> +		if (unlikely(max >= br->hash_max)) {
> +			printk(KERN_WARNING "%s: Multicast hash table maximum "
> +			       "reached, disabling snooping: %s, %d\n",
> +			       br->dev->name, port ? port->dev->name :
> +						     br->dev->name,
> +			       max);
> +			err = -E2BIG;
> +disable:
> +			br->multicast_disabled = 1;
> +			goto err;
> +		}
> +	}
> +
> +	if (max > mdb->max || elasticity) {
> +		if (mdb->old) {

And here is the ->old check, as required.

> +			if (net_ratelimit())
> +				printk(KERN_INFO "%s: Multicast hash table "
> +				       "on fire: %s\n",
> +				       br->dev->name, port ? port->dev->name :
> +							     br->dev->name);
> +			err = -EEXIST;
> +			goto err;
> +		}
> +
> +		err = br_mdb_rehash(&br->mdb, max, elasticity);
> +		if (err) {
> +			printk(KERN_WARNING "%s: Cannot rehash multicast "
> +			       "hash table, disabling snooping: "
> +			       "%s, %d, %d\n",
> +			       br->dev->name, port ? port->dev->name :
> +						     br->dev->name,
> +			       mdb->size, err);
> +			goto disable;
> +		}
> +
> +		err = -EAGAIN;
> +		goto err;
> +	}
> +
> +	return NULL;
> +
> +err:
> +	mp = ERR_PTR(err);
> +	return mp;
> +}
> +
> +static struct net_bridge_mdb_entry *br_multicast_new_group(
> +	struct net_bridge *br, struct net_bridge_port *port, __be32 group)
> +{
> +	struct net_bridge_mdb_htable *mdb = br->mdb;
> +	struct net_bridge_mdb_entry *mp;
> +	int hash;
> +
> +	if (!mdb) {
> +		if (br_mdb_rehash(&br->mdb, BR_HASH_SIZE, 0))
> +			return NULL;
> +		goto rehash;
> +	}
> +
> +	hash = br_ip_hash(mdb, group);
> +	mp = br_multicast_get_group(br, port, group, hash);
> +	switch (PTR_ERR(mp)) {
> +	case 0:
> +		break;
> +
> +	case -EAGAIN:
> +rehash:
> +		mdb = br->mdb;
> +		hash = br_ip_hash(mdb, group);
> +		break;
> +
> +	default:
> +		goto out;
> +	}
> +
> +	mp = kzalloc(sizeof(*mp), GFP_ATOMIC);
> +	if (unlikely(!mp))
> +		goto out;
> +
> +	mp->br = br;
> +	mp->addr = group;
> +	setup_timer(&mp->timer, br_multicast_group_expired,
> +		    (unsigned long)mp);
> +	setup_timer(&mp->query_timer, br_multicast_group_query_expired,
> +		    (unsigned long)mp);
> +
> +	hlist_add_head_rcu(&mp->hlist[mdb->ver], &mdb->mhash[hash]);

Protected by br->multicast_lock, so OK.

> +	mdb->size++;
> +
> +out:
> +	return mp;
> +}
> +
> +static int br_multicast_add_group(struct net_bridge *br,
> +				  struct net_bridge_port *port, __be32 group)
> +{
> +	struct net_bridge_mdb_entry *mp;
> +	struct net_bridge_port_group *p;
> +	struct net_bridge_port_group **pp;
> +	unsigned long now = jiffies;
> +	int err;
> +
> +	if (ipv4_is_local_multicast(group))
> +		return 0;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (!netif_running(br->dev) ||
> +	    (port && port->state == BR_STATE_DISABLED))
> +		goto out;
> +
> +	mp = br_multicast_new_group(br, port, group);
> +	err = PTR_ERR(mp);
> +	if (unlikely(IS_ERR(mp) || !mp))
> +		goto err;
> +
> +	if (!port) {
> +		hlist_add_head(&mp->mglist, &br->mglist);
> +		mod_timer(&mp->timer, now + br->multicast_membership_interval);
> +		goto out;
> +	}
> +
> +	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
> +		if (p->port == port)
> +			goto found;
> +		if ((unsigned long)p->port < (unsigned long)port)
> +			break;
> +	}
> +
> +	p = kzalloc(sizeof(*p), GFP_ATOMIC);
> +	err = -ENOMEM;
> +	if (unlikely(!p))
> +		goto err;
> +
> +	p->addr = group;
> +	p->port = port;
> +	p->next = *pp;
> +	hlist_add_head(&p->mglist, &port->mglist);
> +	setup_timer(&p->timer, br_multicast_port_group_expired,
> +		    (unsigned long)p);
> +	setup_timer(&p->query_timer, br_multicast_port_group_query_expired,
> +		    (unsigned long)p);
> +
> +	rcu_assign_pointer(*pp, p);

Also protected by br->multicast_lock.

> +
> +found:
> +	mod_timer(&p->timer, now + br->multicast_membership_interval);
> +out:
> +	err = 0;
> +
> +err:
> +	spin_unlock(&br->multicast_lock);
> +	return err;
> +}
> +
> +static void br_multicast_router_expired(unsigned long data)
> +{
> +	struct net_bridge_port *port = (void *)data;
> +	struct net_bridge *br = port->br;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (port->multicast_router != 1 ||
> +	    timer_pending(&port->multicast_router_timer) ||
> +	    hlist_unhashed(&port->rlist))
> +		goto out;
> +
> +	hlist_del_init_rcu(&port->rlist);

Also protected by br->multicast_lock.

> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +static void br_multicast_local_router_expired(unsigned long data)
> +{
> +}
> +
> +static void br_multicast_send_query(struct net_bridge *br,
> +				    struct net_bridge_port *port, u32 sent)
> +{
> +	unsigned long time;
> +	struct sk_buff *skb;
> +
> +	if (!netif_running(br->dev) || br->multicast_disabled ||
> +	    timer_pending(&br->multicast_querier_timer))
> +		return;
> +
> +	skb = br_multicast_alloc_query(br, 0);
> +	if (!skb)
> +		goto timer;
> +
> +	if (port) {
> +		__skb_push(skb, sizeof(struct ethhdr));
> +		skb->dev = port->dev;
> +		NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
> +			dev_queue_xmit);
> +	} else
> +		netif_rx(skb);
> +
> +timer:
> +	time = jiffies;
> +	time += sent < br->multicast_startup_query_count ?
> +		br->multicast_startup_query_interval :
> +		br->multicast_query_interval;
> +	mod_timer(port ? &port->multicast_query_timer :
> +			 &br->multicast_query_timer, time);
> +}
> +
> +static void br_multicast_port_query_expired(unsigned long data)
> +{
> +	struct net_bridge_port *port = (void *)data;
> +	struct net_bridge *br = port->br;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (port && (port->state == BR_STATE_DISABLED ||
> +		     port->state == BR_STATE_BLOCKING))
> +		goto out;
> +
> +	if (port->multicast_startup_queries_sent <
> +	    br->multicast_startup_query_count)
> +		port->multicast_startup_queries_sent++;
> +
> +	br_multicast_send_query(port->br, port,
> +				port->multicast_startup_queries_sent);
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +void br_multicast_add_port(struct net_bridge_port *port)
> +{
> +	port->multicast_router = 1;
> +
> +	setup_timer(&port->multicast_router_timer, br_multicast_router_expired,
> +		    (unsigned long)port);
> +	setup_timer(&port->multicast_query_timer,
> +		    br_multicast_port_query_expired, (unsigned long)port);
> +}
> +
> +void br_multicast_del_port(struct net_bridge_port *port)
> +{
> +	del_timer_sync(&port->multicast_router_timer);
> +}
> +
> +void br_multicast_enable_port(struct net_bridge_port *port)
> +{
> +	struct net_bridge *br = port->br;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (br->multicast_disabled || !netif_running(br->dev))
> +		goto out;
> +
> +	port->multicast_startup_queries_sent = 0;
> +
> +	if (try_to_del_timer_sync(&port->multicast_query_timer) >= 0 ||
> +	    del_timer(&port->multicast_query_timer))
> +		mod_timer(&port->multicast_query_timer, jiffies);
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +void br_multicast_disable_port(struct net_bridge_port *port)
> +{
> +	struct net_bridge *br = port->br;
> +	struct net_bridge_port_group *pg;
> +	struct hlist_node *p, *n;
> +
> +	spin_lock(&br->multicast_lock);
> +	hlist_for_each_entry_safe(pg, p, n, &port->mglist, mglist)
> +		br_multicast_del_pg(br, pg);
> +
> +	if (!hlist_unhashed(&port->rlist))
> +		hlist_del_init_rcu(&port->rlist);

Also protected by br->multicast_lock.

> +	del_timer(&port->multicast_router_timer);
> +	del_timer(&port->multicast_query_timer);
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +static int br_multicast_igmp3_report(struct net_bridge *br,
> +				     struct net_bridge_port *port,
> +				     struct sk_buff *skb)
> +{
> +	struct igmpv3_report *ih;
> +	struct igmpv3_grec *grec;
> +	int i;
> +	int len;
> +	int num;
> +	int type;
> +	int err = 0;
> +	__be32 group;
> +
> +	if (!pskb_may_pull(skb, sizeof(*ih)))
> +		return -EINVAL;
> +
> +	ih = igmpv3_report_hdr(skb);
> +	num = ntohs(ih->ngrec);
> +	len = sizeof(*ih);
> +
> +	for (i = 0; i < num; i++) {
> +		len += sizeof(*grec);
> +		if (!pskb_may_pull(skb, len))
> +			return -EINVAL;
> +
> +		grec = (void *)(skb->data + len);
> +		group = grec->grec_mca;
> +		type = grec->grec_type;
> +
> +		len += grec->grec_nsrcs * 4;
> +		if (!pskb_may_pull(skb, len))
> +			return -EINVAL;
> +
> +		/* We treat this as an IGMPv2 report for now. */
> +		switch (type) {
> +		case IGMPV3_MODE_IS_INCLUDE:
> +		case IGMPV3_MODE_IS_EXCLUDE:
> +		case IGMPV3_CHANGE_TO_INCLUDE:
> +		case IGMPV3_CHANGE_TO_EXCLUDE:
> +		case IGMPV3_ALLOW_NEW_SOURCES:
> +		case IGMPV3_BLOCK_OLD_SOURCES:
> +			break;
> +
> +		default:
> +			continue;
> +		}
> +
> +		err = br_multicast_add_group(br, port, group);
> +		if (err)
> +			break;
> +	}
> +
> +	return err;
> +}
> +
> +static void br_multicast_mark_router(struct net_bridge *br,
> +				     struct net_bridge_port *port)
> +{
> +	unsigned long now = jiffies;
> +	struct hlist_node *p;
> +	struct hlist_node **h;
> +
> +	if (!port) {
> +		if (br->multicast_router == 1)
> +			mod_timer(&br->multicast_router_timer,
> +				  now + br->multicast_querier_interval);
> +		return;
> +	}
> +
> +	if (port->multicast_router != 1)
> +		return;
> +
> +	if (!hlist_unhashed(&port->rlist))
> +		goto timer;
> +
> +	for (h = &br->router_list.first;
> +	     (p = *h) &&
> +	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
> +	     (unsigned long)port;
> +	     h = &p->next)
> +		;
> +
> +	port->rlist.pprev = h;
> +	port->rlist.next = p;
> +	rcu_assign_pointer(*h, &port->rlist);

Also protected by br->multicast_lock.

> +	if (p)
> +		p->pprev = &port->rlist.next;
> +
> +timer:
> +	mod_timer(&port->multicast_router_timer,
> +		  now + br->multicast_querier_interval);
> +}
> +
> +static void br_multicast_query_received(struct net_bridge *br,
> +					struct net_bridge_port *port,
> +					__be32 saddr)
> +{
> +	if (saddr)
> +		mod_timer(&br->multicast_querier_timer,
> +			  jiffies + br->multicast_querier_interval);
> +	else if (timer_pending(&br->multicast_querier_timer))
> +		return;
> +
> +	br_multicast_mark_router(br, port);
> +}
> +
> +static int br_multicast_query(struct net_bridge *br,
> +			      struct net_bridge_port *port,
> +			      struct sk_buff *skb)
> +{
> +	struct iphdr *iph = ip_hdr(skb);
> +	struct igmphdr *ih = igmp_hdr(skb);
> +	struct net_bridge_mdb_entry *mp;
> +	struct igmpv3_query *ih3;
> +	struct net_bridge_port_group *p;
> +	struct net_bridge_port_group **pp;
> +	unsigned long max_delay;
> +	unsigned long now = jiffies;
> +	__be32 group;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (!netif_running(br->dev) ||
> +	    (port && port->state == BR_STATE_DISABLED))
> +		goto out;
> +
> +	br_multicast_query_received(br, port, iph->saddr);
> +
> +	group = ih->group;
> +
> +	if (skb->len == sizeof(*ih)) {
> +		max_delay = ih->code * (HZ / IGMP_TIMER_SCALE);
> +
> +		if (!max_delay) {
> +			max_delay = 10 * HZ;
> +			group = 0;
> +		}
> +	} else {
> +		if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
> +			return -EINVAL;
> +
> +		ih3 = igmpv3_query_hdr(skb);
> +		if (ih3->nsrcs)
> +			return 0;
> +
> +		max_delay = ih3->code ? 1 :
> +			    IGMPV3_MRC(ih3->code) * (HZ / IGMP_TIMER_SCALE);
> +	}
> +
> +	if (!group)
> +		goto out;
> +
> +	mp = br_mdb_ip_get(br->mdb, group);
> +	if (!mp)
> +		goto out;
> +
> +	max_delay *= br->multicast_last_member_count;
> +
> +	if (!hlist_unhashed(&mp->mglist) &&
> +	    (timer_pending(&mp->timer) ?
> +	     time_after(mp->timer.expires, now + max_delay) :
> +	     try_to_del_timer_sync(&mp->timer) >= 0))
> +		mod_timer(&mp->timer, now + max_delay);
> +
> +	for (pp = &mp->ports; (p = *pp); pp = &p->next) {
> +		if (timer_pending(&p->timer) ?
> +		    time_after(p->timer.expires, now + max_delay) :
> +		    try_to_del_timer_sync(&p->timer) >= 0)
> +			mod_timer(&mp->timer, now + max_delay);
> +	}
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +	return 0;
> +}
> +
> +static void br_multicast_leave_group(struct net_bridge *br,
> +				     struct net_bridge_port *port,
> +				     __be32 group)
> +{
> +	struct net_bridge_mdb_htable *mdb;
> +	struct net_bridge_mdb_entry *mp;
> +	struct net_bridge_port_group *p;
> +	unsigned long now;
> +	unsigned long time;
> +
> +	if (ipv4_is_local_multicast(group))
> +		return;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (!netif_running(br->dev) ||
> +	    (port && port->state == BR_STATE_DISABLED) ||
> +	    timer_pending(&br->multicast_querier_timer))
> +		goto out;
> +
> +	mdb = br->mdb;
> +	mp = br_mdb_ip_get(mdb, group);
> +	if (!mp)
> +		goto out;
> +
> +	now = jiffies;
> +	time = now + br->multicast_last_member_count *
> +		     br->multicast_last_member_interval;
> +
> +	if (!port) {
> +		if (!hlist_unhashed(&mp->mglist) &&
> +		    (timer_pending(&mp->timer) ?
> +		     time_after(mp->timer.expires, time) :
> +		     try_to_del_timer_sync(&mp->timer) >= 0)) {
> +			mod_timer(&mp->timer, time);
> +
> +			mp->queries_sent = 0;
> +			mod_timer(&mp->query_timer, now);
> +		}
> +
> +		goto out;
> +	}
> +
> +	for (p = mp->ports; p; p = p->next) {
> +		if (p->port != port)
> +			continue;
> +
> +		if (!hlist_unhashed(&p->mglist) &&
> +		    (timer_pending(&p->timer) ?
> +		     time_after(p->timer.expires, time) :
> +		     try_to_del_timer_sync(&p->timer) >= 0)) {
> +			mod_timer(&p->timer, time);
> +
> +			p->queries_sent = 0;
> +			mod_timer(&p->query_timer, now);
> +		}
> +
> +		break;
> +	}
> +
> +out:
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +static int br_multicast_ipv4_rcv(struct net_bridge *br,
> +				 struct net_bridge_port *port,
> +				 struct sk_buff *skb)
> +{
> +	struct sk_buff *skb2 = skb;
> +	struct iphdr *iph;
> +	struct igmphdr *ih;
> +	unsigned len;
> +	unsigned offset;
> +	int err;
> +
> +	BR_INPUT_SKB_CB(skb)->igmp = 0;
> +	BR_INPUT_SKB_CB(skb)->mrouters_only = 0;
> +
> +	/* We treat OOM as packet loss for now. */
> +	if (!pskb_may_pull(skb, sizeof(*iph)))
> +		return -EINVAL;
> +
> +	iph = ip_hdr(skb);
> +
> +	if (iph->ihl < 5 || iph->version != 4)
> +		return -EINVAL;
> +
> +	if (!pskb_may_pull(skb, ip_hdrlen(skb)))
> +		return -EINVAL;
> +
> +	iph = ip_hdr(skb);
> +
> +	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
> +		return -EINVAL;
> +
> +	if (iph->protocol != IPPROTO_IGMP)
> +		return 0;
> +
> +	len = ntohs(iph->tot_len);
> +	if (skb->len < len || len < ip_hdrlen(skb))
> +		return -EINVAL;
> +
> +	if (skb->len > len) {
> +		skb2 = skb_clone(skb, GFP_ATOMIC);
> +		if (!skb2)
> +			return -ENOMEM;
> +
> +		err = pskb_trim_rcsum(skb2, len);
> +		if (err)
> +			return err;
> +	}
> +
> +	len -= ip_hdrlen(skb2);
> +	offset = skb_network_offset(skb2) + ip_hdrlen(skb2);
> +	__skb_pull(skb2, offset);
> +	skb_reset_transport_header(skb2);
> +
> +	err = -EINVAL;
> +	if (!pskb_may_pull(skb2, sizeof(*ih)))
> +		goto out;
> +
> +	iph = ip_hdr(skb2);
> +
> +	switch (skb2->ip_summed) {
> +	case CHECKSUM_COMPLETE:
> +		if (!csum_fold(skb2->csum))
> +			break;
> +		/* fall through */
> +	case CHECKSUM_NONE:
> +		skb2->csum = 0;
> +		if (skb_checksum_complete(skb2))
> +			return -EINVAL;
> +	}
> +
> +	err = 0;
> +
> +	BR_INPUT_SKB_CB(skb)->igmp = 1;
> +	ih = igmp_hdr(skb2);
> +
> +	switch (ih->type) {
> +	case IGMP_HOST_MEMBERSHIP_REPORT:
> +	case IGMPV2_HOST_MEMBERSHIP_REPORT:
> +		BR_INPUT_SKB_CB(skb2)->mrouters_only = 1;
> +		err = br_multicast_add_group(br, port, ih->group);
> +		break;
> +	case IGMPV3_HOST_MEMBERSHIP_REPORT:
> +		err = br_multicast_igmp3_report(br, port, skb2);
> +		break;
> +	case IGMP_HOST_MEMBERSHIP_QUERY:
> +		err = br_multicast_query(br, port, skb2);
> +		break;
> +	case IGMP_HOST_LEAVE_MESSAGE:
> +		br_multicast_leave_group(br, port, ih->group);
> +		break;
> +	}
> +
> +out:
> +	__skb_push(skb2, offset);
> +	if (skb2 != skb)
> +		kfree_skb(skb2);
> +	return err;
> +}
> +
> +int br_multicast_rcv(struct net_bridge *br, struct net_bridge_port *port,
> +		     struct sk_buff *skb)
> +{
> +	if (br->multicast_disabled)
> +		return 0;
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		return br_multicast_ipv4_rcv(br, port, skb);
> +	}
> +
> +	return 0;
> +}
> +
> +static void br_multicast_query_expired(unsigned long data)
> +{
> +	struct net_bridge *br = (void *)data;
> +
> +	spin_lock(&br->multicast_lock);
> +	if (br->multicast_startup_queries_sent <
> +	    br->multicast_startup_query_count)
> +		br->multicast_startup_queries_sent++;
> +
> +	br_multicast_send_query(br, NULL, br->multicast_startup_queries_sent);
> +
> +	spin_unlock(&br->multicast_lock);
> +}
> +
> +void br_multicast_init(struct net_bridge *br)
> +{
> +	br->hash_elasticity = 4;
> +	br->hash_max = 512;
> +
> +	br->multicast_router = 1;
> +	br->multicast_last_member_count = 2;
> +	br->multicast_startup_query_count = 2;
> +
> +	br->multicast_last_member_interval = HZ;
> +	br->multicast_query_response_interval = 10 * HZ;
> +	br->multicast_startup_query_interval = 125 * HZ / 4;
> +	br->multicast_query_interval = 125 * HZ;
> +	br->multicast_querier_interval = 255 * HZ;
> +	br->multicast_membership_interval = 260 * HZ;
> +
> +	spin_lock_init(&br->multicast_lock);
> +	setup_timer(&br->multicast_router_timer,
> +		    br_multicast_local_router_expired, 0);
> +	setup_timer(&br->multicast_querier_timer,
> +		    br_multicast_local_router_expired, 0);
> +	setup_timer(&br->multicast_query_timer, br_multicast_query_expired,
> +		    (unsigned long)br);
> +}
> +
> +void br_multicast_open(struct net_bridge *br)
> +{
> +	br->multicast_startup_queries_sent = 0;
> +
> +	if (br->multicast_disabled)
> +		return;
> +
> +	mod_timer(&br->multicast_query_timer, jiffies);
> +}
> +
> +void br_multicast_stop(struct net_bridge *br)
> +{
> +	struct net_bridge_mdb_htable *mdb;
> +	struct net_bridge_mdb_entry *mp;
> +	struct hlist_node *p, *n;
> +	u32 ver;
> +	int i;
> +
> +	del_timer_sync(&br->multicast_router_timer);
> +	del_timer_sync(&br->multicast_querier_timer);
> +	del_timer_sync(&br->multicast_query_timer);
> +
> +	spin_lock_bh(&br->multicast_lock);
> +	mdb = br->mdb;
> +	if (!mdb)
> +		goto out;
> +
> +	br->mdb = NULL;
> +
> +	ver = mdb->ver;
> +	for (i = 0; i < mdb->max; i++) {
> +		hlist_for_each_entry_safe(mp, p, n, &mdb->mhash[i],
> +					  hlist[ver]) {
> +			del_timer(&mp->timer);
> +			del_timer(&mp->query_timer);
> +			call_rcu_bh(&mp->rcu, br_multicast_free_group);

I don't see what prevents new readers from finding the net_bridge_mdb_entry
referenced by mp.  Unless you stop new readers, an RCU grace period
cannot help you!  @@@

> +		}
> +	}
> +
> +	if (mdb->old) {

And another ->old check for teardown purposes?

> +		spin_unlock_bh(&br->multicast_lock);
> +		synchronize_rcu_bh();

Ah, but the above only guarantees that the grace period has elapsed,
not that all of the callbacks have been invoked, so...

> +		spin_lock_bh(&br->multicast_lock);
> +		WARN_ON(mdb->old);

This WARN_ON() can fire!!!

Suggested fix: s/synchronize_rcu_bh()/rcu_barrier_bh()/ @@@

> +	}
> +
> +	mdb->old = mdb;

I can't say I understand how making mdb->old point back at mdb helps...

Ah, clever -- this arrangement causes the RCU callback to free up the
whole mess.  ;-)

> +	call_rcu_bh(&mdb->rcu, br_mdb_free);

I don't see prevents new readers from finding the net_bridge_mdb_htable
referenced by mdb.  Again, if new readers are permitted to reference
this structure, an RCU grace period won't help you -- you cannot start
the RCU grace period until after all new readers have been excluded.

> +
> +out:
> +	spin_unlock_bh(&br->multicast_lock);
> +}
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 7b0aed5..0871775 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -57,6 +57,41 @@ struct net_bridge_fdb_entry
>  	unsigned char			is_static;
>  };
> 
> +struct net_bridge_port_group {
> +	struct net_bridge_port		*port;
> +	struct net_bridge_port_group	*next;
> +	struct hlist_node		mglist;
> +	struct rcu_head			rcu;
> +	struct timer_list		timer;
> +	struct timer_list		query_timer;
> +	__be32				addr;
> +	u32				queries_sent;
> +};
> +
> +struct net_bridge_mdb_entry
> +{
> +	struct hlist_node		hlist[2];
> +	struct hlist_node		mglist;
> +	struct net_bridge		*br;
> +	struct net_bridge_port_group	*ports;
> +	struct rcu_head			rcu;
> +	struct timer_list		timer;
> +	struct timer_list		query_timer;
> +	__be32				addr;
> +	u32				queries_sent;
> +};
> +
> +struct net_bridge_mdb_htable
> +{
> +	struct hlist_head		*mhash;
> +	struct rcu_head			rcu;
> +	struct net_bridge_mdb_htable	*old;
> +	u32				size;
> +	u32				max;
> +	u32				secret;
> +	u32				ver;
> +};
> +
>  struct net_bridge_port
>  {
>  	struct net_bridge		*br;
> @@ -84,6 +119,15 @@ struct net_bridge_port
> 
>  	unsigned long 			flags;
>  #define BR_HAIRPIN_MODE		0x00000001
> +
> +#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> +	u32				multicast_startup_queries_sent;
> +	unsigned char			multicast_router;
> +	struct timer_list		multicast_router_timer;
> +	struct timer_list		multicast_query_timer;
> +	struct hlist_head		mglist;
> +	struct hlist_node		rlist;
> +#endif
>  };
> 
>  struct net_bridge
> @@ -125,6 +169,35 @@ struct net_bridge
>  	unsigned char			topology_change;
>  	unsigned char			topology_change_detected;
> 
> +#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> +	unsigned char			multicast_router;
> +
> +	u8				multicast_disabled:1;
> +
> +	u32				hash_elasticity;
> +	u32				hash_max;
> +
> +	u32				multicast_last_member_count;
> +	u32				multicast_startup_queries_sent;
> +	u32				multicast_startup_query_count;
> +
> +	unsigned long			multicast_last_member_interval;
> +	unsigned long			multicast_membership_interval;
> +	unsigned long			multicast_querier_interval;
> +	unsigned long			multicast_query_interval;
> +	unsigned long			multicast_query_response_interval;
> +	unsigned long			multicast_startup_query_interval;
> +
> +	spinlock_t			multicast_lock;
> +	struct net_bridge_mdb_htable	*mdb;
> +	struct hlist_head		router_list;
> +	struct hlist_head		mglist;
> +
> +	struct timer_list		multicast_router_timer;
> +	struct timer_list		multicast_querier_timer;
> +	struct timer_list		multicast_query_timer;
> +#endif
> +
>  	struct timer_list		hello_timer;
>  	struct timer_list		tcn_timer;
>  	struct timer_list		topology_change_timer;
> @@ -134,6 +207,8 @@ struct net_bridge
> 
>  struct br_input_skb_cb {
>  	struct net_device *brdev;
> +	int igmp;
> +	int mrouters_only;
>  };
> 
>  #define BR_INPUT_SKB_CB(__skb)	((struct br_input_skb_cb *)(__skb)->cb)
> @@ -205,6 +280,70 @@ extern struct sk_buff *br_handle_frame(struct net_bridge_port *p,
>  extern int br_dev_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
>  extern int br_ioctl_deviceless_stub(struct net *net, unsigned int cmd, void __user *arg);
> 
> +/* br_multicast.c */
> +#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> +extern int br_multicast_rcv(struct net_bridge *br,
> +			    struct net_bridge_port *port,
> +			    struct sk_buff *skb);
> +extern struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
> +					       struct sk_buff *skb);
> +extern void br_multicast_add_port(struct net_bridge_port *port);
> +extern void br_multicast_del_port(struct net_bridge_port *port);
> +extern void br_multicast_enable_port(struct net_bridge_port *port);
> +extern void br_multicast_disable_port(struct net_bridge_port *port);
> +extern void br_multicast_init(struct net_bridge *br);
> +extern void br_multicast_open(struct net_bridge *br);
> +extern void br_multicast_stop(struct net_bridge *br);
> +#else
> +static inline int br_multicast_rcv(struct net_bridge *br,
> +				   struct net_bridge_port *port,
> +				   struct sk_buff *skb)
> +{
> +	return 0;
> +}
> +
> +static inline struct net_bridge_mdb_entry *br_mdb_get(struct net_bridge *br,
> +						      struct sk_buff *skb)
> +{
> +	return NULL;
> +}
> +
> +static inline void br_multicast_add_port(struct net_bridge_port *port)
> +{
> +}
> +
> +static inline void br_multicast_del_port(struct net_bridge_port *port)
> +{
> +}
> +
> +static inline void br_multicast_enable_port(struct net_bridge_port *port)
> +{
> +}
> +
> +static inline void br_multicast_disable_port(struct net_bridge_port *port)
> +{
> +}
> +
> +static inline void br_multicast_init(struct net_bridge *br)
> +{
> +}
> +
> +static inline void br_multicast_open(struct net_bridge *br)
> +{
> +}
> +
> +static inline void br_multicast_stop(struct net_bridge *br)
> +{
> +}
> +#endif
> +
> +static inline bool br_multicast_is_router(struct net_bridge *br)
> +{
> +	return br->multicast_router == 2 ||
> +	       (br->multicast_router == 1 &&
> +		timer_pending(&br->multicast_router_timer));
> +}
> +
>  /* br_netfilter.c */
>  #ifdef CONFIG_BRIDGE_NETFILTER
>  extern int br_netfilter_init(void);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-05 23:43     ` Paul E. McKenney
@ 2010-03-06  1:17       ` Herbert Xu
  2010-03-06  5:06         ` Paul E. McKenney
  0 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-03-06  1:17 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David S. Miller, netdev, Stephen Hemminger

On Fri, Mar 05, 2010 at 03:43:27PM -0800, Paul E. McKenney wrote:
> 
> Cool!!!  You use a pair of list_head structures, so that a given
> element can be in both the old and the new hash table simultaneously.
> Of course, an RCU grace period must elapse between consecutive resizings.
> Which appears to be addressed.

Thanks :) I will try to extend this to other existing hash tables
where the number of updates can be limited like it is here.

> The teardown needs an rcu_barrier_bh() rather than the current
> synchronize_rcu_bh(), please see below.

All the call_rcu_bh's are done under multicast_lock.  The first
check taken after taking the multicast_lock is whether we've
started the tear-down.  So where it currently calls synchronize()
it should already be the case that no call_rcu_bh's are still
running.

> Also, I don't see how the teardown code is preventing new readers from
> finding the data structures before they are being passed to call_rcu_bh().
> You can't safely start the RCU grace period until -after- all new readers
> have been excluded.  (But I could easily be missing something here.)

I understand.  However, AFAICS whatever it is that we are destroying
is taken off the reader's visible data structure before call_rcu_bh.
Do you have a particular case in mind where this is not the case?

> The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> scheme correctly.

Any function that modifies the data structure is done under the
multicast_lock, including br_multicast_del_pg.
 
> Hmmm...  Where is the read-side code?  Wherever it is, it cannot safely
> dereference the ->old pointer.

Right, the old pointer is merely there to limit rehashings to one
per window.  So it isn't used by the read-side.

The read-side is the data path (non-IGMP multicast packets).  The
sole entry point is br_mdb_get().

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  1:17       ` Herbert Xu
@ 2010-03-06  5:06         ` Paul E. McKenney
  2010-03-06  6:56           ` Herbert Xu
  0 siblings, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-06  5:06 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sat, Mar 06, 2010 at 09:17:18AM +0800, Herbert Xu wrote:
> On Fri, Mar 05, 2010 at 03:43:27PM -0800, Paul E. McKenney wrote:
> > 
> > Cool!!!  You use a pair of list_head structures, so that a given
> > element can be in both the old and the new hash table simultaneously.
> > Of course, an RCU grace period must elapse between consecutive resizings.
> > Which appears to be addressed.
> 
> Thanks :) I will try to extend this to other existing hash tables
> where the number of updates can be limited like it is here.
> 
> > The teardown needs an rcu_barrier_bh() rather than the current
> > synchronize_rcu_bh(), please see below.
> 
> All the call_rcu_bh's are done under multicast_lock.  The first
> check taken after taking the multicast_lock is whether we've
> started the tear-down.  So where it currently calls synchronize()
> it should already be the case that no call_rcu_bh's are still
> running.

Agreed, but the callbacks registered by the call_rcu_bh() might run
at any time, possibly quite some time after the synchronize_rcu_bh()
completes.  For example, the last call_rcu_bh() might register on
one CPU, and the synchronize_rcu_bh() on another CPU.  Then there
is no guarantee that the call_rcu_bh()'s callback will execute before
the synchronize_rcu_bh() returns.

In contrast, rcu_barrier_bh() is guaranteed not to return until all
pending RCU-bh callbacks have executed.

> > Also, I don't see how the teardown code is preventing new readers from
> > finding the data structures before they are being passed to call_rcu_bh().
> > You can't safely start the RCU grace period until -after- all new readers
> > have been excluded.  (But I could easily be missing something here.)
> 
> I understand.  However, AFAICS whatever it is that we are destroying
> is taken off the reader's visible data structure before call_rcu_bh.
> Do you have a particular case in mind where this is not the case?

I might simply have missed the operation that removed reader
visibility, looking again...

Ah, I see it.  The "br->mdb = NULL" in br_multicast_stop() makes
it impossible for the readers to get to any of the data.  Right?

If so, my confusion, you are right, this one is OK.

> > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> > scheme correctly.
> 
> Any function that modifies the data structure is done under the
> multicast_lock, including br_multicast_del_pg.

But spin_lock() does not take the place of rcu_read_lock_bh().
And so, in theory, the RCU-bh grace period could complete between
the time that br_multicast_del_pg() does its call_rcu_bh() and the
"*pp = p->next;" at the top of the next loop iteration.  If so,
then br_multicast_free_pg()'s kfree() will possibly have clobbered
"p->next".  Low probability, yes, but a long-running interrupt
could do the trick.

Or is there something I am missing that is preventing an RCU-bh
grace period from completing near the bottom of br_multicast_del_pg()'s
"for" loop?

> > Hmmm...  Where is the read-side code?  Wherever it is, it cannot safely
> > dereference the ->old pointer.
> 
> Right, the old pointer is merely there to limit rehashings to one
> per window.  So it isn't used by the read-side.

Good!

> The read-side is the data path (non-IGMP multicast packets).  The
> sole entry point is br_mdb_get().

Hmmm...  So the caller is responsible for rcu_read_lock_bh()?

Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
in __br_mdb_ip_get(), then?  Or is something else going on here?

							Thanx, Paul

> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  5:06         ` Paul E. McKenney
@ 2010-03-06  6:56           ` Herbert Xu
  2010-03-06  7:03             ` Herbert Xu
                               ` (3 more replies)
  0 siblings, 4 replies; 81+ messages in thread
From: Herbert Xu @ 2010-03-06  6:56 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David S. Miller, netdev, Stephen Hemminger

On Fri, Mar 05, 2010 at 09:06:56PM -0800, Paul E. McKenney wrote:
> 
> Agreed, but the callbacks registered by the call_rcu_bh() might run
> at any time, possibly quite some time after the synchronize_rcu_bh()
> completes.  For example, the last call_rcu_bh() might register on
> one CPU, and the synchronize_rcu_bh() on another CPU.  Then there
> is no guarantee that the call_rcu_bh()'s callback will execute before
> the synchronize_rcu_bh() returns.
> 
> In contrast, rcu_barrier_bh() is guaranteed not to return until all
> pending RCU-bh callbacks have executed.

You're absolutely right.  I'll send a patch to fix this.

Incidentally, does rcu_barrier imply rcu_barrier_bh? What about
synchronize_rcu and synchronize_rcu_bh? The reason I'm asking is
that we use a mixture of rcu_read_lock_bh and rcu_read_lock all
over the place but only ever use rcu_barrier and synchronize_rcu.

> > I understand.  However, AFAICS whatever it is that we are destroying
> > is taken off the reader's visible data structure before call_rcu_bh.
> > Do you have a particular case in mind where this is not the case?
> 
> I might simply have missed the operation that removed reader
> visibility, looking again...
> 
> Ah, I see it.  The "br->mdb = NULL" in br_multicast_stop() makes
> it impossible for the readers to get to any of the data.  Right?

Yes.  The read-side will see it and get nothing, while all write-side
paths will see that netif_running is false and exit.

> > > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> > > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> > > scheme correctly.
> > 
> > Any function that modifies the data structure is done under the
> > multicast_lock, including br_multicast_del_pg.
> 
> But spin_lock() does not take the place of rcu_read_lock_bh().
> And so, in theory, the RCU-bh grace period could complete between
> the time that br_multicast_del_pg() does its call_rcu_bh() and the
> "*pp = p->next;" at the top of the next loop iteration.  If so,
> then br_multicast_free_pg()'s kfree() will possibly have clobbered
> "p->next".  Low probability, yes, but a long-running interrupt
> could do the trick.
> 
> Or is there something I am missing that is preventing an RCU-bh
> grace period from completing near the bottom of br_multicast_del_pg()'s
> "for" loop?

Well all the locks are taken with BH disabled, this should prevent
this problem, no?

> > The read-side is the data path (non-IGMP multicast packets).  The
> > sole entry point is br_mdb_get().
> 
> Hmmm...  So the caller is responsible for rcu_read_lock_bh()?

Yes, all data paths through the bridge operate with BH disabled.

> Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
> in __br_mdb_ip_get(), then?  Or is something else going on here?

Indeed it should, I'll fix this up too.

Thanks for reviewing Paul!
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  6:56           ` Herbert Xu
@ 2010-03-06  7:03             ` Herbert Xu
  2010-03-07 23:31               ` David Miller
  2010-03-06  7:07             ` Herbert Xu
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-03-06  7:03 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
>
> > In contrast, rcu_barrier_bh() is guaranteed not to return until all
> > pending RCU-bh callbacks have executed.
> 
> You're absolutely right.  I'll send a patch to fix this.

bridge: Fix RCU race in br_multicast_stop

Thanks to Paul McKenny for pointing out that it is incorrect to use 
synchronize_rcu_bh to ensure that pending callbacks have completed.
Instead we should use rcu_barrier_bh.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2559fb5..48f3b00 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1135,7 +1135,7 @@ void br_multicast_stop(struct net_bridge *br)
 
 	if (mdb->old) {
 		spin_unlock_bh(&br->multicast_lock);
-		synchronize_rcu_bh();
+		rcu_barrier_bh();
 		spin_lock_bh(&br->multicast_lock);
 		WARN_ON(mdb->old);
 	}

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  6:56           ` Herbert Xu
  2010-03-06  7:03             ` Herbert Xu
@ 2010-03-06  7:07             ` Herbert Xu
  2010-03-07 23:31               ` David Miller
  2010-03-06 15:00             ` Paul E. McKenney
  2010-03-06 15:19             ` Paul E. McKenney
  3 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-03-06  7:07 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
> On Fri, Mar 05, 2010 at 09:06:56PM -0800, Paul E. McKenney wrote:
>
> > Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
> > in __br_mdb_ip_get(), then?  Or is something else going on here?
> 
> Indeed it should, I'll fix this up too.

bridge: Use RCU list primitive in __br_mdb_ip_get

As Paul McKenney correctly pointed out, __br_mdb_ip_get needs
to use the RCU list walking primitive in order to work correctly
on platforms where data-dependency ordering is not guaranteed.

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 2559fb5..ba30f41 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -38,7 +38,7 @@ static struct net_bridge_mdb_entry *__br_mdb_ip_get(
 	struct net_bridge_mdb_entry *mp;
 	struct hlist_node *p;
 
-	hlist_for_each_entry(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
+	hlist_for_each_entry_rcu(mp, p, &mdb->mhash[hash], hlist[mdb->ver]) {
 		if (dst == mp->addr)
 			return mp;
 	}

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  6:56           ` Herbert Xu
  2010-03-06  7:03             ` Herbert Xu
  2010-03-06  7:07             ` Herbert Xu
@ 2010-03-06 15:00             ` Paul E. McKenney
  2010-03-06 15:19             ` Paul E. McKenney
  3 siblings, 0 replies; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-06 15:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
> On Fri, Mar 05, 2010 at 09:06:56PM -0800, Paul E. McKenney wrote:
> > 
> > Agreed, but the callbacks registered by the call_rcu_bh() might run
> > at any time, possibly quite some time after the synchronize_rcu_bh()
> > completes.  For example, the last call_rcu_bh() might register on
> > one CPU, and the synchronize_rcu_bh() on another CPU.  Then there
> > is no guarantee that the call_rcu_bh()'s callback will execute before
> > the synchronize_rcu_bh() returns.
> > 
> > In contrast, rcu_barrier_bh() is guaranteed not to return until all
> > pending RCU-bh callbacks have executed.
> 
> You're absolutely right.  I'll send a patch to fix this.
> 
> Incidentally, does rcu_barrier imply rcu_barrier_bh? What about
> synchronize_rcu and synchronize_rcu_bh? The reason I'm asking is
> that we use a mixture of rcu_read_lock_bh and rcu_read_lock all
> over the place but only ever use rcu_barrier and synchronize_rcu.
> 
> > > I understand.  However, AFAICS whatever it is that we are destroying
> > > is taken off the reader's visible data structure before call_rcu_bh.
> > > Do you have a particular case in mind where this is not the case?
> > 
> > I might simply have missed the operation that removed reader
> > visibility, looking again...
> > 
> > Ah, I see it.  The "br->mdb = NULL" in br_multicast_stop() makes
> > it impossible for the readers to get to any of the data.  Right?
> 
> Yes.  The read-side will see it and get nothing, while all write-side
> paths will see that netif_running is false and exit.
> 
> > > > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> > > > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> > > > scheme correctly.
> > > 
> > > Any function that modifies the data structure is done under the
> > > multicast_lock, including br_multicast_del_pg.
> > 
> > But spin_lock() does not take the place of rcu_read_lock_bh().
> > And so, in theory, the RCU-bh grace period could complete between
> > the time that br_multicast_del_pg() does its call_rcu_bh() and the
> > "*pp = p->next;" at the top of the next loop iteration.  If so,
> > then br_multicast_free_pg()'s kfree() will possibly have clobbered
> > "p->next".  Low probability, yes, but a long-running interrupt
> > could do the trick.
> > 
> > Or is there something I am missing that is preventing an RCU-bh
> > grace period from completing near the bottom of br_multicast_del_pg()'s
> > "for" loop?
> 
> Well all the locks are taken with BH disabled, this should prevent
> this problem, no?

Those locks are indeed taken with BH disabled, you are right!

And I need to fix my RCU lockdep rcu_dereference_bh() checks to
look for disabled BH as well as rcu_read_lock_bh(), for that matter.

						Thanx, Paul

> > > The read-side is the data path (non-IGMP multicast packets).  The
> > > sole entry point is br_mdb_get().
> > 
> > Hmmm...  So the caller is responsible for rcu_read_lock_bh()?
> 
> Yes, all data paths through the bridge operate with BH disabled.
> 
> > Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
> > in __br_mdb_ip_get(), then?  Or is something else going on here?
> 
> Indeed it should, I'll fix this up too.
> 
> Thanks for reviewing Paul!
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  6:56           ` Herbert Xu
                               ` (2 preceding siblings ...)
  2010-03-06 15:00             ` Paul E. McKenney
@ 2010-03-06 15:19             ` Paul E. McKenney
  2010-03-06 19:00               ` Paul E. McKenney
  3 siblings, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-06 15:19 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
> On Fri, Mar 05, 2010 at 09:06:56PM -0800, Paul E. McKenney wrote:
> > 
> > Agreed, but the callbacks registered by the call_rcu_bh() might run
> > at any time, possibly quite some time after the synchronize_rcu_bh()
> > completes.  For example, the last call_rcu_bh() might register on
> > one CPU, and the synchronize_rcu_bh() on another CPU.  Then there
> > is no guarantee that the call_rcu_bh()'s callback will execute before
> > the synchronize_rcu_bh() returns.
> > 
> > In contrast, rcu_barrier_bh() is guaranteed not to return until all
> > pending RCU-bh callbacks have executed.
> 
> You're absolutely right.  I'll send a patch to fix this.
> 
> Incidentally, does rcu_barrier imply rcu_barrier_bh? What about
> synchronize_rcu and synchronize_rcu_bh? The reason I'm asking is
> that we use a mixture of rcu_read_lock_bh and rcu_read_lock all
> over the place but only ever use rcu_barrier and synchronize_rcu.

Hmmm...  rcu_barrier() definitely does -not- imply rcu_barrier_bh(),
because there are separate sets of callbacks whose execution can
be throttled separately.  So, while you would expect RCU-bh grace
periods to complete more quickly, if there was a large number of
RCU-bh callbacks on a given CPU but very few RCU callbacks, it might
well take longer for the RCU-bh callbacks to be invoked.

With TREE_PREEMPT_RCU, if there were no RCU readers but one long-running
RCU-bh reader, then synchronize_rcu_bh() could return before
synchronize_rcu() does.

The simple approach would be to do something like:

	synchronize_rcu();
	synchronize_rcu_bh();

on the one hand, and:

	rcu_barrier();
	rcu_barrier_bh();

on the other.  However, this is not so good for update-side latency.

Perhaps we need a primitive that waits for both RCU and RCU-bh in
parallel?  This is pretty easy for synchronize_rcu() and
synchronize_rcu_bh(), and probably not too hard for rcu_barrier()
and rcu_barrier_bh().

Hmmm...  Do we have the same issue with call_rcu() and call_rcu_bh()?

						Thanx, Paul

> > > I understand.  However, AFAICS whatever it is that we are destroying
> > > is taken off the reader's visible data structure before call_rcu_bh.
> > > Do you have a particular case in mind where this is not the case?
> > 
> > I might simply have missed the operation that removed reader
> > visibility, looking again...
> > 
> > Ah, I see it.  The "br->mdb = NULL" in br_multicast_stop() makes
> > it impossible for the readers to get to any of the data.  Right?
> 
> Yes.  The read-side will see it and get nothing, while all write-side
> paths will see that netif_running is false and exit.
> 
> > > > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> > > > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> > > > scheme correctly.
> > > 
> > > Any function that modifies the data structure is done under the
> > > multicast_lock, including br_multicast_del_pg.
> > 
> > But spin_lock() does not take the place of rcu_read_lock_bh().
> > And so, in theory, the RCU-bh grace period could complete between
> > the time that br_multicast_del_pg() does its call_rcu_bh() and the
> > "*pp = p->next;" at the top of the next loop iteration.  If so,
> > then br_multicast_free_pg()'s kfree() will possibly have clobbered
> > "p->next".  Low probability, yes, but a long-running interrupt
> > could do the trick.
> > 
> > Or is there something I am missing that is preventing an RCU-bh
> > grace period from completing near the bottom of br_multicast_del_pg()'s
> > "for" loop?
> 
> Well all the locks are taken with BH disabled, this should prevent
> this problem, no?
> 
> > > The read-side is the data path (non-IGMP multicast packets).  The
> > > sole entry point is br_mdb_get().
> > 
> > Hmmm...  So the caller is responsible for rcu_read_lock_bh()?
> 
> Yes, all data paths through the bridge operate with BH disabled.
> 
> > Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
> > in __br_mdb_ip_get(), then?  Or is something else going on here?
> 
> Indeed it should, I'll fix this up too.
> 
> Thanks for reviewing Paul!
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06 15:19             ` Paul E. McKenney
@ 2010-03-06 19:00               ` Paul E. McKenney
  2010-03-07  2:45                 ` Herbert Xu
  0 siblings, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-06 19:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sat, Mar 06, 2010 at 07:19:33AM -0800, Paul E. McKenney wrote:
> On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
> > On Fri, Mar 05, 2010 at 09:06:56PM -0800, Paul E. McKenney wrote:
> > > 
> > > Agreed, but the callbacks registered by the call_rcu_bh() might run
> > > at any time, possibly quite some time after the synchronize_rcu_bh()
> > > completes.  For example, the last call_rcu_bh() might register on
> > > one CPU, and the synchronize_rcu_bh() on another CPU.  Then there
> > > is no guarantee that the call_rcu_bh()'s callback will execute before
> > > the synchronize_rcu_bh() returns.
> > > 
> > > In contrast, rcu_barrier_bh() is guaranteed not to return until all
> > > pending RCU-bh callbacks have executed.
> > 
> > You're absolutely right.  I'll send a patch to fix this.
> > 
> > Incidentally, does rcu_barrier imply rcu_barrier_bh? What about
> > synchronize_rcu and synchronize_rcu_bh? The reason I'm asking is
> > that we use a mixture of rcu_read_lock_bh and rcu_read_lock all
> > over the place but only ever use rcu_barrier and synchronize_rcu.
> 
> Hmmm...  rcu_barrier() definitely does -not- imply rcu_barrier_bh(),
> because there are separate sets of callbacks whose execution can
> be throttled separately.  So, while you would expect RCU-bh grace
> periods to complete more quickly, if there was a large number of
> RCU-bh callbacks on a given CPU but very few RCU callbacks, it might
> well take longer for the RCU-bh callbacks to be invoked.
> 
> With TREE_PREEMPT_RCU, if there were no RCU readers but one long-running
> RCU-bh reader, then synchronize_rcu_bh() could return before
> synchronize_rcu() does.
> 
> The simple approach would be to do something like:
> 
> 	synchronize_rcu();
> 	synchronize_rcu_bh();
> 
> on the one hand, and:
> 
> 	rcu_barrier();
> 	rcu_barrier_bh();
> 
> on the other.  However, this is not so good for update-side latency.
> 
> Perhaps we need a primitive that waits for both RCU and RCU-bh in
> parallel?  This is pretty easy for synchronize_rcu() and
> synchronize_rcu_bh(), and probably not too hard for rcu_barrier()
> and rcu_barrier_bh().
> 
> Hmmm...  Do we have the same issue with call_rcu() and call_rcu_bh()?

But before I get too excited...

You really are talking about code like the following, correct?

	rcu_read_lock();
	p = rcu_dereference(global_p);
	do_something_with(p);
	rcu_read_unlock();

	. . .

	rcu_read_lock_bh();
	p = rcu_dereference(global_p);
	do_something_else_with(p);
	rcu_read_unlock_bh();

	. . . 

	spin_lock(&my_lock);
	p = global_p;
	rcu_assign_pointer(global_p, NULL);
	synchronize_rcu();  /* BUG -- also need synchronize_rcu_bh(). */
	kfree(p);
	spin_unlock(&my_lock);

In other words, different readers traversing the same data structure
under different flavors of RCU protection, but then using only one
flavor of RCU grace period during the update?

						Thanx, Paul

> > > > I understand.  However, AFAICS whatever it is that we are destroying
> > > > is taken off the reader's visible data structure before call_rcu_bh.
> > > > Do you have a particular case in mind where this is not the case?
> > > 
> > > I might simply have missed the operation that removed reader
> > > visibility, looking again...
> > > 
> > > Ah, I see it.  The "br->mdb = NULL" in br_multicast_stop() makes
> > > it impossible for the readers to get to any of the data.  Right?
> > 
> > Yes.  The read-side will see it and get nothing, while all write-side
> > paths will see that netif_running is false and exit.
> > 
> > > > > The br_multicast_del_pg() looks to need rcu_read_lock_bh() and
> > > > > rcu_read_unlock_bh() around its loop, if I understand the pointer-walking
> > > > > scheme correctly.
> > > > 
> > > > Any function that modifies the data structure is done under the
> > > > multicast_lock, including br_multicast_del_pg.
> > > 
> > > But spin_lock() does not take the place of rcu_read_lock_bh().
> > > And so, in theory, the RCU-bh grace period could complete between
> > > the time that br_multicast_del_pg() does its call_rcu_bh() and the
> > > "*pp = p->next;" at the top of the next loop iteration.  If so,
> > > then br_multicast_free_pg()'s kfree() will possibly have clobbered
> > > "p->next".  Low probability, yes, but a long-running interrupt
> > > could do the trick.
> > > 
> > > Or is there something I am missing that is preventing an RCU-bh
> > > grace period from completing near the bottom of br_multicast_del_pg()'s
> > > "for" loop?
> > 
> > Well all the locks are taken with BH disabled, this should prevent
> > this problem, no?
> > 
> > > > The read-side is the data path (non-IGMP multicast packets).  The
> > > > sole entry point is br_mdb_get().
> > > 
> > > Hmmm...  So the caller is responsible for rcu_read_lock_bh()?
> > 
> > Yes, all data paths through the bridge operate with BH disabled.
> > 
> > > Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
> > > in __br_mdb_ip_get(), then?  Or is something else going on here?
> > 
> > Indeed it should, I'll fix this up too.
> > 
> > Thanks for reviewing Paul!
> > -- 
> > Visit Openswan at http://www.openswan.org/
> > Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> > Home Page: http://gondor.apana.org.au/~herbert/
> > PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06 19:00               ` Paul E. McKenney
@ 2010-03-07  2:45                 ` Herbert Xu
  2010-03-07  3:11                   ` Paul E. McKenney
  0 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-03-07  2:45 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: David S. Miller, netdev, Stephen Hemminger

On Sat, Mar 06, 2010 at 11:00:00AM -0800, Paul E. McKenney wrote:
>
> > Hmmm...  rcu_barrier() definitely does -not- imply rcu_barrier_bh(),
> > because there are separate sets of callbacks whose execution can
> > be throttled separately.  So, while you would expect RCU-bh grace
> > periods to complete more quickly, if there was a large number of
> > RCU-bh callbacks on a given CPU but very few RCU callbacks, it might
> > well take longer for the RCU-bh callbacks to be invoked.
> > 
> > With TREE_PREEMPT_RCU, if there were no RCU readers but one long-running
> > RCU-bh reader, then synchronize_rcu_bh() could return before
> > synchronize_rcu() does.

OK, then we definitely do have some issues under net/ with respect
to the two types of RCU usage.  As you can see, we use the RCU-BH
variant on the read-side in various places, and call_rcu_bh on the
write-side too, but we only ever use the non-BH version of the
functions rcu_barrier and synchronize_rcu.

Now there is a possibility that the places where we use synchronize
and rcu_barrier don't really care about the BH variant, but an
audit wouldn't hurt.

> You really are talking about code like the following, correct?
> 
> 	rcu_read_lock();
> 	p = rcu_dereference(global_p);
> 	do_something_with(p);
> 	rcu_read_unlock();
> 
> 	. . .
> 
> 	rcu_read_lock_bh();
> 	p = rcu_dereference(global_p);
> 	do_something_else_with(p);
> 	rcu_read_unlock_bh();
> 
> 	. . . 
> 
> 	spin_lock(&my_lock);
> 	p = global_p;
> 	rcu_assign_pointer(global_p, NULL);
> 	synchronize_rcu();  /* BUG -- also need synchronize_rcu_bh(). */
> 	kfree(p);
> 	spin_unlock(&my_lock);
> 
> In other words, different readers traversing the same data structure
> under different flavors of RCU protection, but then using only one
> flavor of RCU grace period during the update?

We usually don't use synchronize_rcu/rcu_barrier on the update side,
but rather they are used in the tear-down process.

But otherwise yes this is exactly my concern.

Note that we may have a problem on the update side too if we used
the wrong call_rcu variant, but it would require a thorough audit
to reveal those.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-07  2:45                 ` Herbert Xu
@ 2010-03-07  3:11                   ` Paul E. McKenney
  2010-03-08 18:50                     ` Arnd Bergmann
  2010-03-09 21:12                     ` Arnd Bergmann
  0 siblings, 2 replies; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-07  3:11 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev, Stephen Hemminger, arnd

On Sun, Mar 07, 2010 at 10:45:00AM +0800, Herbert Xu wrote:
> On Sat, Mar 06, 2010 at 11:00:00AM -0800, Paul E. McKenney wrote:
> >
> > > Hmmm...  rcu_barrier() definitely does -not- imply rcu_barrier_bh(),
> > > because there are separate sets of callbacks whose execution can
> > > be throttled separately.  So, while you would expect RCU-bh grace
> > > periods to complete more quickly, if there was a large number of
> > > RCU-bh callbacks on a given CPU but very few RCU callbacks, it might
> > > well take longer for the RCU-bh callbacks to be invoked.
> > > 
> > > With TREE_PREEMPT_RCU, if there were no RCU readers but one long-running
> > > RCU-bh reader, then synchronize_rcu_bh() could return before
> > > synchronize_rcu() does.
> 
> OK, then we definitely do have some issues under net/ with respect
> to the two types of RCU usage.  As you can see, we use the RCU-BH
> variant on the read-side in various places, and call_rcu_bh on the
> write-side too, but we only ever use the non-BH version of the
> functions rcu_barrier and synchronize_rcu.
> 
> Now there is a possibility that the places where we use synchronize
> and rcu_barrier don't really care about the BH variant, but an
> audit wouldn't hurt.
> 
> > You really are talking about code like the following, correct?
> > 
> > 	rcu_read_lock();
> > 	p = rcu_dereference(global_p);
> > 	do_something_with(p);
> > 	rcu_read_unlock();
> > 
> > 	. . .
> > 
> > 	rcu_read_lock_bh();
> > 	p = rcu_dereference(global_p);
> > 	do_something_else_with(p);
> > 	rcu_read_unlock_bh();
> > 
> > 	. . . 
> > 
> > 	spin_lock(&my_lock);
> > 	p = global_p;
> > 	rcu_assign_pointer(global_p, NULL);
> > 	synchronize_rcu();  /* BUG -- also need synchronize_rcu_bh(). */
> > 	kfree(p);
> > 	spin_unlock(&my_lock);
> > 
> > In other words, different readers traversing the same data structure
> > under different flavors of RCU protection, but then using only one
> > flavor of RCU grace period during the update?
> 
> We usually don't use synchronize_rcu/rcu_barrier on the update side,
> but rather they are used in the tear-down process.
> 
> But otherwise yes this is exactly my concern.
> 
> Note that we may have a problem on the update side too if we used
> the wrong call_rcu variant, but it would require a thorough audit
> to reveal those.

OK, just re-checked your patch, and it looks OK.

Also adding Arnd to CC.

Arnd, would it be reasonable to extend your RCU-sparse changes to have
four different pointer namespaces, one for each flavor of RCU?  (RCU,
RCU-bh, RCU-sched, and SRCU)?  Always a fan of making the computer do
the auditing where reasonable.  ;-)

This could potentially catch the mismatched call_rcu()s, at least if the
rcu_head could be labeled.

Other thoughts?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  7:03             ` Herbert Xu
@ 2010-03-07 23:31               ` David Miller
  0 siblings, 0 replies; 81+ messages in thread
From: David Miller @ 2010-03-07 23:31 UTC (permalink / raw)
  To: herbert; +Cc: paulmck, netdev, shemminger

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 6 Mar 2010 15:03:35 +0800

> On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
>>
>> > In contrast, rcu_barrier_bh() is guaranteed not to return until all
>> > pending RCU-bh callbacks have executed.
>> 
>> You're absolutely right.  I'll send a patch to fix this.
> 
> bridge: Fix RCU race in br_multicast_stop
> 
> Thanks to Paul McKenny for pointing out that it is incorrect to use 
> synchronize_rcu_bh to ensure that pending callbacks have completed.
> Instead we should use rcu_barrier_bh.
> 
> Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-06  7:07             ` Herbert Xu
@ 2010-03-07 23:31               ` David Miller
  0 siblings, 0 replies; 81+ messages in thread
From: David Miller @ 2010-03-07 23:31 UTC (permalink / raw)
  To: herbert; +Cc: paulmck, netdev, shemminger

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 6 Mar 2010 15:07:39 +0800

> On Sat, Mar 06, 2010 at 02:56:55PM +0800, Herbert Xu wrote:
>> On Fri, Mar 05, 2010 at 09:06:56PM -0800, Paul E. McKenney wrote:
>>
>> > Shouldn't the br_mdb_get() code path be using hlist_for_each_entry_rcu()
>> > in __br_mdb_ip_get(), then?  Or is something else going on here?
>> 
>> Indeed it should, I'll fix this up too.
> 
> bridge: Use RCU list primitive in __br_mdb_ip_get
> 
> As Paul McKenney correctly pointed out, __br_mdb_ip_get needs
> to use the RCU list walking primitive in order to work correctly
> on platforms where data-dependency ordering is not guaranteed.
> 
> Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-07  3:11                   ` Paul E. McKenney
@ 2010-03-08 18:50                     ` Arnd Bergmann
  2010-03-09  3:15                       ` Paul E. McKenney
  2010-03-11 18:49                       ` Arnd Bergmann
  2010-03-09 21:12                     ` Arnd Bergmann
  1 sibling, 2 replies; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-08 18:50 UTC (permalink / raw)
  To: paulmck; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Sunday 07 March 2010, Paul E. McKenney wrote:
> On Sun, Mar 07, 2010 at 10:45:00AM +0800, Herbert Xu wrote:
> > On Sat, Mar 06, 2010 at 11:00:00AM -0800, Paul E. McKenney wrote:
> 
> OK, just re-checked your patch, and it looks OK.
> 
> Also adding Arnd to CC.
> 
> Arnd, would it be reasonable to extend your RCU-sparse changes to have
> four different pointer namespaces, one for each flavor of RCU?  (RCU,
> RCU-bh, RCU-sched, and SRCU)?  Always a fan of making the computer do
> the auditing where reasonable.  ;-)

Yes, I guess that would be possible. I'd still leave out the rculist
from any annotations for now, as this would get even more complex then.

One consequence will be the need for new rcu_assign_pointer{,_bh,_sched}
macros that check the address space of the first argument, otherwise
you'd be able to stick anything in there, including non-__rcu pointers.

I've also found a few places (less than a handful) that use RCU to
protect per-CPU data. Not sure how to deal with that, because now
this also has its own named address space (__percpu), and it's probably
a bit too much to introduce all combinations of
{s,}rcu_{assign_pointer,dereference}{,_bh,_sched}{,_const}{,_percpu},
so I'm ignoring them for now.

> This could potentially catch the mismatched call_rcu()s, at least if the
> rcu_head could be labeled.

I haven't labeled the rcu_head at all so far, and I'm not sure if that's
necessary. What I've been thinking about is replacing typical code like

/* this is called with the writer-side lock held */
void foo_assign(struct foo *foo, struct bar *newbar)
{
	struct bar *bar = rcu_dereference_const(foo->bar); /* I just had to add 
							    this dereference */
	rcu_assign_pointer(foo->bar, newbar);
	if (bar)
		call_rcu(&bar->rcu, bar_destructor);
}

with the shorter

void foo_assign(struct foo *foo, struct bar *newbar)
{
	struct bar *bar = rcu_exchange(foo->bar, newbar);
	if (bar)
		call_rcu(&bar->rcu, bar_destructor);
}

Now we could combine this to

void foo_assign(struct foo *foo, struct bar *newbar)
{
	rcu_exchange_call(foo->bar, newbar, rcu, bar_destructor);
}

#define rcu_exchange_call(ptr, new, member, func) \
({ \
	typeof(new) old = rcu_exchange((ptr),(new)); \
	if (old) \
		call_rcu(&(old)->member, (func));	\
	old; \
})

and make appropriate versions of all the above rcu methods for this.
With some extra macro magic, this could even become type safe and
accept a function that takes a typeof(ptr) argument instead of the
rcu_head.

	Arnd

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-08 18:50                     ` Arnd Bergmann
@ 2010-03-09  3:15                       ` Paul E. McKenney
  2010-03-11 18:49                       ` Arnd Bergmann
  1 sibling, 0 replies; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-09  3:15 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Mon, Mar 08, 2010 at 07:50:48PM +0100, Arnd Bergmann wrote:
> On Sunday 07 March 2010, Paul E. McKenney wrote:
> > On Sun, Mar 07, 2010 at 10:45:00AM +0800, Herbert Xu wrote:
> > > On Sat, Mar 06, 2010 at 11:00:00AM -0800, Paul E. McKenney wrote:
> > 
> > OK, just re-checked your patch, and it looks OK.
> > 
> > Also adding Arnd to CC.
> > 
> > Arnd, would it be reasonable to extend your RCU-sparse changes to have
> > four different pointer namespaces, one for each flavor of RCU?  (RCU,
> > RCU-bh, RCU-sched, and SRCU)?  Always a fan of making the computer do
> > the auditing where reasonable.  ;-)
> 
> Yes, I guess that would be possible. I'd still leave out the rculist
> from any annotations for now, as this would get even more complex then.

Understood!

> One consequence will be the need for new rcu_assign_pointer{,_bh,_sched}
> macros that check the address space of the first argument, otherwise
> you'd be able to stick anything in there, including non-__rcu pointers.
> 
> I've also found a few places (less than a handful) that use RCU to
> protect per-CPU data. Not sure how to deal with that, because now
> this also has its own named address space (__percpu), and it's probably
> a bit too much to introduce all combinations of
> {s,}rcu_{assign_pointer,dereference}{,_bh,_sched}{,_const}{,_percpu},
> so I'm ignoring them for now.

Ouch!!!

> > This could potentially catch the mismatched call_rcu()s, at least if the
> > rcu_head could be labeled.
> 
> I haven't labeled the rcu_head at all so far, and I'm not sure if that's
> necessary. What I've been thinking about is replacing typical code like
> 
> /* this is called with the writer-side lock held */
> void foo_assign(struct foo *foo, struct bar *newbar)
> {
> 	struct bar *bar = rcu_dereference_const(foo->bar); /* I just had to add 
> 							    this dereference */
> 	rcu_assign_pointer(foo->bar, newbar);
> 	if (bar)
> 		call_rcu(&bar->rcu, bar_destructor);
> }
> 
> with the shorter
> 
> void foo_assign(struct foo *foo, struct bar *newbar)
> {
> 	struct bar *bar = rcu_exchange(foo->bar, newbar);
> 	if (bar)
> 		call_rcu(&bar->rcu, bar_destructor);
> }
> 
> Now we could combine this to
> 
> void foo_assign(struct foo *foo, struct bar *newbar)
> {
> 	rcu_exchange_call(foo->bar, newbar, rcu, bar_destructor);
> }
> 
> #define rcu_exchange_call(ptr, new, member, func) \
> ({ \
> 	typeof(new) old = rcu_exchange((ptr),(new)); \
> 	if (old) \
> 		call_rcu(&(old)->member, (func));	\
> 	old; \
> })
> 
> and make appropriate versions of all the above rcu methods for this.
> With some extra macro magic, this could even become type safe and
> accept a function that takes a typeof(ptr) argument instead of the
> rcu_head.

This approach does look promising!  And probably a lot simpler than
attempting to label the rcu_head structure.  I am not yet convinced
about the typesafe function taking typeof(ptr), but possibly I am
suffering a failure of C-preprocessor imagination?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/13] bridge: Add multicast_router sysfs entries
  2010-02-27  0:42   ` Stephen Hemminger
  2010-02-27 11:29     ` David Miller
@ 2010-03-09 12:25     ` Herbert Xu
  2010-03-09 12:26       ` Herbert Xu
  1 sibling, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-03-09 12:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev

On Fri, Feb 26, 2010 at 04:42:11PM -0800, Stephen Hemminger wrote:
> 
> I like the functionality, but don't like users whacking on sysfs
> directly. Could you send patches to integrate a user API into
> bridge-utils; the utils are at: 
>   git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/bridge-utils.git

Here it is:

commit 0eff6a003b34eec7f7216a8cd93cb545e25196b1
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 9 20:21:20 2010 +0800

    bridge-utils: Add IGMP snooping support
    
    This patch adds support for setting/reading IGMP snooping parameters.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/brctl/brctl_cmd.c b/brctl/brctl_cmd.c
index d37e99c..50dea20 100644
--- a/brctl/brctl_cmd.c
+++ b/brctl/brctl_cmd.c
@@ -247,6 +247,209 @@ static int br_cmd_setmaxage(int argc, char *const* argv)
 	return err != 0;
 }
 
+static int br_cmd_sethashel(int argc, char *const* argv)
+{
+	int elasticity;
+	int err;
+
+	if (sscanf(argv[2], "%i", &elasticity) != 1) {
+		fprintf(stderr,"bad elasticity\n");
+		return 1;
+	}
+	err = br_set(argv[1], "hash_elasticity", elasticity, 0);
+	if (err)
+		fprintf(stderr, "set hash elasticity failed: %s\n",
+			strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_sethashmax(int argc, char *const* argv)
+{
+	int max;
+	int err;
+
+	if (sscanf(argv[2], "%i", &max) != 1) {
+		fprintf(stderr,"bad max\n");
+		return 1;
+	}
+	err = br_set(argv[1], "hash_max", max, 0);
+	if (err)
+		fprintf(stderr, "set hash max failed: %s\n", strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmclmc(int argc, char *const* argv)
+{
+	int lmc;
+	int err;
+
+	if (sscanf(argv[2], "%i", &lmc) != 1) {
+		fprintf(stderr,"bad count\n");
+		return 1;
+	}
+	err = br_set(argv[1], "multicast_last_member_count", lmc, 0);
+	if (err)
+		fprintf(stderr, "set multicast last member count failed: %s\n",
+			strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcrouter(int argc, char *const* argv)
+{
+	int router;
+	int err;
+
+	if (sscanf(argv[2], "%i", &router) != 1) {
+		fprintf(stderr,"bad router\n");
+		return 1;
+	}
+	err = br_set(argv[1], "multicast_router", router, 0);
+	if (err)
+		fprintf(stderr, "set multicast router failed: %s\n",
+			strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcsnoop(int argc, char *const* argv)
+{
+	int snoop;
+	int err;
+
+	if (sscanf(argv[2], "%i", &snoop) != 1) {
+		fprintf(stderr,"bad snooping\n");
+		return 1;
+	}
+	err = br_set(argv[1], "multicast_snooping", snoop, 0);
+	if (err)
+		fprintf(stderr, "set multicast snooping failed: %s\n",
+			strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcsqc(int argc, char *const* argv)
+{
+	int sqc;
+	int err;
+
+	if (sscanf(argv[2], "%i", &sqc) != 1) {
+		fprintf(stderr,"bad count\n");
+		return 1;
+	}
+	err = br_set(argv[1], "multicast_startup_query_count", sqc, 0);
+	if (err)
+		fprintf(stderr, "set multicast startup query count failed: "
+				"%s\n", strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmclmi(int argc, char *const* argv)
+{
+	struct timeval tv;
+	int err;
+
+	if (strtotimeval(&tv, argv[2])) {
+		fprintf(stderr, "bad interval\n");
+		return 1;
+	}
+	err = br_set_time(argv[1], "multicast_last_member_interval", &tv);
+	if (err)
+		fprintf(stderr, "set multicast last member interval failed: "
+				"%s\n", strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcmi(int argc, char *const* argv)
+{
+	struct timeval tv;
+	int err;
+
+	if (strtotimeval(&tv, argv[2])) {
+		fprintf(stderr, "bad interval\n");
+		return 1;
+	}
+	err = br_set_time(argv[1], "multicast_membership_interval", &tv);
+	if (err)
+		fprintf(stderr, "set multicast membership interval failed: "
+				"%s\n", strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcqpi(int argc, char *const* argv)
+{
+	struct timeval tv;
+	int err;
+
+	if (strtotimeval(&tv, argv[2])) {
+		fprintf(stderr, "bad interval\n");
+		return 1;
+	}
+	err = br_set_time(argv[1], "multicast_querier_interval", &tv);
+	if (err)
+		fprintf(stderr, "set multicast querier interval failed: %s\n",
+			strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcqi(int argc, char *const* argv)
+{
+	struct timeval tv;
+	int err;
+
+	if (strtotimeval(&tv, argv[2])) {
+		fprintf(stderr, "bad interval\n");
+		return 1;
+	}
+	err = br_set_time(argv[1], "multicast_query_interval", &tv);
+	if (err)
+		fprintf(stderr, "set multicast query interval failed: %s\n",
+			strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcqri(int argc, char *const* argv)
+{
+	struct timeval tv;
+	int err;
+
+	if (strtotimeval(&tv, argv[2])) {
+		fprintf(stderr, "bad interval\n");
+		return 1;
+	}
+	err = br_set_time(argv[1], "multicast_query_response_interval", &tv);
+	if (err)
+		fprintf(stderr, "set multicast query response interval "
+				"failed: %s\n", strerror(err));
+
+	return err != 0;
+}
+
+static int br_cmd_setmcsqi(int argc, char *const* argv)
+{
+	struct timeval tv;
+	int err;
+
+	if (strtotimeval(&tv, argv[2])) {
+		fprintf(stderr, "bad interval\n");
+		return 1;
+	}
+	err = br_set_time(argv[1], "multicast_startup_query_interval", &tv);
+	if (err)
+		fprintf(stderr, "set multicast startup query interval "
+				"failed: %s\n", strerror(err));
+
+	return err != 0;
+}
+
 static int br_cmd_setpathcost(int argc, char *const* argv)
 {
 	int cost, err;
@@ -280,6 +483,24 @@ static int br_cmd_setportprio(int argc, char *const* argv)
 	return err != 0;
 }
 
+static int br_cmd_setportmcrouter(int argc, char *const* argv)
+{
+	int router;
+	int err;
+
+	if (sscanf(argv[3], "%i", &router) != 1) {
+		fprintf(stderr, "bad router\n");
+		return 1;
+	}
+
+	err = br_set_port_mcrouter(argv[1], argv[2], router);
+	if (err)
+		fprintf(stderr, "set port multicast router failed: %s\n",
+			strerror(errno));
+
+	return err != 0;
+}
+
 static int br_cmd_stp(int argc, char *const* argv)
 {
 	int stp, err;
@@ -450,10 +671,36 @@ static const struct command commands[] = {
 	  "<bridge> <time>\t\tset hello time" },
 	{ 2, "setmaxage", br_cmd_setmaxage,
 	  "<bridge> <time>\t\tset max message age" },
+	{ 2, "sethashel", br_cmd_sethashel,
+	  "<bridge> <int>\t\tset hash elasticity" },
+	{ 2, "sethashmax", br_cmd_sethashmax,
+	  "<bridge> <int>\t\tset hash max" },
+	{ 2, "setmclmc", br_cmd_setmclmc,
+	  "<bridge> <int>\t\tset multicast last member count" },
+	{ 2, "setmcrouter", br_cmd_setmcrouter,
+	  "<bridge> <int>\t\tset multicast router" },
+	{ 2, "setmcsnoop", br_cmd_setmcsnoop,
+	  "<bridge> <int>\t\tset multicast snooping" },
+	{ 2, "setmcsqc", br_cmd_setmcsqc,
+	  "<bridge> <int>\t\tset multicast startup query count" },
+	{ 2, "setmclmi", br_cmd_setmclmi,
+	  "<bridge> <time>\t\tset multicast last member interval" },
+	{ 2, "setmcmi", br_cmd_setmcmi,
+	  "<bridge> <time>\t\tset multicast membership interval" },
+	{ 2, "setmcqpi", br_cmd_setmcqpi,
+	  "<bridge> <time>\t\tset multicast querier interval" },
+	{ 2, "setmcqi", br_cmd_setmcqi,
+	  "<bridge> <time>\t\tset multicast query interval" },
+	{ 2, "setmcqri", br_cmd_setmcqri,
+	  "<bridge> <time>\t\tset multicast query response interval" },
+	{ 2, "setmcqri", br_cmd_setmcsqi,
+	  "<bridge> <time>\t\tset multicast startup query interval" },
 	{ 3, "setpathcost", br_cmd_setpathcost, 
 	  "<bridge> <port> <cost>\tset path cost" },
 	{ 3, "setportprio", br_cmd_setportprio,
 	  "<bridge> <port> <prio>\tset port priority" },
+	{ 3, "setportmcrouter", br_cmd_setportmcrouter,
+	  "<bridge> <port> <int>\tset port multicast router" },
 	{ 0, "show", br_cmd_show, "\t\t\tshow a list of bridges" },
 	{ 1, "showmacs", br_cmd_showmacs, 
 	  "<bridge>\t\tshow a list of mac addrs"},
diff --git a/brctl/brctl_disp.c b/brctl/brctl_disp.c
index 3e81241..0cf64b8 100644
--- a/brctl/brctl_disp.c
+++ b/brctl/brctl_disp.c
@@ -88,6 +88,7 @@ static int dump_port_info(const char *br, const char *p,  void *arg)
 	printf("\n designated cost\t%4i", pinfo.designated_cost);
 	printf("\t\t\thold timer\t\t");
 	br_show_timer(&pinfo.hold_timer_value);
+	printf("\n mc router\t\t%4i", pinfo.multicast_router);
 	printf("\n flags\t\t\t");
 	if (pinfo.config_pending)
 		printf("CONFIG_PENDING ");
@@ -133,6 +134,26 @@ void br_dump_info(const char *br, const struct bridge_info *bri)
 	br_show_timer(&bri->topology_change_timer_value);
 	printf("\t\t\tgc timer\t\t");
 	br_show_timer(&bri->gc_timer_value);
+	printf("\n hash elasticity\t%4i", bri->hash_elasticity);
+	printf("\t\t\thash max\t\t%4i", bri->hash_max);
+	printf("\n mc last member count\t%4i",
+	       bri->multicast_last_member_count);
+	printf("\t\t\tmc init query count\t%4i",
+	       bri->multicast_startup_query_count);
+	printf("\n mc router\t\t%4i", bri->multicast_router);
+	printf("\t\t\tmc snooping\t\t%4i", bri->multicast_snooping);
+	printf("\n mc last member timer\t");
+	br_show_timer(&bri->multicast_last_member_interval);
+	printf("\t\t\tmc membership timer\t");
+	br_show_timer(&bri->multicast_membership_interval);
+	printf("\n mc querier timer\t");
+	br_show_timer(&bri->multicast_querier_interval);
+	printf("\t\t\tmc query interval\t");
+	br_show_timer(&bri->multicast_query_interval);
+	printf("\n mc response interval\t");
+	br_show_timer(&bri->multicast_query_response_interval);
+	printf("\t\t\tmc init query interval\t");
+	br_show_timer(&bri->multicast_startup_query_interval);
 	printf("\n flags\t\t\t");
 	if (bri->topology_change)
 		printf("TOPOLOGY_CHANGE ");
diff --git a/libbridge/libbridge.h b/libbridge/libbridge.h
index 39964f2..ae33ffd 100644
--- a/libbridge/libbridge.h
+++ b/libbridge/libbridge.h
@@ -39,6 +39,10 @@ struct bridge_info
 	struct bridge_id designated_root;
 	struct bridge_id bridge_id;
 	unsigned root_path_cost;
+	unsigned hash_elasticity;
+	unsigned hash_max;
+	unsigned multicast_last_member_count;
+	unsigned multicast_startup_query_count;
 	struct timeval max_age;
 	struct timeval hello_time;
 	struct timeval forward_delay;
@@ -49,11 +53,19 @@ struct bridge_info
 	unsigned char stp_enabled;
 	unsigned char topology_change;
 	unsigned char topology_change_detected;
+	unsigned char multicast_router;
+	unsigned char multicast_snooping;
 	struct timeval ageing_time;
 	struct timeval hello_timer_value;
 	struct timeval tcn_timer_value;
 	struct timeval topology_change_timer_value;
 	struct timeval gc_timer_value;
+	struct timeval multicast_last_member_interval;
+	struct timeval multicast_membership_interval;
+	struct timeval multicast_querier_interval;
+	struct timeval multicast_query_interval;
+	struct timeval multicast_query_response_interval;
+	struct timeval multicast_startup_query_interval;
 };
 
 struct fdb_entry
@@ -75,6 +87,7 @@ struct port_info
 	unsigned char top_change_ack;
 	unsigned char config_pending;
 	unsigned char state;
+	unsigned char multicast_router;
 	unsigned path_cost;
 	unsigned designated_cost;
 	struct timeval message_age_timer_value;
@@ -102,6 +115,9 @@ extern int br_add_bridge(const char *brname);
 extern int br_del_bridge(const char *brname);
 extern int br_add_interface(const char *br, const char *dev);
 extern int br_del_interface(const char *br, const char *dev);
+extern int br_set(const char *bridge, const char *name,
+		  unsigned long value, unsigned long oldcode);
+extern int br_set_time(const char *br, const char *name, struct timeval *tv);
 extern int br_set_bridge_forward_delay(const char *br, struct timeval *tv);
 extern int br_set_bridge_hello_time(const char *br, struct timeval *tv);
 extern int br_set_bridge_max_age(const char *br, struct timeval *tv);
@@ -110,6 +126,8 @@ extern int br_set_stp_state(const char *br, int stp_state);
 extern int br_set_bridge_priority(const char *br, int bridge_priority);
 extern int br_set_port_priority(const char *br, const char *p, 
 				int port_priority);
+extern int br_set_port_mcrouter(const char *bridge, const char *port,
+				int value);
 extern int br_set_path_cost(const char *br, const char *p, 
 			    int path_cost);
 extern int br_read_fdb(const char *br, struct fdb_entry *fdbs, 
diff --git a/libbridge/libbridge_devif.c b/libbridge/libbridge_devif.c
index aa8bc36..180c2f9 100644
--- a/libbridge/libbridge_devif.c
+++ b/libbridge/libbridge_devif.c
@@ -189,6 +189,28 @@ int br_get_bridge_info(const char *bridge, struct bridge_info *info)
 	info->topology_change = fetch_int(path, "topology_change");
 	info->topology_change_detected = fetch_int(path, "topology_change_detected");
 
+	info->hash_elasticity = fetch_int(path, "hash_elasticity");
+	info->hash_max = fetch_int(path, "hash_max");
+	info->multicast_last_member_count =
+		fetch_int(path, "multicast_last_member_count");
+	info->multicast_router = fetch_int(path, "multicast_router");
+	info->multicast_snooping = fetch_int(path, "multicast_snooping");
+	info->multicast_startup_query_count =
+		fetch_int(path, "multicast_startup_query_count");
+
+	fetch_tv(path, "multicast_last_member_interval",
+		 &info->multicast_last_member_interval);
+	fetch_tv(path, "multicast_membership_interval",
+		 &info->multicast_membership_interval);
+	fetch_tv(path, "multicast_querier_interval",
+		 &info->multicast_querier_interval);
+	fetch_tv(path, "multicast_query_interval",
+		 &info->multicast_query_interval);
+	fetch_tv(path, "multicast_query_response_interval",
+		 &info->multicast_query_response_interval);
+	fetch_tv(path, "multicast_startup_query_interval",
+		 &info->multicast_startup_query_interval);
+
 	closedir(dir);
 	return 0;
 
@@ -272,6 +294,7 @@ int br_get_port_info(const char *brname, const char *port,
 	fetch_tv(path, "forward_delay_timer", &info->forward_delay_timer_value);
 	fetch_tv(path, "hold_timer", &info->hold_timer_value);
 	info->hairpin_mode = fetch_int(path, "hairpin_mode");
+	info->multicast_router = fetch_int(path, "multicast_router");
 
 	closedir(d);
 
@@ -281,8 +304,8 @@ fallback:
 }
 
 
-static int br_set(const char *bridge, const char *name,
-		  unsigned long value, unsigned long oldcode)
+int br_set(const char *bridge, const char *name,
+	   unsigned long value, unsigned long oldcode)
 {
 	int ret;
 	char path[SYSFS_PATH_MAX];
@@ -294,7 +317,7 @@ static int br_set(const char *bridge, const char *name,
 	if (f) {
 		ret = fprintf(f, "%ld\n", value);
 		fclose(f);
-	} else {
+	} else if (oldcode) {
 		/* fallback to old ioctl */
 		struct ifreq ifr;
 		unsigned long args[4] = { oldcode, value, 0, 0 };
@@ -302,11 +325,17 @@ static int br_set(const char *bridge, const char *name,
 		strncpy(ifr.ifr_name, bridge, IFNAMSIZ);
 		ifr.ifr_data = (char *) &args;
 		ret = ioctl(br_socket_fd, SIOCDEVPRIVATE, &ifr);
-	}
+	} else
+		return ENOSYS;
 
 	return ret < 0 ? errno : 0;
 }
 
+int br_set_time(const char *br, const char *name, struct timeval *tv)
+{
+	return br_set(br, name, __tv_to_jiffies(tv), 0);
+}
+
 int br_set_bridge_forward_delay(const char *br, struct timeval *tv)
 {
 	return br_set(br, "forward_delay", __tv_to_jiffies(tv),
@@ -355,7 +384,7 @@ static int port_set(const char *bridge, const char *ifname,
 	if (f) {
 		ret = fprintf(f, "%ld\n", value);
 		fclose(f);
-	} else {
+	} else if (oldcode) {
 		int index = get_portno(bridge, ifname);
 
 		if (index < 0)
@@ -368,7 +397,8 @@ static int port_set(const char *bridge, const char *ifname,
 			ifr.ifr_data = (char *) &args;
 			ret = ioctl(br_socket_fd, SIOCDEVPRIVATE, &ifr);
 		}
-	}
+	} else
+		return ENOSYS;
 
 	return ret < 0 ? errno : 0;
 }
@@ -378,6 +408,11 @@ int br_set_port_priority(const char *bridge, const char *port, int priority)
 	return port_set(bridge, port, "priority", priority, BRCTL_SET_PORT_PRIORITY);
 }
 
+int br_set_port_mcrouter(const char *bridge, const char *port, int value)
+{
+	return port_set(bridge, port, "multicast_router", value, 0);
+}
+
 int br_set_path_cost(const char *bridge, const char *port, int cost)
 {
 	return port_set(bridge, port, "path_cost", cost, BRCTL_SET_PATH_COST);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 10/13] bridge: Add multicast_router sysfs entries
  2010-03-09 12:25     ` Herbert Xu
@ 2010-03-09 12:26       ` Herbert Xu
  0 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-03-09 12:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev

On Tue, Mar 09, 2010 at 08:25:26PM +0800, Herbert Xu wrote:
> 
> commit 0eff6a003b34eec7f7216a8cd93cb545e25196b1
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Tue Mar 9 20:21:20 2010 +0800
> 
>     bridge-utils: Add IGMP snooping support

And you need this patch on top to make setting values work:

commit de7d2ed12184b629511f8a5dfceb50cf1f73d52d
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 9 20:23:00 2010 +0800

    bridge-utils: Fix sysfs path in br_set
    
    The sysfs path was missing the "bridge" component.
    
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/libbridge/libbridge_devif.c b/libbridge/libbridge_devif.c
index 180c2f9..126027f 100644
--- a/libbridge/libbridge_devif.c
+++ b/libbridge/libbridge_devif.c
@@ -311,7 +311,8 @@ int br_set(const char *bridge, const char *name,
 	char path[SYSFS_PATH_MAX];
 	FILE *f;
 
-	snprintf(path, SYSFS_PATH_MAX, SYSFS_CLASS_NET "%s/%s", bridge, name);
+	snprintf(path, SYSFS_PATH_MAX, SYSFS_CLASS_NET "%s/bridge/%s", bridge,
+		 name);
 
 	f = fopen(path, "w");
 	if (f) {

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-07  3:11                   ` Paul E. McKenney
  2010-03-08 18:50                     ` Arnd Bergmann
@ 2010-03-09 21:12                     ` Arnd Bergmann
  2010-03-10  2:14                       ` Paul E. McKenney
  1 sibling, 1 reply; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-09 21:12 UTC (permalink / raw)
  To: paulmck; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Sunday 07 March 2010, Paul E. McKenney wrote:
> On Sun, Mar 07, 2010 at 10:45:00AM +0800, Herbert Xu wrote:
> > On Sat, Mar 06, 2010 at 11:00:00AM -0800, Paul E. McKenney wrote:
> 
> Arnd, would it be reasonable to extend your RCU-sparse changes to have
> four different pointer namespaces, one for each flavor of RCU?  (RCU,
> RCU-bh, RCU-sched, and SRCU)?  Always a fan of making the computer do
> the auditing where reasonable.  ;-)
> 
> This could potentially catch the mismatched call_rcu()s, at least if the
> rcu_head could be labeled.
> 
> Other thoughts?

I've just tried annotating net/ipv4/route.c like this and did not get
very far, because the same pointers are used for rcu and rcu_bh.
Could you check if this is a false positive or an actual finding?

	Arnd

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-09 21:12                     ` Arnd Bergmann
@ 2010-03-10  2:14                       ` Paul E. McKenney
  2010-03-10  9:41                         ` Arnd Bergmann
  0 siblings, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-10  2:14 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Tue, Mar 09, 2010 at 10:12:59PM +0100, Arnd Bergmann wrote:
> On Sunday 07 March 2010, Paul E. McKenney wrote:
> > On Sun, Mar 07, 2010 at 10:45:00AM +0800, Herbert Xu wrote:
> > > On Sat, Mar 06, 2010 at 11:00:00AM -0800, Paul E. McKenney wrote:
> > 
> > Arnd, would it be reasonable to extend your RCU-sparse changes to have
> > four different pointer namespaces, one for each flavor of RCU?  (RCU,
> > RCU-bh, RCU-sched, and SRCU)?  Always a fan of making the computer do
> > the auditing where reasonable.  ;-)
> > 
> > This could potentially catch the mismatched call_rcu()s, at least if the
> > rcu_head could be labeled.
> > 
> > Other thoughts?
> 
> I've just tried annotating net/ipv4/route.c like this and did not get
> very far, because the same pointers are used for rcu and rcu_bh.
> Could you check if this is a false positive or an actual finding?

Hmmm...  I am only seeing a call_rcu_bh() here, so unless I am missing
something, this is a real problem in TREE_PREEMPT_RCU kernels.  The
call_rcu_bh() only interacts with the rcu_read_lock_bh() readers, not
the rcu_read_lock() readers.

One approach is to run freed blocks through both types of grace periods,
I suppose.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10  2:14                       ` Paul E. McKenney
@ 2010-03-10  9:41                         ` Arnd Bergmann
  2010-03-10 10:39                           ` Eric Dumazet
  2010-03-10 13:19                           ` Paul E. McKenney
  0 siblings, 2 replies; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-10  9:41 UTC (permalink / raw)
  To: paulmck; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Wednesday 10 March 2010 03:14:10 Paul E. McKenney wrote:
> On Tue, Mar 09, 2010 at 10:12:59PM +0100, Arnd Bergmann wrote:
>
> > I've just tried annotating net/ipv4/route.c like this and did not get
> > very far, because the same pointers are used for rcu and rcu_bh.
> > Could you check if this is a false positive or an actual finding?
> 
> Hmmm...  I am only seeing a call_rcu_bh() here, so unless I am missing
> something, this is a real problem in TREE_PREEMPT_RCU kernels.  The
> call_rcu_bh() only interacts with the rcu_read_lock_bh() readers, not
> the rcu_read_lock() readers.
> 
> One approach is to run freed blocks through both types of grace periods,
> I suppose.

Well, if I introduce different __rcu and __rcu_bh address space annotations,
sparse would still not like that, because then you can only pass the annotated
pointers into either rcu_dereference or rcu_dereference_bh.

What the code seems to be doing here is in some places

	local_bh_disable();
	...
	rcu_read_lock();
	rcu_dereference(rt_hash_table[h].chain);
	rcu_read_unlock();
	...
	local_bh_enable();

and in others

	rcu_read_lock_bh();
	rcu_dereference_bh(rt_hash_table[h].chain);
	rcu_read_unlock_bh();

When rt_hash_table[h].chain gets the __rcu_bh annotation, we'd have to
turn first rcu_dereference into rcu_dereference_bh in order to have a clean
build with sparse. Would that change be
a) correct from RCU perspective,
b) desirable for code inspection, and
c) lockdep-clean?

	Arnd

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10  9:41                         ` Arnd Bergmann
@ 2010-03-10 10:39                           ` Eric Dumazet
  2010-03-10 10:49                             ` Herbert Xu
  2010-03-10 13:19                           ` Paul E. McKenney
  1 sibling, 1 reply; 81+ messages in thread
From: Eric Dumazet @ 2010-03-10 10:39 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: paulmck, Herbert Xu, David S. Miller, netdev, Stephen Hemminger

Le mercredi 10 mars 2010 à 10:41 +0100, Arnd Bergmann a écrit :
> On Wednesday 10 March 2010 03:14:10 Paul E. McKenney wrote:
> > On Tue, Mar 09, 2010 at 10:12:59PM +0100, Arnd Bergmann wrote:
> >
> > > I've just tried annotating net/ipv4/route.c like this and did not get
> > > very far, because the same pointers are used for rcu and rcu_bh.
> > > Could you check if this is a false positive or an actual finding?
> > 
> > Hmmm...  I am only seeing a call_rcu_bh() here, so unless I am missing
> > something, this is a real problem in TREE_PREEMPT_RCU kernels.  The
> > call_rcu_bh() only interacts with the rcu_read_lock_bh() readers, not
> > the rcu_read_lock() readers.
> > 
> > One approach is to run freed blocks through both types of grace periods,
> > I suppose.
> 
> Well, if I introduce different __rcu and __rcu_bh address space annotations,
> sparse would still not like that, because then you can only pass the annotated
> pointers into either rcu_dereference or rcu_dereference_bh.
> 
> What the code seems to be doing here is in some places
> 
> 	local_bh_disable();
> 	...
> 	rcu_read_lock();
> 	rcu_dereference(rt_hash_table[h].chain);
> 	rcu_read_unlock();
> 	...
> 	local_bh_enable();
> 
> and in others
> 
> 	rcu_read_lock_bh();
> 	rcu_dereference_bh(rt_hash_table[h].chain);
> 	rcu_read_unlock_bh();
> 
> When rt_hash_table[h].chain gets the __rcu_bh annotation, we'd have to
> turn first rcu_dereference into rcu_dereference_bh in order to have a clean
> build with sparse. Would that change be
> a) correct from RCU perspective,
> b) desirable for code inspection, and
> c) lockdep-clean?
> 

Its really rcu_dereference_bh() that could/should be used:
I see no problem changing


        local_bh_disable();
        ...
        rcu_read_lock();
        rcu_dereference(rt_hash_table[h].chain);
        rcu_read_unlock();
        ...
        local_bh_enable();


to


        local_bh_disable();
        ...
        rcu_read_lock();
        rcu_dereference_bh(rt_hash_table[h].chain);
        rcu_read_unlock();
        ...
        local_bh_enable();



^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 10:39                           ` Eric Dumazet
@ 2010-03-10 10:49                             ` Herbert Xu
  2010-03-10 13:13                               ` Paul E. McKenney
                                                 ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Herbert Xu @ 2010-03-10 10:49 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Arnd Bergmann, paulmck, David S. Miller, netdev, Stephen Hemminger

On Wed, Mar 10, 2010 at 11:39:43AM +0100, Eric Dumazet wrote:
>
> Its really rcu_dereference_bh() that could/should be used:
> I see no problem changing
> 
> 
>         local_bh_disable();
>         ...
>         rcu_read_lock();
>         rcu_dereference(rt_hash_table[h].chain);
>         rcu_read_unlock();
>         ...
>         local_bh_enable();

Why don't we just ignore the bh part for rcu_dereference?

After all it's call_rcu_bh and the other primitives that we really
care about.  For rcu_dereference bh should make no difference
whatsoever.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 10:49                             ` Herbert Xu
@ 2010-03-10 13:13                               ` Paul E. McKenney
  2010-03-10 14:07                                 ` Herbert Xu
  2010-03-10 13:27                               ` Arnd Bergmann
  2010-03-10 13:39                               ` Arnd Bergmann
  2 siblings, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-10 13:13 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, Arnd Bergmann, David S. Miller, netdev, Stephen Hemminger

On Wed, Mar 10, 2010 at 06:49:07PM +0800, Herbert Xu wrote:
> On Wed, Mar 10, 2010 at 11:39:43AM +0100, Eric Dumazet wrote:
> >
> > Its really rcu_dereference_bh() that could/should be used:
> > I see no problem changing
> > 
> > 
> >         local_bh_disable();
> >         ...
> >         rcu_read_lock();
> >         rcu_dereference(rt_hash_table[h].chain);
> >         rcu_read_unlock();
> >         ...
> >         local_bh_enable();
> 
> Why don't we just ignore the bh part for rcu_dereference?
> 
> After all it's call_rcu_bh and the other primitives that we really
> care about.  For rcu_dereference bh should make no difference
> whatsoever.

If CONFIG_PROVE_RCU is set, rcu_dereference() checks for rcu_read_lock()
and rcu_dereference_bh() checks for either rcu_read_lock_bh() or BH
being disabled.  Yes, this is a bit restrictive, but there are a few too
many to check by hand these days.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10  9:41                         ` Arnd Bergmann
  2010-03-10 10:39                           ` Eric Dumazet
@ 2010-03-10 13:19                           ` Paul E. McKenney
  2010-03-10 13:30                             ` Arnd Bergmann
  1 sibling, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-10 13:19 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Wed, Mar 10, 2010 at 10:41:32AM +0100, Arnd Bergmann wrote:
> On Wednesday 10 March 2010 03:14:10 Paul E. McKenney wrote:
> > On Tue, Mar 09, 2010 at 10:12:59PM +0100, Arnd Bergmann wrote:
> >
> > > I've just tried annotating net/ipv4/route.c like this and did not get
> > > very far, because the same pointers are used for rcu and rcu_bh.
> > > Could you check if this is a false positive or an actual finding?
> > 
> > Hmmm...  I am only seeing a call_rcu_bh() here, so unless I am missing
> > something, this is a real problem in TREE_PREEMPT_RCU kernels.  The
> > call_rcu_bh() only interacts with the rcu_read_lock_bh() readers, not
> > the rcu_read_lock() readers.
> > 
> > One approach is to run freed blocks through both types of grace periods,
> > I suppose.
> 
> Well, if I introduce different __rcu and __rcu_bh address space annotations,
> sparse would still not like that, because then you can only pass the annotated
> pointers into either rcu_dereference or rcu_dereference_bh.
> 
> What the code seems to be doing here is in some places
> 
> 	local_bh_disable();
> 	...
> 	rcu_read_lock();
> 	rcu_dereference(rt_hash_table[h].chain);
> 	rcu_read_unlock();
> 	...
> 	local_bh_enable();
> 
> and in others
> 
> 	rcu_read_lock_bh();
> 	rcu_dereference_bh(rt_hash_table[h].chain);
> 	rcu_read_unlock_bh();

Hmmm...  This is actually legal.

> When rt_hash_table[h].chain gets the __rcu_bh annotation, we'd have to
> turn first rcu_dereference into rcu_dereference_bh in order to have a clean
> build with sparse. Would that change be
> a) correct from RCU perspective,
> b) desirable for code inspection, and
> c) lockdep-clean?

I have a patch queued up that will make rcu_dereference_bh() handle this
correctly -- current -tip and mainline would complain.  Please see below
for a sneak preview.

Thoughts?

							Thanx, Paul

rcu: make rcu_read_lock_bh_held() allow for disabled BH

Disabling BH can stand in for rcu_read_lock_bh(), and this patch updates
rcu_read_lock_bh_held() to allow for this.  In order to avoid
include-file hell, this function is moved out of line to kernel/rcupdate.c.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

include/linux/rcupdate.h |   19 ++++---------------
 kernel/rcupdate.c        |   22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 75921b8..c393acc 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -119,22 +119,11 @@ static inline int rcu_read_lock_held(void)
 	return lock_is_held(&rcu_lock_map);
 }
 
-/**
- * rcu_read_lock_bh_held - might we be in RCU-bh read-side critical section?
- *
- * If CONFIG_PROVE_LOCKING is selected and enabled, returns nonzero iff in
- * an RCU-bh read-side critical section.  In absence of CONFIG_PROVE_LOCKING,
- * this assumes we are in an RCU-bh read-side critical section unless it can
- * prove otherwise.
- *
- * Check rcu_scheduler_active to prevent false positives during boot.
+/*
+ * rcu_read_lock_bh_held() is defined out of line to avoid #include-file
+ * hell.
  */
-static inline int rcu_read_lock_bh_held(void)
-{
-	if (!debug_lockdep_rcu_enabled())
-		return 1;
-	return lock_is_held(&rcu_bh_lock_map);
-}
+extern int rcu_read_lock_bh_held(void);
 
 /**
  * rcu_read_lock_sched_held - might we be in RCU-sched read-side critical section?
diff --git a/kernel/rcupdate.c b/kernel/rcupdate.c
index f1125c1..913eccb 100644
--- a/kernel/rcupdate.c
+++ b/kernel/rcupdate.c
@@ -45,6 +45,7 @@
 #include <linux/mutex.h>
 #include <linux/module.h>
 #include <linux/kernel_stat.h>
+#include <linux/hardirq.h>
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 static struct lock_class_key rcu_lock_key;
@@ -66,6 +67,27 @@ EXPORT_SYMBOL_GPL(rcu_sched_lock_map);
 int rcu_scheduler_active __read_mostly;
 EXPORT_SYMBOL_GPL(rcu_scheduler_active);
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+/**
+ * rcu_read_lock_bh_held - might we be in RCU-bh read-side critical section?
+ *
+ * Check for bottom half being disabled, which covers both the
+ * CONFIG_PROVE_RCU and not cases.  Note that if someone uses
+ * rcu_read_lock_bh(), but then later enables BH, lockdep (if enabled)
+ * will show the situation.
+ *
+ * Check debug_lockdep_rcu_enabled() to prevent false positives during boot.
+ */
+int rcu_read_lock_bh_held(void)
+{
+	if (!debug_lockdep_rcu_enabled())
+		return 1;
+	return in_softirq();
+}
+
+#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
 /*
  * This function is invoked towards the end of the scheduler's initialization
  * process.  Before this is called, the idle task might contain

^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 10:49                             ` Herbert Xu
  2010-03-10 13:13                               ` Paul E. McKenney
@ 2010-03-10 13:27                               ` Arnd Bergmann
  2010-03-10 13:39                               ` Arnd Bergmann
  2 siblings, 0 replies; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-10 13:27 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, paulmck, David S. Miller, netdev, Stephen Hemminger

On Wednesday 10 March 2010, Herbert Xu wrote:
> >
> > Its really rcu_dereference_bh() that could/should be used:
> > I see no problem changing
> > 
> > 
> >         local_bh_disable();
> >         ...
> >         rcu_read_lock();
> >         rcu_dereference(rt_hash_table[h].chain);
> >         rcu_read_unlock();
> >         ...
> >         local_bh_enable();
> 
> Why don't we just ignore the bh part for rcu_dereference?
> 
> After all it's call_rcu_bh and the other primitives that we really
> care about.  For rcu_dereference bh should make no difference
> whatsoever.

To add some background on what I'm doing, I'm currently adding
new address space modifier __rcu, __rcu_bh, __rcu_sched and __srcu
to the sparse annotations along the same lines that our __iomem,
__user and __percpu annotations work [1].

In order to check all cases, I want to ensure that you can not
use any of those pointers outside of rcu_dereference* and
rcu_assign_pointer, as well as making sure that you cannot pass
a pointer without these annotations in there, so we can catch
code that uses rcu_dereference without rcu_assign_pointer or
call_rcu.

Consequently, rcu_dereference also checks that the pointer is actually
__rcu, and passing an __rcu_bh pointer in would be considered as
wrong as passing a regular pointer by sparse.

With the work that Paul has done on lockdep, rcu_dereference_bh
now also checks that bottom halves are really disabled, which is
a very useful thing to check if you want to prove that the
call_rcu is really serialized with the use of the data.

	Arnd

[1] http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=shortlog;h=refs/heads/rcu-annotate

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 13:19                           ` Paul E. McKenney
@ 2010-03-10 13:30                             ` Arnd Bergmann
  2010-03-10 13:57                               ` Paul E. McKenney
  0 siblings, 1 reply; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-10 13:30 UTC (permalink / raw)
  To: paulmck; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Wednesday 10 March 2010, Paul E. McKenney wrote:
> > When rt_hash_table[h].chain gets the __rcu_bh annotation, we'd have to
> > turn first rcu_dereference into rcu_dereference_bh in order to have a clean
> > build with sparse. Would that change be
> > a) correct from RCU perspective,
> > b) desirable for code inspection, and
> > c) lockdep-clean?
> 
> I have a patch queued up that will make rcu_dereference_bh() handle this
> correctly -- current -tip and mainline would complain.  Please see below
> for a sneak preview.
> 
> Thoughts?

Ok, so that would mean we can convert it all to rcu_dereference_bh().
I guess an alternative to this would be to also change the rcu_read_lock()
inside local_bh_disable() sections to rcu_read_lock_bh(), which is not
necessary but also not harmful, right?

	Arnd

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 10:49                             ` Herbert Xu
  2010-03-10 13:13                               ` Paul E. McKenney
  2010-03-10 13:27                               ` Arnd Bergmann
@ 2010-03-10 13:39                               ` Arnd Bergmann
  2 siblings, 0 replies; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-10 13:39 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, paulmck, David S. Miller, netdev, Stephen Hemminger

On Wednesday 10 March 2010, Herbert Xu wrote:
> >
> > Its really rcu_dereference_bh() that could/should be used:
> > I see no problem changing
> > 
> > 
> >         local_bh_disable();
> >         ...
> >         rcu_read_lock();
> >         rcu_dereference(rt_hash_table[h].chain);
> >         rcu_read_unlock();
> >         ...
> >         local_bh_enable();
> 
> Why don't we just ignore the bh part for rcu_dereference?
> 
> After all it's call_rcu_bh and the other primitives that we really
> care about.  For rcu_dereference bh should make no difference
> whatsoever.

To add some background on what I'm doing, I'm currently adding
new address space modifier __rcu, __rcu_bh, __rcu_sched and __srcu
to the sparse annotations along the same lines that our __iomem,
__user and __percpu annotations work [1].

In order to check all cases, I want to ensure that you can not
use any of those pointers outside of rcu_dereference* and
rcu_assign_pointer, as well as making sure that you cannot pass
a pointer without these annotations in there, so we can catch
code that uses rcu_dereference without rcu_assign_pointer or
call_rcu.

Consequently, rcu_dereference also checks that the pointer is actually
__rcu, and passing an __rcu_bh pointer in would be considered as
wrong as passing a regular pointer by sparse.

With the work that Paul has done on lockdep, rcu_dereference_bh
now also checks that bottom halves are really disabled, which is
a very useful thing to check if you want to prove that the
call_rcu is really serialized with the use of the data.

	Arnd

[1] http://git.kernel.org/?p=linux/kernel/git/arnd/playground.git;a=shortlog;h=refs/heads/rcu-annotate

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 13:30                             ` Arnd Bergmann
@ 2010-03-10 13:57                               ` Paul E. McKenney
  0 siblings, 0 replies; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-10 13:57 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Wed, Mar 10, 2010 at 02:30:06PM +0100, Arnd Bergmann wrote:
> On Wednesday 10 March 2010, Paul E. McKenney wrote:
> > > When rt_hash_table[h].chain gets the __rcu_bh annotation, we'd have to
> > > turn first rcu_dereference into rcu_dereference_bh in order to have a clean
> > > build with sparse. Would that change be
> > > a) correct from RCU perspective,
> > > b) desirable for code inspection, and
> > > c) lockdep-clean?
> > 
> > I have a patch queued up that will make rcu_dereference_bh() handle this
> > correctly -- current -tip and mainline would complain.  Please see below
> > for a sneak preview.
> > 
> > Thoughts?
> 
> Ok, so that would mean we can convert it all to rcu_dereference_bh().
> I guess an alternative to this would be to also change the rcu_read_lock()
> inside local_bh_disable() sections to rcu_read_lock_bh(), which is not
> necessary but also not harmful, right?

It does impose additional overhead, which the networking guys are eager
to avoid, given that network links speeds are continuing to increase.  ;-)
So moving to rcu_dereference_bh() seems better to me.

(Please note that rcu_dereference_bh() imposes checking overhead only
in the presence of both CONFIG_PROVE_LOCKING and CONFIG_PROVE_RCU.)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 13:13                               ` Paul E. McKenney
@ 2010-03-10 14:07                                 ` Herbert Xu
  2010-03-10 16:26                                   ` Paul E. McKenney
  0 siblings, 1 reply; 81+ messages in thread
From: Herbert Xu @ 2010-03-10 14:07 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Eric Dumazet, Arnd Bergmann, David S. Miller, netdev, Stephen Hemminger

On Wed, Mar 10, 2010 at 05:13:18AM -0800, Paul E. McKenney wrote:
>
> If CONFIG_PROVE_RCU is set, rcu_dereference() checks for rcu_read_lock()
> and rcu_dereference_bh() checks for either rcu_read_lock_bh() or BH
> being disabled.  Yes, this is a bit restrictive, but there are a few too
> many to check by hand these days.

Fair enough.  We should get those fixed then.  In fact I reckon
most of them should be using the BH variant so we might be able
to kill a few rcu_read_lock's which would be a real gain.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 14:07                                 ` Herbert Xu
@ 2010-03-10 16:26                                   ` Paul E. McKenney
  2010-03-10 16:35                                     ` David Miller
  0 siblings, 1 reply; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-10 16:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, Arnd Bergmann, David S. Miller, netdev, Stephen Hemminger

On Wed, Mar 10, 2010 at 10:07:29PM +0800, Herbert Xu wrote:
> On Wed, Mar 10, 2010 at 05:13:18AM -0800, Paul E. McKenney wrote:
> >
> > If CONFIG_PROVE_RCU is set, rcu_dereference() checks for rcu_read_lock()
> > and rcu_dereference_bh() checks for either rcu_read_lock_bh() or BH
> > being disabled.  Yes, this is a bit restrictive, but there are a few too
> > many to check by hand these days.
> 
> Fair enough.  We should get those fixed then.  In fact I reckon
> most of them should be using the BH variant so we might be able
> to kill a few rcu_read_lock's which would be a real gain.

I have -tip commit a898def29e4119bc01ebe7ca97423181f4c0ea2d that
converts some of the rcu_dereference()s in net/core/filter.c,
net/core/dev.c, net/decnet/dn_route.c, net/packet/af_packet.c, and
net/ipv4/route.c to rcu_dereference_bh().

How should we coordinate the removal of the rcu_read_lock() calls?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 16:26                                   ` Paul E. McKenney
@ 2010-03-10 16:35                                     ` David Miller
  2010-03-10 17:56                                       ` Arnd Bergmann
  0 siblings, 1 reply; 81+ messages in thread
From: David Miller @ 2010-03-10 16:35 UTC (permalink / raw)
  To: paulmck; +Cc: herbert, eric.dumazet, arnd, netdev, shemminger

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Date: Wed, 10 Mar 2010 08:26:58 -0800

> On Wed, Mar 10, 2010 at 10:07:29PM +0800, Herbert Xu wrote:
>> On Wed, Mar 10, 2010 at 05:13:18AM -0800, Paul E. McKenney wrote:
>> >
>> > If CONFIG_PROVE_RCU is set, rcu_dereference() checks for rcu_read_lock()
>> > and rcu_dereference_bh() checks for either rcu_read_lock_bh() or BH
>> > being disabled.  Yes, this is a bit restrictive, but there are a few too
>> > many to check by hand these days.
>> 
>> Fair enough.  We should get those fixed then.  In fact I reckon
>> most of them should be using the BH variant so we might be able
>> to kill a few rcu_read_lock's which would be a real gain.
> 
> I have -tip commit a898def29e4119bc01ebe7ca97423181f4c0ea2d that
> converts some of the rcu_dereference()s in net/core/filter.c,
> net/core/dev.c, net/decnet/dn_route.c, net/packet/af_packet.c, and
> net/ipv4/route.c to rcu_dereference_bh().
> 
> How should we coordinate the removal of the rcu_read_lock() calls?

Paul if you want to do this via your tree, feel free.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 16:35                                     ` David Miller
@ 2010-03-10 17:56                                       ` Arnd Bergmann
  2010-03-10 21:25                                         ` Paul E. McKenney
  0 siblings, 1 reply; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-10 17:56 UTC (permalink / raw)
  To: David Miller; +Cc: paulmck, herbert, eric.dumazet, netdev, shemminger

On Wednesday 10 March 2010, David Miller wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> Date: Wed, 10 Mar 2010 08:26:58 -0800
>
> > I have -tip commit a898def29e4119bc01ebe7ca97423181f4c0ea2d that
> > converts some of the rcu_dereference()s in net/core/filter.c,
> > net/core/dev.c, net/decnet/dn_route.c, net/packet/af_packet.c, and
> > net/ipv4/route.c to rcu_dereference_bh().
> > 
> > How should we coordinate the removal of the rcu_read_lock() calls?
> 
> Paul if you want to do this via your tree, feel free.

My feeling is that this should be combined with the annotations I'm doing,
annotating one subsystem at a time, and doing changes like these in the
process. I'm still unsure what interface extensions there will have to
be, but I guess we can the new interfaces as empty wrappers in the 2.6.34
phase, and do all of the conversions where there are potential or real
bugs.

All the other annotations can get queued in subsystem maintainer trees
where it makes sense or get put in one tree for all the others, to be
merged in after 2.6.34 is out.

	Arnd

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-10 17:56                                       ` Arnd Bergmann
@ 2010-03-10 21:25                                         ` Paul E. McKenney
  0 siblings, 0 replies; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-10 21:25 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: David Miller, herbert, eric.dumazet, netdev, shemminger

On Wed, Mar 10, 2010 at 06:56:53PM +0100, Arnd Bergmann wrote:
> On Wednesday 10 March 2010, David Miller wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > Date: Wed, 10 Mar 2010 08:26:58 -0800
> >
> > > I have -tip commit a898def29e4119bc01ebe7ca97423181f4c0ea2d that
> > > converts some of the rcu_dereference()s in net/core/filter.c,
> > > net/core/dev.c, net/decnet/dn_route.c, net/packet/af_packet.c, and
> > > net/ipv4/route.c to rcu_dereference_bh().
> > > 
> > > How should we coordinate the removal of the rcu_read_lock() calls?
> > 
> > Paul if you want to do this via your tree, feel free.
> 
> My feeling is that this should be combined with the annotations I'm doing,
> annotating one subsystem at a time, and doing changes like these in the
> process. I'm still unsure what interface extensions there will have to
> be, but I guess we can the new interfaces as empty wrappers in the 2.6.34
> phase, and do all of the conversions where there are potential or real
> bugs.
> 
> All the other annotations can get queued in subsystem maintainer trees
> where it makes sense or get put in one tree for all the others, to be
> merged in after 2.6.34 is out.

Makes sense to me -- and thank you, Arnd, for taking this on!!!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-08 18:50                     ` Arnd Bergmann
  2010-03-09  3:15                       ` Paul E. McKenney
@ 2010-03-11 18:49                       ` Arnd Bergmann
  2010-03-14 23:01                         ` Paul E. McKenney
  1 sibling, 1 reply; 81+ messages in thread
From: Arnd Bergmann @ 2010-03-11 18:49 UTC (permalink / raw)
  To: paulmck; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

Following up on the earlier discussion,

On Monday 08 March 2010, Arnd Bergmann wrote:
> > Arnd, would it be reasonable to extend your RCU-sparse changes to have
> > four different pointer namespaces, one for each flavor of RCU?  (RCU,
> > RCU-bh, RCU-sched, and SRCU)?  Always a fan of making the computer do
> > the auditing where reasonable.  ;-)
> 
> Yes, I guess that would be possible. I'd still leave out the rculist
> from any annotations for now, as this would get even more complex then.
> 
> One consequence will be the need for new rcu_assign_pointer{,_bh,_sched}
> macros that check the address space of the first argument, otherwise
> you'd be able to stick anything in there, including non-__rcu pointers.

I've tested this out now, see the patch below. I needed to add a number
of interfaces, but it still seems ok. Doing it for all the rculist
functions most likely would be less so.

This is currently the head of my rcu-annotate branch of playground.git.
Paul, before I split it up and merge this with the per-subsystem patches,
can you tell me if this is what you had in mind?

> > This could potentially catch the mismatched call_rcu()s, at least if the
> > rcu_head could be labeled.
> ...
> #define rcu_exchange_call(ptr, new, member, func) \
> ({ \
>         typeof(new) old = rcu_exchange((ptr),(new)); \
>         if (old) \
>                 call_rcu(&(old)->member, (func));       \
>         old; \
> })
 
Unfortunately, this did not work out at all. Almost every user follows
a slightly different pattern for call_rcu, so I did not find a way
to match the call_rcu calls with the pointers. In particular, the functions
calling call_rcu() sometimes no longer have access to the 'old' data,
e.g. in case of synchronize_rcu.

My current take is that static annotations won't help us here.

	Arnd

---

rcu: split up __rcu annotations
    
This adds separate name spaces for the four distinct types of RCU
that we use in the kernel, namely __rcu, __rcu_bh, __rcu_sched and
__srcu.
    
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
 arch/x86/kvm/mmu.c         |    6 ++--
 arch/x86/kvm/vmx.c         |    2 +-
 drivers/net/macvlan.c      |    8 ++--
 include/linux/compiler.h   |    6 ++++
 include/linux/kvm_host.h   |    4 +-
 include/linux/netdevice.h  |    2 +-
 include/linux/rcupdate.h   |   68 +++++++++++++++++++++++++++++++++----------
 include/linux/srcu.h       |    5 ++-
 include/net/dst.h          |    4 +-
 include/net/llc.h          |    3 +-
 include/net/sock.h         |    2 +-
 include/trace/events/kvm.h |    4 +-
 kernel/cgroup.c            |   10 +++---
 kernel/perf_event.c        |    8 ++--
 kernel/sched.c             |    6 ++--
 kernel/sched_fair.c        |    2 +-
 lib/radix-tree.c           |    8 ++--
 net/core/filter.c          |    4 +-
 net/core/sock.c            |    6 ++--
 net/decnet/dn_route.c      |    2 +-
 net/ipv4/route.c           |   60 +++++++++++++++++++-------------------
 net/ipv4/tcp.c             |    4 +-
 net/llc/llc_core.c         |    6 ++--
 net/llc/llc_input.c        |    2 +-
 virt/kvm/iommu.c           |    4 +-
 virt/kvm/kvm_main.c        |   56 +++++++++++++++++++-----------------
 26 files changed, 171 insertions(+), 121 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 741373e..45877ca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -793,7 +793,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	int retval = 0;
 	struct kvm_memslots *slots;
 
-	slots = rcu_dereference(kvm->memslots);
+	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 
 	for (i = 0; i < slots->nmemslots; i++) {
 		struct kvm_memory_slot *memslot = &slots->memslots[i];
@@ -3007,7 +3007,7 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
 	unsigned int  nr_pages = 0;
 	struct kvm_memslots *slots;
 
-	slots = rcu_dereference(kvm->memslots);
+	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 	for (i = 0; i < slots->nmemslots; i++)
 		nr_pages += slots->memslots[i].npages;
 
@@ -3282,7 +3282,7 @@ static int count_rmaps(struct kvm_vcpu *vcpu)
 	int i, j, k, idx;
 
 	idx = srcu_read_lock(&kvm->srcu);
-	slots = rcu_dereference(kvm->memslots);
+	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
 		struct kvm_memory_slot *m = &slots->memslots[i];
 		struct kvm_rmap_desc *d;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0aec1f3..d0c82ed 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1513,7 +1513,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
 		struct kvm_memslots *slots;
 		gfn_t base_gfn;
 
-		slots = rcu_dereference(kvm->memslots);
+		slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 		base_gfn = slots->memslots[0].base_gfn +
 				 slots->memslots[0].npages - 3;
 		return base_gfn << PAGE_SHIFT;
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 95e1bcc..b958d5a 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -531,15 +531,15 @@ static int macvlan_port_create(struct net_device *dev)
 	INIT_LIST_HEAD(&port->vlans);
 	for (i = 0; i < MACVLAN_HASH_SIZE; i++)
 		INIT_HLIST_HEAD(&port->vlan_hash[i]);
-	rcu_assign_pointer(dev->macvlan_port, port);
+	rcu_assign_pointer_bh(dev->macvlan_port, port);
 	return 0;
 }
 
 static void macvlan_port_destroy(struct net_device *dev)
 {
-	struct macvlan_port *port = rcu_dereference_const(dev->macvlan_port);
+	struct macvlan_port *port = rcu_dereference_bh_const(dev->macvlan_port);
 
-	rcu_assign_pointer(dev->macvlan_port, NULL);
+	rcu_assign_pointer_bh(dev->macvlan_port, NULL);
 	synchronize_rcu();
 	kfree(port);
 }
@@ -624,7 +624,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
 		if (err < 0)
 			return err;
 	}
-	port = rcu_dereference(lowerdev->macvlan_port);
+	port = rcu_dereference_bh(lowerdev->macvlan_port);
 
 	vlan->lowerdev = lowerdev;
 	vlan->dev      = dev;
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 0ab21c2..d5756d4 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -17,6 +17,9 @@
 # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
 # define __percpu	__attribute__((noderef, address_space(3)))
 # define __rcu		__attribute__((noderef, address_space(4)))
+# define __rcu_bh	__attribute__((noderef, address_space(5)))
+# define __rcu_sched	__attribute__((noderef, address_space(6)))
+# define __srcu		__attribute__((noderef, address_space(7)))
 extern void __chk_user_ptr(const volatile void __user *);
 extern void __chk_io_ptr(const volatile void __iomem *);
 #else
@@ -36,6 +39,9 @@ extern void __chk_io_ptr(const volatile void __iomem *);
 # define __cond_lock(x,c) (c)
 # define __percpu
 # define __rcu
+# define __rcu_bh
+# define __rcu_sched
+# define __srcu
 #endif
 
 #ifdef __KERNEL__
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9eb0f9c..bad1787 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -164,7 +164,7 @@ struct kvm {
 	raw_spinlock_t requests_lock;
 	struct mutex slots_lock;
 	struct mm_struct *mm; /* userspace tied to this vm */
-	struct kvm_memslots __rcu *memslots;
+	struct kvm_memslots __srcu *memslots;
 	struct srcu_struct srcu;
 #ifdef CONFIG_KVM_APIC_ARCHITECTURE
 	u32 bsp_vcpu_id;
@@ -174,7 +174,7 @@ struct kvm {
 	atomic_t online_vcpus;
 	struct list_head vm_list;
 	struct mutex lock;
-	struct kvm_io_bus __rcu *buses[KVM_NR_BUSES];
+	struct kvm_io_bus __srcu *buses[KVM_NR_BUSES];
 #ifdef CONFIG_HAVE_KVM_EVENTFD
 	struct {
 		spinlock_t        lock;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fd7e8de..1b72188 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -949,7 +949,7 @@ struct net_device {
 	/* bridge stuff */
 	void __rcu		*br_port;
 	/* macvlan */
-	struct macvlan_port __rcu *macvlan_port;
+	struct macvlan_port __rcu_bh *macvlan_port;
 	/* GARP */
 	struct garp_port __rcu	*garp_port;
 
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 03702cc..b4c6f39 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -183,19 +183,33 @@ static inline int rcu_read_lock_sched_held(void)
  * read-side critical section.  It is also possible to check for
  * locks being held, for example, by using lockdep_is_held().
  */
-#define rcu_dereference_check(p, c) \
+#define __rcu_dereference_check(p, c, space) \
 	({ \
 		if (debug_locks && !(c)) \
 			lockdep_rcu_dereference(__FILE__, __LINE__); \
-		rcu_dereference_raw(p); \
+		__rcu_dereference_raw(p, space); \
 	})
 
+
 #else /* #ifdef CONFIG_PROVE_RCU */
 
-#define rcu_dereference_check(p, c)	rcu_dereference_raw(p)
+#define __rcu_dereference_check(p, c, space)	\
+	__rcu_dereference_raw(p, space)
 
 #endif /* #else #ifdef CONFIG_PROVE_RCU */
 
+#define rcu_dereference_check(p, c) \
+	__rcu_dereference_check(p, c, __rcu)
+
+#define rcu_dereference_bh_check(p, c) \
+	__rcu_dereference_check(p, rcu_read_lock_bh_held() || (c), __rcu_bh)
+
+#define rcu_dereference_sched_check(p, c) \
+	__rcu_dereference_check(p, rcu_read_lock_sched_held() || (c), __rcu_sched)
+
+#define srcu_dereference_check(p, c) \
+	__rcu_dereference_check(p, srcu_read_lock_held() || (c), __srcu)
+
 /**
  * rcu_read_lock - mark the beginning of an RCU read-side critical section.
  *
@@ -341,13 +355,15 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * exactly which pointers are protected by RCU and checks that
  * the pointer is annotated as __rcu.
  */
-#define rcu_dereference_raw(p)  ({ \
+#define __rcu_dereference_raw(p, space)  ({ \
 				typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
-				(void) (((typeof (*p) __rcu *)p) == p); \
+				(void) (((typeof (*p) space *)p) == p); \
 				smp_read_barrier_depends(); \
 				((typeof(*p) __force __kernel *)(_________p1)); \
 				})
 
+#define rcu_dereference_raw(p) __rcu_dereference_raw(p, __rcu)
+
 /**
  * rcu_dereference_const - fetch an __rcu pointer outside of a
  * read-side critical section.
@@ -360,18 +376,22 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * or in an RCU call.
  */
 
-#define rcu_dereference_const(p)     ({ \
-				(void) (((typeof (*p) __rcu *)p) == p); \
+#define __rcu_dereference_const(p, space)     ({ \
+				(void) (((typeof (*p) space *)p) == p); \
 				((typeof(*p) __force __kernel *)(p)); \
 				})
 
+#define rcu_dereference_const(p)  __rcu_dereference_const(p, __rcu)
+#define rcu_dereference_bh_const(p)  __rcu_dereference_const(p, __rcu_bh)
+#define rcu_dereference_sched_const(p)  __rcu_dereference_const(p, __rcu_sched)
+
 /**
  * rcu_dereference - fetch an RCU-protected pointer, checking for RCU
  *
  * Makes rcu_dereference_check() do the dirty work.
  */
 #define rcu_dereference(p) \
-	rcu_dereference_check(p, rcu_read_lock_held())
+	__rcu_dereference_check(p, rcu_read_lock_held(), __rcu)
 
 /**
  * rcu_dereference_bh - fetch an RCU-protected pointer, checking for RCU-bh
@@ -379,7 +399,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * Makes rcu_dereference_check() do the dirty work.
  */
 #define rcu_dereference_bh(p) \
-		rcu_dereference_check(p, rcu_read_lock_bh_held())
+	__rcu_dereference_check(p, rcu_read_lock_bh_held(), __rcu_bh)
 
 /**
  * rcu_dereference_sched - fetch RCU-protected pointer, checking for RCU-sched
@@ -387,7 +407,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * Makes rcu_dereference_check() do the dirty work.
  */
 #define rcu_dereference_sched(p) \
-		rcu_dereference_check(p, rcu_read_lock_sched_held())
+	__rcu_dereference_check(p, rcu_read_lock_sched_held(), __rcu_sched)
 
 /**
  * rcu_assign_pointer - assign (publicize) a pointer to a newly
@@ -402,12 +422,12 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * code.
  */
 
-#define rcu_assign_pointer(p, v) \
+#define __rcu_assign_pointer(p, v, space) \
 	({ \
 		if (!__builtin_constant_p(v) || \
 		    ((v) != NULL)) \
 			smp_wmb(); \
-		(p) = (typeof(*v) __force __rcu *)(v); \
+		(p) = (typeof(*v) __force space *)(v); \
 	})
 
 /**
@@ -415,10 +435,17 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
  * without barriers.
  * Using this is almost always a bug.
  */
-#define __rcu_assign_pointer(p, v) \
-	({ \
-		(p) = (typeof(*v) __force __rcu *)(v); \
-	})
+#define rcu_assign_pointer(p, v) \
+	__rcu_assign_pointer(p, v, __rcu)
+
+#define rcu_assign_pointer_bh(p, v) \
+	__rcu_assign_pointer(p, v, __rcu_bh)
+
+#define rcu_assign_pointer_sched(p, v) \
+	__rcu_assign_pointer(p, v, __rcu_sched)
+
+#define srcu_assign_pointer(p, v) \
+	__rcu_assign_pointer(p, v, __srcu)
 
 /**
  * RCU_INIT_POINTER - initialize an RCU protected member
@@ -427,6 +454,15 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
 #define RCU_INIT_POINTER(p, v) \
 		p = (typeof(*v) __force __rcu *)(v)
 
+#define RCU_INIT_POINTER_BH(p, v) \
+		p = (typeof(*v) __force __rcu_bh *)(v)
+
+#define RCU_INIT_POINTER_SCHED(p, v) \
+		p = (typeof(*v) __force __rcu_sched *)(v)
+
+#define SRCU_INIT_POINTER(p, v) \
+		p = (typeof(*v) __force __srcu *)(v)
+
 /* Infrastructure to implement the synchronize_() primitives. */
 
 struct rcu_synchronize {
diff --git a/include/linux/srcu.h b/include/linux/srcu.h
index 4d5ecb2..feaf661 100644
--- a/include/linux/srcu.h
+++ b/include/linux/srcu.h
@@ -111,7 +111,10 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
  * Makes rcu_dereference_check() do the dirty work.
  */
 #define srcu_dereference(p, sp) \
-		rcu_dereference_check(p, srcu_read_lock_held(sp))
+		__rcu_dereference_check(p, srcu_read_lock_held(sp), __srcu)
+
+#define srcu_dereference_const(p) \
+		__rcu_dereference_const(p, __srcu)
 
 /**
  * srcu_read_lock - register a new reader for an SRCU-protected structure.
diff --git a/include/net/dst.h b/include/net/dst.h
index 5f839aa..bbeaba2 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -94,9 +94,9 @@ struct dst_entry {
 	unsigned long		lastuse;
 	union {
 		struct dst_entry *next;
-		struct rtable   __rcu *rt_next;
+		struct rtable   __rcu_bh *rt_next;
 		struct rt6_info   *rt6_next;
-		struct dn_route  *dn_next;
+		struct dn_route  __rcu_bh *dn_next;
 	};
 };
 
diff --git a/include/net/llc.h b/include/net/llc.h
index 8299cb2..5700082 100644
--- a/include/net/llc.h
+++ b/include/net/llc.h
@@ -59,7 +59,8 @@ struct llc_sap {
 	int		 (* rcv_func)(struct sk_buff *skb,
 				     struct net_device *dev,
 				     struct packet_type *pt,
-				     struct net_device *orig_dev) __rcu;
+				     struct net_device *orig_dev)
+							 __rcu_bh;
 	struct llc_addr	 laddr;
 	struct list_head node;
 	spinlock_t sk_lock;
diff --git a/include/net/sock.h b/include/net/sock.h
index e07cd78..66d5e09 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -290,7 +290,7 @@ struct sock {
 	struct ucred		sk_peercred;
 	long			sk_rcvtimeo;
 	long			sk_sndtimeo;
-	struct sk_filter __rcu	*sk_filter;
+	struct sk_filter __rcu_bh *sk_filter;
 	void			*sk_protinfo;
 	struct timer_list	sk_timer;
 	ktime_t			sk_stamp;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index fc45694..db3e502 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1392,7 +1392,7 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
 		root_count++;
 
 		sb->s_root->d_fsdata = root_cgrp;
-		__rcu_assign_pointer(root->top_cgroup.dentry, sb->s_root);
+		rcu_assign_pointer(root->top_cgroup.dentry, sb->s_root);
 
 		/* Link the top cgroup in this hierarchy into all
 		 * the css_set objects */
@@ -3243,7 +3243,7 @@ int __init cgroup_init_early(void)
 	css_set_count = 1;
 	init_cgroup_root(&rootnode);
 	root_count = 1;
-	__rcu_assign_pointer(init_task.cgroups, &init_css_set);
+	rcu_assign_pointer(init_task.cgroups, &init_css_set);
 
 	init_css_set_link.cg = &init_css_set;
 	init_css_set_link.cgrp = dummytop;
@@ -3551,7 +3551,7 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 	/* Reassign the task to the init_css_set. */
 	task_lock(tsk);
 	cg = rcu_dereference_const(tsk->cgroups);
-	__rcu_assign_pointer(tsk->cgroups, &init_css_set);
+	rcu_assign_pointer(tsk->cgroups, &init_css_set);
 	task_unlock(tsk);
 	if (cg)
 		put_css_set_taskexit(cg);
@@ -3959,8 +3959,8 @@ static int __init cgroup_subsys_init_idr(struct cgroup_subsys *ss)
 		return PTR_ERR(newid);
 
 	newid->stack[0] = newid->id;
-	__rcu_assign_pointer(newid->css, rootcss);
-	__rcu_assign_pointer(rootcss->id, newid);
+	rcu_assign_pointer(newid->css, rootcss);
+	rcu_assign_pointer(rootcss->id, newid);
 	return 0;
 }
 
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index ac8bcbd..e1b65b2 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -1223,8 +1223,8 @@ void perf_event_task_sched_out(struct task_struct *task,
 			 * XXX do we need a memory barrier of sorts
 			 * wrt to rcu_dereference() of perf_event_ctxp
 			 */
-			__rcu_assign_pointer(task->perf_event_ctxp, next_ctx);
-			__rcu_assign_pointer(next->perf_event_ctxp, ctx);
+			rcu_assign_pointer(task->perf_event_ctxp, next_ctx);
+			rcu_assign_pointer(next->perf_event_ctxp, ctx);
 			ctx->task = next;
 			next_ctx->task = task;
 			do_switch = 0;
@@ -5376,10 +5376,10 @@ int perf_event_init_task(struct task_struct *child)
 		 */
 		cloned_ctx = rcu_dereference(parent_ctx->parent_ctx);
 		if (cloned_ctx) {
-			__rcu_assign_pointer(child_ctx->parent_ctx, cloned_ctx);
+			rcu_assign_pointer(child_ctx->parent_ctx, cloned_ctx);
 			child_ctx->parent_gen = parent_ctx->parent_gen;
 		} else {
-			__rcu_assign_pointer(child_ctx->parent_ctx, parent_ctx);
+			rcu_assign_pointer(child_ctx->parent_ctx, parent_ctx);
 			child_ctx->parent_gen = parent_ctx->generation;
 		}
 		get_ctx(rcu_dereference_const(child_ctx->parent_ctx));
diff --git a/kernel/sched.c b/kernel/sched.c
index 05fd61e..83744d6 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -528,7 +528,7 @@ struct rq {
 
 #ifdef CONFIG_SMP
 	struct root_domain *rd;
-	struct sched_domain __rcu *sd;
+	struct sched_domain __rcu_sched *sd;
 
 	unsigned char idle_at_tick;
 	/* For active balancing */
@@ -603,7 +603,7 @@ static inline int cpu_of(struct rq *rq)
 }
 
 #define rcu_dereference_check_sched_domain(p) \
-	rcu_dereference_check((p), \
+	rcu_dereference_sched_check((p), \
 			      rcu_read_lock_sched_held() || \
 			      lockdep_is_held(&sched_domains_mutex))
 
@@ -6323,7 +6323,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 	sched_domain_debug(sd, cpu);
 
 	rq_attach_root(rq, rd);
-	rcu_assign_pointer(rq->sd, sd);
+	rcu_assign_pointer_sched(rq->sd, sd);
 }
 
 /* cpus with isolated domains */
diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 3e1fd96..5a5ea2c 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3476,7 +3476,7 @@ static void run_rebalance_domains(struct softirq_action *h)
 
 static inline int on_null_domain(int cpu)
 {
-	return !rcu_dereference(cpu_rq(cpu)->sd);
+	return !rcu_dereference_sched(cpu_rq(cpu)->sd);
 }
 
 /*
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index f6ae74c..4c6f149 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -264,7 +264,7 @@ static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
 			return -ENOMEM;
 
 		/* Increase the height.  */
-		__rcu_assign_pointer(node->slots[0],
+		rcu_assign_pointer(node->slots[0],
 			radix_tree_indirect_to_ptr(rcu_dereference_const(root->rnode)));
 
 		/* Propagate the aggregated tag info into the new root */
@@ -1090,7 +1090,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root)
 		newptr = rcu_dereference_const(to_free->slots[0]);
 		if (root->height > 1)
 			newptr = radix_tree_ptr_to_indirect(newptr);
-		__rcu_assign_pointer(root->rnode, newptr);
+		rcu_assign_pointer(root->rnode, newptr);
 		root->height--;
 		radix_tree_node_free(to_free);
 	}
@@ -1125,7 +1125,7 @@ void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
 	slot = rcu_dereference_const(root->rnode);
 	if (height == 0) {
 		root_tag_clear_all(root);
-		__rcu_assign_pointer(root->rnode, NULL);
+		rcu_assign_pointer(root->rnode, NULL);
 		goto out;
 	}
 	slot = radix_tree_indirect_to_ptr(slot);
@@ -1183,7 +1183,7 @@ void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
 	}
 	root_tag_clear_all(root);
 	root->height = 0;
-	__rcu_assign_pointer(root->rnode, NULL);
+	rcu_assign_pointer(root->rnode, NULL);
 	if (to_free)
 		radix_tree_node_free(to_free);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index d38ef7f..b88675b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -522,7 +522,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
 
 	rcu_read_lock_bh();
 	old_fp = rcu_dereference_bh(sk->sk_filter);
-	rcu_assign_pointer(sk->sk_filter, fp);
+	rcu_assign_pointer_bh(sk->sk_filter, fp);
 	rcu_read_unlock_bh();
 
 	if (old_fp)
@@ -539,7 +539,7 @@ int sk_detach_filter(struct sock *sk)
 	rcu_read_lock_bh();
 	filter = rcu_dereference_bh(sk->sk_filter);
 	if (filter) {
-		rcu_assign_pointer(sk->sk_filter, NULL);
+		rcu_assign_pointer_bh(sk->sk_filter, NULL);
 		sk_filter_delayed_uncharge(sk, filter);
 		ret = 0;
 	}
diff --git a/net/core/sock.c b/net/core/sock.c
index 74242e2..8549387 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1073,11 +1073,11 @@ static void __sk_free(struct sock *sk)
 	if (sk->sk_destruct)
 		sk->sk_destruct(sk);
 
-	filter = rcu_dereference_check(sk->sk_filter,
+	filter = rcu_dereference_bh_check(sk->sk_filter,
 				       atomic_read(&sk->sk_wmem_alloc) == 0);
 	if (filter) {
 		sk_filter_uncharge(sk, filter);
-		rcu_assign_pointer(sk->sk_filter, NULL);
+		rcu_assign_pointer_bh(sk->sk_filter, NULL);
 	}
 
 	sock_disable_timestamp(sk, SOCK_TIMESTAMP);
@@ -1167,7 +1167,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 		sock_reset_flag(newsk, SOCK_DONE);
 		skb_queue_head_init(&newsk->sk_error_queue);
 
-		filter = rcu_dereference_const(newsk->sk_filter);
+		filter = rcu_dereference_bh_const(newsk->sk_filter);
 		if (filter != NULL)
 			sk_filter_charge(newsk, filter);
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index a7bf03c..22ec1d1 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -92,7 +92,7 @@
 
 struct dn_rt_hash_bucket
 {
-	struct dn_route *chain;
+	struct dn_route __rcu_bh *chain;
 	spinlock_t lock;
 };
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 37bf0d9..99cef80 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -200,7 +200,7 @@ const __u8 ip_tos2prio[16] = {
  */
 
 struct rt_hash_bucket {
-	struct rtable __rcu *chain;
+	struct rtable __rcu_bh *chain;
 };
 
 #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
@@ -731,26 +731,26 @@ static void rt_do_flush(int process_context)
 		spin_lock_bh(rt_hash_lock_addr(i));
 #ifdef CONFIG_NET_NS
 		{
-		struct rtable __rcu ** prev;
+		struct rtable __rcu_bh ** prev;
 		struct rtable * p;
 
-		rth = rcu_dereference_const(rt_hash_table[i].chain);
+		rth = rcu_dereference_bh(rt_hash_table[i].chain);
 
 		/* defer releasing the head of the list after spin_unlock */
-		for (tail = rth; tail; tail = rcu_dereference_const(tail->u.dst.rt_next))
+		for (tail = rth; tail; tail = rcu_dereference_bh(tail->u.dst.rt_next))
 			if (!rt_is_expired(tail))
 				break;
 		if (rth != tail)
-			__rcu_assign_pointer(rt_hash_table[i].chain, tail);
+			rcu_assign_pointer_bh(rt_hash_table[i].chain, tail);
 
 		/* call rt_free on entries after the tail requiring flush */
 		prev = &rt_hash_table[i].chain;
-		for (p = rcu_dereference_const(*prev); p; p = next) {
-			next = rcu_dereference_const(p->u.dst.rt_next);
+		for (p = rcu_dereference_bh(*prev); p; p = next) {
+			next = rcu_dereference_bh(p->u.dst.rt_next);
 			if (!rt_is_expired(p)) {
 				prev = &p->u.dst.rt_next;
 			} else {
-				__rcu_assign_pointer(*prev, next);
+				rcu_assign_pointer_bh(*prev, next);
 				rt_free(p);
 			}
 		}
@@ -763,7 +763,7 @@ static void rt_do_flush(int process_context)
 		spin_unlock_bh(rt_hash_lock_addr(i));
 
 		for (; rth != tail; rth = next) {
-			next = rcu_dereference_const(rth->u.dst.rt_next);
+			next = rcu_dereference_bh(rth->u.dst.rt_next);
 			rt_free(rth);
 		}
 	}
@@ -785,7 +785,7 @@ static void rt_check_expire(void)
 	static unsigned int rover;
 	unsigned int i = rover, goal;
 	struct rtable *rth, *aux;
-	struct rtable __rcu **rthp;
+	struct rtable __rcu_bh **rthp;
 	unsigned long samples = 0;
 	unsigned long sum = 0, sum2 = 0;
 	unsigned long delta;
@@ -815,8 +815,8 @@ static void rt_check_expire(void)
 			continue;
 		length = 0;
 		spin_lock_bh(rt_hash_lock_addr(i));
-		while ((rth = rcu_dereference_const(*rthp)) != NULL) {
-			prefetch(rcu_dereference_const(rth->u.dst.rt_next));
+		while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
+			prefetch(rcu_dereference_bh(rth->u.dst.rt_next));
 			if (rt_is_expired(rth)) {
 				*rthp = rth->u.dst.rt_next;
 				rt_free(rth);
@@ -836,14 +836,14 @@ nofree:
 					 * attributes don't unfairly skew
 					 * the length computation
 					 */
-					for (aux = rcu_dereference_const(rt_hash_table[i].chain);;) {
+					for (aux = rcu_dereference_bh(rt_hash_table[i].chain);;) {
 						if (aux == rth) {
 							length += ONE;
 							break;
 						}
 						if (compare_hash_inputs(&aux->fl, &rth->fl))
 							break;
-						aux = rcu_dereference_const(aux->u.dst.rt_next);
+						aux = rcu_dereference_bh(aux->u.dst.rt_next);
 					}
 					continue;
 				}
@@ -959,7 +959,7 @@ static int rt_garbage_collect(struct dst_ops *ops)
 	static int rover;
 	static int equilibrium;
 	struct rtable *rth;
-	struct rtable __rcu **rthp;
+	struct rtable __rcu_bh **rthp;
 	unsigned long now = jiffies;
 	int goal;
 
@@ -1012,7 +1012,7 @@ static int rt_garbage_collect(struct dst_ops *ops)
 			k = (k + 1) & rt_hash_mask;
 			rthp = &rt_hash_table[k].chain;
 			spin_lock_bh(rt_hash_lock_addr(k));
-			while ((rth = rcu_dereference_const(*rthp)) != NULL) {
+			while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
 				if (!rt_is_expired(rth) &&
 					!rt_may_expire(rth, tmo, expire)) {
 					tmo >>= 1;
@@ -1079,10 +1079,10 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt,
 			  struct rtable **rp, struct sk_buff *skb)
 {
 	struct rtable	*rth;
-	struct rtable __rcu **rthp;
+	struct rtable __rcu_bh **rthp;
 	unsigned long	now;
 	struct rtable *cand;
-	struct rtable __rcu **candp;
+	struct rtable __rcu_bh **candp;
 	u32 		min_score;
 	int		chain_length;
 	int attempts = !in_softirq();
@@ -1129,7 +1129,7 @@ restart:
 	rthp = &rt_hash_table[hash].chain;
 
 	spin_lock_bh(rt_hash_lock_addr(hash));
-	while ((rth = rcu_dereference_const(*rthp)) != NULL) {
+	while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
 		if (rt_is_expired(rth)) {
 			*rthp = rth->u.dst.rt_next;
 			rt_free(rth);
@@ -1143,13 +1143,13 @@ restart:
 			 * must be visible to another weakly ordered CPU before
 			 * the insertion at the start of the hash chain.
 			 */
-			rcu_assign_pointer(rth->u.dst.rt_next,
+			rcu_assign_pointer_bh(rth->u.dst.rt_next,
 					   rt_hash_table[hash].chain);
 			/*
 			 * Since lookup is lockfree, the update writes
 			 * must be ordered for consistency on SMP.
 			 */
-			rcu_assign_pointer(rt_hash_table[hash].chain, rth);
+			rcu_assign_pointer_bh(rt_hash_table[hash].chain, rth);
 
 			dst_use(&rth->u.dst, now);
 			spin_unlock_bh(rt_hash_lock_addr(hash));
@@ -1252,7 +1252,7 @@ restart:
 	 * previous writes to rt are comitted to memory
 	 * before making rt visible to other CPUS.
 	 */
-	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
+	rcu_assign_pointer_bh(rt_hash_table[hash].chain, rt);
 
 	spin_unlock_bh(rt_hash_lock_addr(hash));
 
@@ -1325,13 +1325,13 @@ void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more)
 
 static void rt_del(unsigned hash, struct rtable *rt)
 {
-	struct rtable __rcu **rthp;
+	struct rtable __rcu_bh **rthp;
 	struct rtable *aux;
 
 	rthp = &rt_hash_table[hash].chain;
 	spin_lock_bh(rt_hash_lock_addr(hash));
 	ip_rt_put(rt);
-	while ((aux = rcu_dereference_const(*rthp)) != NULL) {
+	while ((aux = rcu_dereference_bh(*rthp)) != NULL) {
 		if (aux == rt || rt_is_expired(aux)) {
 			*rthp = aux->u.dst.rt_next;
 			rt_free(aux);
@@ -1348,7 +1348,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 	int i, k;
 	struct in_device *in_dev = in_dev_get(dev);
 	struct rtable *rth;
-	struct rtable __rcu **rthp;
+	struct rtable __rcu_bh **rthp;
 	__be32  skeys[2] = { saddr, 0 };
 	int  ikeys[2] = { dev->ifindex, 0 };
 	struct netevent_redirect netevent;
@@ -1384,7 +1384,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 			rthp=&rt_hash_table[hash].chain;
 
 			rcu_read_lock();
-			while ((rth = rcu_dereference(*rthp)) != NULL) {
+			while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
 				struct rtable *rt;
 
 				if (rth->fl.fl4_dst != daddr ||
@@ -1646,8 +1646,8 @@ unsigned short ip_rt_frag_needed(struct net *net, struct iphdr *iph,
 						rt_genid(net));
 
 			rcu_read_lock();
-			for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
-			     rth = rcu_dereference(rth->u.dst.rt_next)) {
+			for (rth = rcu_dereference_bh(rt_hash_table[hash].chain); rth;
+			     rth = rcu_dereference_bh(rth->u.dst.rt_next)) {
 				unsigned short mtu = new_mtu;
 
 				if (rth->fl.fl4_dst != daddr ||
@@ -2287,8 +2287,8 @@ int ip_route_input(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	hash = rt_hash(daddr, saddr, iif, rt_genid(net));
 
 	rcu_read_lock();
-	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
-	     rth = rcu_dereference(rth->u.dst.rt_next)) {
+	for (rth = rcu_dereference_bh(rt_hash_table[hash].chain); rth;
+	     rth = rcu_dereference_bh(rth->u.dst.rt_next)) {
 		if (((rth->fl.fl4_dst ^ daddr) |
 		     (rth->fl.fl4_src ^ saddr) |
 		     (rth->fl.iif ^ iif) |
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d8ce05b..003d54f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3252,9 +3252,9 @@ void __init tcp_init(void)
 	memset(&tcp_secret_two.secrets[0], 0, sizeof(tcp_secret_two.secrets));
 	tcp_secret_one.expires = jiffy; /* past due */
 	tcp_secret_two.expires = jiffy; /* past due */
-	__rcu_assign_pointer(tcp_secret_generating, &tcp_secret_one);
+	rcu_assign_pointer(tcp_secret_generating, &tcp_secret_one);
 	tcp_secret_primary = &tcp_secret_one;
-	__rcu_assign_pointer(tcp_secret_retiring, &tcp_secret_two);
+	rcu_assign_pointer(tcp_secret_retiring, &tcp_secret_two);
 	tcp_secret_secondary = &tcp_secret_two;
 }
 
diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
index ed7f424..8696677 100644
--- a/net/llc/llc_core.c
+++ b/net/llc/llc_core.c
@@ -50,7 +50,7 @@ static struct llc_sap *__llc_sap_find(unsigned char sap_value)
 {
 	struct llc_sap* sap;
 
-	list_for_each_entry(sap, &llc_sap_list, node)
+	list_for_each_entry_rcu(sap, &llc_sap_list, node)
 		if (sap->laddr.lsap == sap_value)
 			goto out;
 	sap = NULL;
@@ -103,7 +103,7 @@ struct llc_sap *llc_sap_open(unsigned char lsap,
 	if (!sap)
 		goto out;
 	sap->laddr.lsap = lsap;
-	rcu_assign_pointer(sap->rcv_func, func);
+	rcu_assign_pointer_bh(sap->rcv_func, func);
 	list_add_tail_rcu(&sap->node, &llc_sap_list);
 out:
 	spin_unlock_bh(&llc_sap_list_lock);
@@ -127,7 +127,7 @@ void llc_sap_close(struct llc_sap *sap)
 	list_del_rcu(&sap->node);
 	spin_unlock_bh(&llc_sap_list_lock);
 
-	synchronize_rcu();
+	synchronize_rcu_bh();
 
 	kfree(sap);
 }
diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
index 57ad974..b775530 100644
--- a/net/llc/llc_input.c
+++ b/net/llc/llc_input.c
@@ -179,7 +179,7 @@ int llc_rcv(struct sk_buff *skb, struct net_device *dev,
 	 * First the upper layer protocols that don't need the full
 	 * LLC functionality
 	 */
-	rcv = rcu_dereference(sap->rcv_func);
+	rcv = rcu_dereference_bh(sap->rcv_func);
 	if (rcv) {
 		struct sk_buff *cskb = skb_clone(skb, GFP_ATOMIC);
 		if (cskb)
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 80fd3ad..2ba7048 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -78,7 +78,7 @@ static int kvm_iommu_map_memslots(struct kvm *kvm)
 	int i, r = 0;
 	struct kvm_memslots *slots;
 
-	slots = rcu_dereference(kvm->memslots);
+	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 
 	for (i = 0; i < slots->nmemslots; i++) {
 		r = kvm_iommu_map_pages(kvm, &slots->memslots[i]);
@@ -217,7 +217,7 @@ static int kvm_iommu_unmap_memslots(struct kvm *kvm)
 	int i;
 	struct kvm_memslots *slots;
 
-	slots = rcu_dereference(kvm->memslots);
+	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 
 	for (i = 0; i < slots->nmemslots; i++) {
 		kvm_iommu_put_pages(kvm, slots->memslots[i].base_gfn,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 548f925..ae28c71 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -372,6 +372,8 @@ static struct kvm *kvm_create_vm(void)
 {
 	int r = 0, i;
 	struct kvm *kvm = kvm_arch_create_vm();
+	struct kvm_io_bus *buses[KVM_NR_BUSES];
+	struct kvm_memslots *memslots;
 
 	if (IS_ERR(kvm))
 		goto out;
@@ -386,14 +388,15 @@ static struct kvm *kvm_create_vm(void)
 #endif
 
 	r = -ENOMEM;
-	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+	memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
+	srcu_assign_pointer(kvm->memslots, memslots);
 	if (!kvm->memslots)
 		goto out_err;
 	if (init_srcu_struct(&kvm->srcu))
 		goto out_err;
 	for (i = 0; i < KVM_NR_BUSES; i++) {
-		kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus),
-					GFP_KERNEL);
+		buses[i] = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL);
+		srcu_assign_pointer(kvm->buses[i], buses[i]);
 		if (!kvm->buses[i]) {
 			cleanup_srcu_struct(&kvm->srcu);
 			goto out_err;
@@ -428,8 +431,8 @@ out_err:
 	hardware_disable_all();
 out_err_nodisable:
 	for (i = 0; i < KVM_NR_BUSES; i++)
-		kfree(kvm->buses[i]);
-	kfree(kvm->memslots);
+		kfree(buses[i]);
+	kfree(memslots);
 	kfree(kvm);
 	return ERR_PTR(r);
 }
@@ -464,12 +467,12 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
 void kvm_free_physmem(struct kvm *kvm)
 {
 	int i;
-	struct kvm_memslots *slots = kvm->memslots;
+	struct kvm_memslots *slots = srcu_dereference_const(kvm->memslots);
 
 	for (i = 0; i < slots->nmemslots; ++i)
 		kvm_free_physmem_slot(&slots->memslots[i], NULL);
 
-	kfree(kvm->memslots);
+	kfree(slots);
 }
 
 static void kvm_destroy_vm(struct kvm *kvm)
@@ -483,7 +486,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	spin_unlock(&kvm_lock);
 	kvm_free_irq_routing(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++)
-		kvm_io_bus_destroy(kvm->buses[i]);
+		kvm_io_bus_destroy(srcu_dereference_const(kvm->buses[i]));
 	kvm_coalesced_mmio_free(kvm);
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
 	mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm);
@@ -552,7 +555,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[mem->slot];
+	old_memslots = srcu_dereference(kvm->memslots, &kvm->srcu);
+	memslot = &old_memslots->memslots[mem->slot];
 	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
 	npages = mem->memory_size >> PAGE_SHIFT;
 
@@ -573,7 +577,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
 	/* Check for overlaps */
 	r = -EEXIST;
 	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
-		struct kvm_memory_slot *s = &kvm->memslots->memslots[i];
+		struct kvm_memory_slot *s = &old_memslots->memslots[i];
 
 		if (s == memslot || !s->npages)
 			continue;
@@ -669,13 +673,13 @@ skip_lpage:
 		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 		if (!slots)
 			goto out_free;
-		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
+		old_memslots = srcu_dereference_const(kvm->memslots);
+		memcpy(slots, old_memslots, sizeof(struct kvm_memslots));
 		if (mem->slot >= slots->nmemslots)
 			slots->nmemslots = mem->slot + 1;
 		slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
 
-		old_memslots = kvm->memslots;
-		rcu_assign_pointer(kvm->memslots, slots);
+		srcu_assign_pointer(kvm->memslots, slots);
 		synchronize_srcu_expedited(&kvm->srcu);
 		/* From this point no new shadow pages pointing to a deleted
 		 * memslot will be created.
@@ -705,7 +709,8 @@ skip_lpage:
 	slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
 	if (!slots)
 		goto out_free;
-	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
+	old_memslots = srcu_dereference_const(kvm->memslots);
+	memcpy(slots, old_memslots, sizeof(struct kvm_memslots));
 	if (mem->slot >= slots->nmemslots)
 		slots->nmemslots = mem->slot + 1;
 
@@ -718,8 +723,7 @@ skip_lpage:
 	}
 
 	slots->memslots[mem->slot] = new;
-	old_memslots = kvm->memslots;
-	rcu_assign_pointer(kvm->memslots, slots);
+	srcu_assign_pointer(kvm->memslots, slots);
 	synchronize_srcu_expedited(&kvm->srcu);
 
 	kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
@@ -775,7 +779,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
 	if (log->slot >= KVM_MEMORY_SLOTS)
 		goto out;
 
-	memslot = &kvm->memslots->memslots[log->slot];
+	memslot = &srcu_dereference(kvm->memslots, &kvm->srcu)->memslots[log->slot];
 	r = -ENOENT;
 	if (!memslot->dirty_bitmap)
 		goto out;
@@ -829,7 +833,7 @@ EXPORT_SYMBOL_GPL(kvm_is_error_hva);
 struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn)
 {
 	int i;
-	struct kvm_memslots *slots = rcu_dereference(kvm->memslots);
+	struct kvm_memslots *slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 
 	for (i = 0; i < slots->nmemslots; ++i) {
 		struct kvm_memory_slot *memslot = &slots->memslots[i];
@@ -851,7 +855,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
 int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
 {
 	int i;
-	struct kvm_memslots *slots = rcu_dereference(kvm->memslots);
+	struct kvm_memslots *slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 
 	gfn = unalias_gfn_instantiation(kvm, gfn);
 	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
@@ -895,7 +899,7 @@ out:
 int memslot_id(struct kvm *kvm, gfn_t gfn)
 {
 	int i;
-	struct kvm_memslots *slots = rcu_dereference(kvm->memslots);
+	struct kvm_memslots *slots = srcu_dereference(kvm->memslots, &kvm->srcu);
 	struct kvm_memory_slot *memslot = NULL;
 
 	gfn = unalias_gfn(kvm, gfn);
@@ -1984,7 +1988,7 @@ int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 		     int len, const void *val)
 {
 	int i;
-	struct kvm_io_bus *bus = rcu_dereference(kvm->buses[bus_idx]);
+	struct kvm_io_bus *bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
 	for (i = 0; i < bus->dev_count; i++)
 		if (!kvm_iodevice_write(bus->devs[i], addr, len, val))
 			return 0;
@@ -1996,7 +2000,7 @@ int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
 		    int len, void *val)
 {
 	int i;
-	struct kvm_io_bus *bus = rcu_dereference(kvm->buses[bus_idx]);
+	struct kvm_io_bus *bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
 
 	for (i = 0; i < bus->dev_count; i++)
 		if (!kvm_iodevice_read(bus->devs[i], addr, len, val))
@@ -2010,7 +2014,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 {
 	struct kvm_io_bus *new_bus, *bus;
 
-	bus = kvm->buses[bus_idx];
+	bus = srcu_dereference_const(kvm->buses[bus_idx]);
 	if (bus->dev_count > NR_IOBUS_DEVS-1)
 		return -ENOSPC;
 
@@ -2019,7 +2023,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 		return -ENOMEM;
 	memcpy(new_bus, bus, sizeof(struct kvm_io_bus));
 	new_bus->devs[new_bus->dev_count++] = dev;
-	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
+	srcu_assign_pointer(kvm->buses[bus_idx], new_bus);
 	synchronize_srcu_expedited(&kvm->srcu);
 	kfree(bus);
 
@@ -2037,7 +2041,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 	if (!new_bus)
 		return -ENOMEM;
 
-	bus = kvm->buses[bus_idx];
+	bus = srcu_dereference_const(kvm->buses[bus_idx]);
 	memcpy(new_bus, bus, sizeof(struct kvm_io_bus));
 
 	r = -ENOENT;
@@ -2053,7 +2057,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
 		return r;
 	}
 
-	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
+	srcu_assign_pointer(kvm->buses[bus_idx], new_bus);
 	synchronize_srcu_expedited(&kvm->srcu);
 	kfree(bus);
 	return r;


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH 6/13] bridge: Add core IGMP snooping support
  2010-03-11 18:49                       ` Arnd Bergmann
@ 2010-03-14 23:01                         ` Paul E. McKenney
  0 siblings, 0 replies; 81+ messages in thread
From: Paul E. McKenney @ 2010-03-14 23:01 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Herbert Xu, David S. Miller, netdev, Stephen Hemminger

On Thu, Mar 11, 2010 at 07:49:52PM +0100, Arnd Bergmann wrote:
> Following up on the earlier discussion,
> 
> On Monday 08 March 2010, Arnd Bergmann wrote:
> > > Arnd, would it be reasonable to extend your RCU-sparse changes to have
> > > four different pointer namespaces, one for each flavor of RCU?  (RCU,
> > > RCU-bh, RCU-sched, and SRCU)?  Always a fan of making the computer do
> > > the auditing where reasonable.  ;-)
> > 
> > Yes, I guess that would be possible. I'd still leave out the rculist
> > from any annotations for now, as this would get even more complex then.
> > 
> > One consequence will be the need for new rcu_assign_pointer{,_bh,_sched}
> > macros that check the address space of the first argument, otherwise
> > you'd be able to stick anything in there, including non-__rcu pointers.
> 
> I've tested this out now, see the patch below. I needed to add a number
> of interfaces, but it still seems ok. Doing it for all the rculist
> functions most likely would be less so.
> 
> This is currently the head of my rcu-annotate branch of playground.git.
> Paul, before I split it up and merge this with the per-subsystem patches,
> can you tell me if this is what you had in mind?

This looks extremely nice!!!

I did note a few questions and a couple of minor change below, but the
API and definitions look quite good.

Search for empty lines to find them.  Summary:

o	srcu_assign_pointer() should be defined in include/linux/srcu.h.

o	SRCU_INIT_POINTER() should be defined in include/linux/srcu.h.

o	rcu_dereference_check_sched_domain() can now rely on
	rcu_dereference_sched_check() to do the srcu_read_lock_held()
	check, so no longer needed at this level.

o	kvm_create_vm() should be able to use a single "buses" local
	variable rather than an array of them.

Again, good stuff!!!  Thank you for taking this on!

> > > This could potentially catch the mismatched call_rcu()s, at least if the
> > > rcu_head could be labeled.
> > ...
> > #define rcu_exchange_call(ptr, new, member, func) \
> > ({ \
> >         typeof(new) old = rcu_exchange((ptr),(new)); \
> >         if (old) \
> >                 call_rcu(&(old)->member, (func));       \
> >         old; \
> > })
> 
> Unfortunately, this did not work out at all. Almost every user follows
> a slightly different pattern for call_rcu, so I did not find a way
> to match the call_rcu calls with the pointers. In particular, the functions
> calling call_rcu() sometimes no longer have access to the 'old' data,
> e.g. in case of synchronize_rcu.
> 
> My current take is that static annotations won't help us here.

Thank you for checking it out -- not every idea works out well in
practice, I guess.  ;-)

							Thanx, Paul

> 	Arnd
> 
> ---
> 
> rcu: split up __rcu annotations
>     
> This adds separate name spaces for the four distinct types of RCU
> that we use in the kernel, namely __rcu, __rcu_bh, __rcu_sched and
> __srcu.
>     
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> 
> ---
>  arch/x86/kvm/mmu.c         |    6 ++--
>  arch/x86/kvm/vmx.c         |    2 +-
>  drivers/net/macvlan.c      |    8 ++--
>  include/linux/compiler.h   |    6 ++++
>  include/linux/kvm_host.h   |    4 +-
>  include/linux/netdevice.h  |    2 +-
>  include/linux/rcupdate.h   |   68 +++++++++++++++++++++++++++++++++----------
>  include/linux/srcu.h       |    5 ++-
>  include/net/dst.h          |    4 +-
>  include/net/llc.h          |    3 +-
>  include/net/sock.h         |    2 +-
>  include/trace/events/kvm.h |    4 +-
>  kernel/cgroup.c            |   10 +++---
>  kernel/perf_event.c        |    8 ++--
>  kernel/sched.c             |    6 ++--
>  kernel/sched_fair.c        |    2 +-
>  lib/radix-tree.c           |    8 ++--
>  net/core/filter.c          |    4 +-
>  net/core/sock.c            |    6 ++--
>  net/decnet/dn_route.c      |    2 +-
>  net/ipv4/route.c           |   60 +++++++++++++++++++-------------------
>  net/ipv4/tcp.c             |    4 +-
>  net/llc/llc_core.c         |    6 ++--
>  net/llc/llc_input.c        |    2 +-
>  virt/kvm/iommu.c           |    4 +-
>  virt/kvm/kvm_main.c        |   56 +++++++++++++++++++-----------------
>  26 files changed, 171 insertions(+), 121 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 741373e..45877ca 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -793,7 +793,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  	int retval = 0;
>  	struct kvm_memslots *slots;
> 
> -	slots = rcu_dereference(kvm->memslots);
> +	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
> 
>  	for (i = 0; i < slots->nmemslots; i++) {
>  		struct kvm_memory_slot *memslot = &slots->memslots[i];
> @@ -3007,7 +3007,7 @@ unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm)
>  	unsigned int  nr_pages = 0;
>  	struct kvm_memslots *slots;
> 
> -	slots = rcu_dereference(kvm->memslots);
> +	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
>  	for (i = 0; i < slots->nmemslots; i++)
>  		nr_pages += slots->memslots[i].npages;
> 
> @@ -3282,7 +3282,7 @@ static int count_rmaps(struct kvm_vcpu *vcpu)
>  	int i, j, k, idx;
> 
>  	idx = srcu_read_lock(&kvm->srcu);
> -	slots = rcu_dereference(kvm->memslots);
> +	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
>  	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
>  		struct kvm_memory_slot *m = &slots->memslots[i];
>  		struct kvm_rmap_desc *d;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0aec1f3..d0c82ed 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1513,7 +1513,7 @@ static gva_t rmode_tss_base(struct kvm *kvm)
>  		struct kvm_memslots *slots;
>  		gfn_t base_gfn;
> 
> -		slots = rcu_dereference(kvm->memslots);
> +		slots = srcu_dereference(kvm->memslots, &kvm->srcu);
>  		base_gfn = slots->memslots[0].base_gfn +
>  				 slots->memslots[0].npages - 3;
>  		return base_gfn << PAGE_SHIFT;
> diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
> index 95e1bcc..b958d5a 100644
> --- a/drivers/net/macvlan.c
> +++ b/drivers/net/macvlan.c
> @@ -531,15 +531,15 @@ static int macvlan_port_create(struct net_device *dev)
>  	INIT_LIST_HEAD(&port->vlans);
>  	for (i = 0; i < MACVLAN_HASH_SIZE; i++)
>  		INIT_HLIST_HEAD(&port->vlan_hash[i]);
> -	rcu_assign_pointer(dev->macvlan_port, port);
> +	rcu_assign_pointer_bh(dev->macvlan_port, port);
>  	return 0;
>  }
> 
>  static void macvlan_port_destroy(struct net_device *dev)
>  {
> -	struct macvlan_port *port = rcu_dereference_const(dev->macvlan_port);
> +	struct macvlan_port *port = rcu_dereference_bh_const(dev->macvlan_port);
> 
> -	rcu_assign_pointer(dev->macvlan_port, NULL);
> +	rcu_assign_pointer_bh(dev->macvlan_port, NULL);
>  	synchronize_rcu();
>  	kfree(port);
>  }
> @@ -624,7 +624,7 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev,
>  		if (err < 0)
>  			return err;
>  	}
> -	port = rcu_dereference(lowerdev->macvlan_port);
> +	port = rcu_dereference_bh(lowerdev->macvlan_port);
> 
>  	vlan->lowerdev = lowerdev;
>  	vlan->dev      = dev;
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 0ab21c2..d5756d4 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -17,6 +17,9 @@
>  # define __cond_lock(x,c)	((c) ? ({ __acquire(x); 1; }) : 0)
>  # define __percpu	__attribute__((noderef, address_space(3)))
>  # define __rcu		__attribute__((noderef, address_space(4)))
> +# define __rcu_bh	__attribute__((noderef, address_space(5)))
> +# define __rcu_sched	__attribute__((noderef, address_space(6)))
> +# define __srcu		__attribute__((noderef, address_space(7)))
>  extern void __chk_user_ptr(const volatile void __user *);
>  extern void __chk_io_ptr(const volatile void __iomem *);
>  #else
> @@ -36,6 +39,9 @@ extern void __chk_io_ptr(const volatile void __iomem *);
>  # define __cond_lock(x,c) (c)
>  # define __percpu
>  # define __rcu
> +# define __rcu_bh
> +# define __rcu_sched
> +# define __srcu
>  #endif
> 
>  #ifdef __KERNEL__
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 9eb0f9c..bad1787 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -164,7 +164,7 @@ struct kvm {
>  	raw_spinlock_t requests_lock;
>  	struct mutex slots_lock;
>  	struct mm_struct *mm; /* userspace tied to this vm */
> -	struct kvm_memslots __rcu *memslots;
> +	struct kvm_memslots __srcu *memslots;
>  	struct srcu_struct srcu;
>  #ifdef CONFIG_KVM_APIC_ARCHITECTURE
>  	u32 bsp_vcpu_id;
> @@ -174,7 +174,7 @@ struct kvm {
>  	atomic_t online_vcpus;
>  	struct list_head vm_list;
>  	struct mutex lock;
> -	struct kvm_io_bus __rcu *buses[KVM_NR_BUSES];
> +	struct kvm_io_bus __srcu *buses[KVM_NR_BUSES];
>  #ifdef CONFIG_HAVE_KVM_EVENTFD
>  	struct {
>  		spinlock_t        lock;
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index fd7e8de..1b72188 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -949,7 +949,7 @@ struct net_device {
>  	/* bridge stuff */
>  	void __rcu		*br_port;
>  	/* macvlan */
> -	struct macvlan_port __rcu *macvlan_port;
> +	struct macvlan_port __rcu_bh *macvlan_port;
>  	/* GARP */
>  	struct garp_port __rcu	*garp_port;
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 03702cc..b4c6f39 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -183,19 +183,33 @@ static inline int rcu_read_lock_sched_held(void)
>   * read-side critical section.  It is also possible to check for
>   * locks being held, for example, by using lockdep_is_held().
>   */
> -#define rcu_dereference_check(p, c) \
> +#define __rcu_dereference_check(p, c, space) \
>  	({ \
>  		if (debug_locks && !(c)) \
>  			lockdep_rcu_dereference(__FILE__, __LINE__); \
> -		rcu_dereference_raw(p); \
> +		__rcu_dereference_raw(p, space); \
>  	})
> 
> +
>  #else /* #ifdef CONFIG_PROVE_RCU */
> 
> -#define rcu_dereference_check(p, c)	rcu_dereference_raw(p)
> +#define __rcu_dereference_check(p, c, space)	\
> +	__rcu_dereference_raw(p, space)
> 
>  #endif /* #else #ifdef CONFIG_PROVE_RCU */
> 
> +#define rcu_dereference_check(p, c) \
> +	__rcu_dereference_check(p, c, __rcu)
> +
> +#define rcu_dereference_bh_check(p, c) \
> +	__rcu_dereference_check(p, rcu_read_lock_bh_held() || (c), __rcu_bh)
> +
> +#define rcu_dereference_sched_check(p, c) \
> +	__rcu_dereference_check(p, rcu_read_lock_sched_held() || (c), __rcu_sched)
> +
> +#define srcu_dereference_check(p, c) \
> +	__rcu_dereference_check(p, srcu_read_lock_held() || (c), __srcu)
> +
>  /**
>   * rcu_read_lock - mark the beginning of an RCU read-side critical section.
>   *
> @@ -341,13 +355,15 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>   * exactly which pointers are protected by RCU and checks that
>   * the pointer is annotated as __rcu.
>   */
> -#define rcu_dereference_raw(p)  ({ \
> +#define __rcu_dereference_raw(p, space)  ({ \
>  				typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
> -				(void) (((typeof (*p) __rcu *)p) == p); \
> +				(void) (((typeof (*p) space *)p) == p); \
>  				smp_read_barrier_depends(); \
>  				((typeof(*p) __force __kernel *)(_________p1)); \
>  				})
> 
> +#define rcu_dereference_raw(p) __rcu_dereference_raw(p, __rcu)
> +
>  /**
>   * rcu_dereference_const - fetch an __rcu pointer outside of a
>   * read-side critical section.
> @@ -360,18 +376,22 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>   * or in an RCU call.
>   */
> 
> -#define rcu_dereference_const(p)     ({ \
> -				(void) (((typeof (*p) __rcu *)p) == p); \
> +#define __rcu_dereference_const(p, space)     ({ \
> +				(void) (((typeof (*p) space *)p) == p); \
>  				((typeof(*p) __force __kernel *)(p)); \
>  				})
> 
> +#define rcu_dereference_const(p)  __rcu_dereference_const(p, __rcu)
> +#define rcu_dereference_bh_const(p)  __rcu_dereference_const(p, __rcu_bh)
> +#define rcu_dereference_sched_const(p)  __rcu_dereference_const(p, __rcu_sched)
> +
>  /**
>   * rcu_dereference - fetch an RCU-protected pointer, checking for RCU
>   *
>   * Makes rcu_dereference_check() do the dirty work.
>   */
>  #define rcu_dereference(p) \
> -	rcu_dereference_check(p, rcu_read_lock_held())
> +	__rcu_dereference_check(p, rcu_read_lock_held(), __rcu)
> 
>  /**
>   * rcu_dereference_bh - fetch an RCU-protected pointer, checking for RCU-bh
> @@ -379,7 +399,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>   * Makes rcu_dereference_check() do the dirty work.
>   */
>  #define rcu_dereference_bh(p) \
> -		rcu_dereference_check(p, rcu_read_lock_bh_held())
> +	__rcu_dereference_check(p, rcu_read_lock_bh_held(), __rcu_bh)
> 
>  /**
>   * rcu_dereference_sched - fetch RCU-protected pointer, checking for RCU-sched
> @@ -387,7 +407,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>   * Makes rcu_dereference_check() do the dirty work.
>   */
>  #define rcu_dereference_sched(p) \
> -		rcu_dereference_check(p, rcu_read_lock_sched_held())
> +	__rcu_dereference_check(p, rcu_read_lock_sched_held(), __rcu_sched)
> 
>  /**
>   * rcu_assign_pointer - assign (publicize) a pointer to a newly
> @@ -402,12 +422,12 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>   * code.
>   */
> 
> -#define rcu_assign_pointer(p, v) \
> +#define __rcu_assign_pointer(p, v, space) \
>  	({ \
>  		if (!__builtin_constant_p(v) || \
>  		    ((v) != NULL)) \
>  			smp_wmb(); \
> -		(p) = (typeof(*v) __force __rcu *)(v); \
> +		(p) = (typeof(*v) __force space *)(v); \
>  	})
> 
>  /**
> @@ -415,10 +435,17 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>   * without barriers.
>   * Using this is almost always a bug.
>   */
> -#define __rcu_assign_pointer(p, v) \
> -	({ \
> -		(p) = (typeof(*v) __force __rcu *)(v); \
> -	})
> +#define rcu_assign_pointer(p, v) \
> +	__rcu_assign_pointer(p, v, __rcu)
> +
> +#define rcu_assign_pointer_bh(p, v) \
> +	__rcu_assign_pointer(p, v, __rcu_bh)
> +
> +#define rcu_assign_pointer_sched(p, v) \
> +	__rcu_assign_pointer(p, v, __rcu_sched)
> +
> +#define srcu_assign_pointer(p, v) \
> +	__rcu_assign_pointer(p, v, __srcu)

For consistency, the definition of srcu_assign_pointer() should go into
include/linux/srcu.h.

>  /**
>   * RCU_INIT_POINTER - initialize an RCU protected member
> @@ -427,6 +454,15 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
>  #define RCU_INIT_POINTER(p, v) \
>  		p = (typeof(*v) __force __rcu *)(v)
> 
> +#define RCU_INIT_POINTER_BH(p, v) \
> +		p = (typeof(*v) __force __rcu_bh *)(v)
> +
> +#define RCU_INIT_POINTER_SCHED(p, v) \
> +		p = (typeof(*v) __force __rcu_sched *)(v)
> +
> +#define SRCU_INIT_POINTER(p, v) \
> +		p = (typeof(*v) __force __srcu *)(v)
> +

For consistency, the definition of SRCU_INIT_POINTER() should go into
include/linux/srcu.h.

>  /* Infrastructure to implement the synchronize_() primitives. */
> 
>  struct rcu_synchronize {
> diff --git a/include/linux/srcu.h b/include/linux/srcu.h
> index 4d5ecb2..feaf661 100644
> --- a/include/linux/srcu.h
> +++ b/include/linux/srcu.h
> @@ -111,7 +111,10 @@ static inline int srcu_read_lock_held(struct srcu_struct *sp)
>   * Makes rcu_dereference_check() do the dirty work.
>   */
>  #define srcu_dereference(p, sp) \
> -		rcu_dereference_check(p, srcu_read_lock_held(sp))
> +		__rcu_dereference_check(p, srcu_read_lock_held(sp), __srcu)
> +
> +#define srcu_dereference_const(p) \
> +		__rcu_dereference_const(p, __srcu)
> 
>  /**
>   * srcu_read_lock - register a new reader for an SRCU-protected structure.
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 5f839aa..bbeaba2 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -94,9 +94,9 @@ struct dst_entry {
>  	unsigned long		lastuse;
>  	union {
>  		struct dst_entry *next;
> -		struct rtable   __rcu *rt_next;
> +		struct rtable   __rcu_bh *rt_next;
>  		struct rt6_info   *rt6_next;
> -		struct dn_route  *dn_next;
> +		struct dn_route  __rcu_bh *dn_next;
>  	};
>  };
> 
> diff --git a/include/net/llc.h b/include/net/llc.h
> index 8299cb2..5700082 100644
> --- a/include/net/llc.h
> +++ b/include/net/llc.h
> @@ -59,7 +59,8 @@ struct llc_sap {
>  	int		 (* rcv_func)(struct sk_buff *skb,
>  				     struct net_device *dev,
>  				     struct packet_type *pt,
> -				     struct net_device *orig_dev) __rcu;
> +				     struct net_device *orig_dev)
> +							 __rcu_bh;
>  	struct llc_addr	 laddr;
>  	struct list_head node;
>  	spinlock_t sk_lock;
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e07cd78..66d5e09 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -290,7 +290,7 @@ struct sock {
>  	struct ucred		sk_peercred;
>  	long			sk_rcvtimeo;
>  	long			sk_sndtimeo;
> -	struct sk_filter __rcu	*sk_filter;
> +	struct sk_filter __rcu_bh *sk_filter;
>  	void			*sk_protinfo;
>  	struct timer_list	sk_timer;
>  	ktime_t			sk_stamp;
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index fc45694..db3e502 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -1392,7 +1392,7 @@ static int cgroup_get_sb(struct file_system_type *fs_type,
>  		root_count++;
> 
>  		sb->s_root->d_fsdata = root_cgrp;
> -		__rcu_assign_pointer(root->top_cgroup.dentry, sb->s_root);
> +		rcu_assign_pointer(root->top_cgroup.dentry, sb->s_root);
> 
>  		/* Link the top cgroup in this hierarchy into all
>  		 * the css_set objects */
> @@ -3243,7 +3243,7 @@ int __init cgroup_init_early(void)
>  	css_set_count = 1;
>  	init_cgroup_root(&rootnode);
>  	root_count = 1;
> -	__rcu_assign_pointer(init_task.cgroups, &init_css_set);
> +	rcu_assign_pointer(init_task.cgroups, &init_css_set);
> 
>  	init_css_set_link.cg = &init_css_set;
>  	init_css_set_link.cgrp = dummytop;
> @@ -3551,7 +3551,7 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
>  	/* Reassign the task to the init_css_set. */
>  	task_lock(tsk);
>  	cg = rcu_dereference_const(tsk->cgroups);
> -	__rcu_assign_pointer(tsk->cgroups, &init_css_set);
> +	rcu_assign_pointer(tsk->cgroups, &init_css_set);
>  	task_unlock(tsk);
>  	if (cg)
>  		put_css_set_taskexit(cg);
> @@ -3959,8 +3959,8 @@ static int __init cgroup_subsys_init_idr(struct cgroup_subsys *ss)
>  		return PTR_ERR(newid);
> 
>  	newid->stack[0] = newid->id;
> -	__rcu_assign_pointer(newid->css, rootcss);
> -	__rcu_assign_pointer(rootcss->id, newid);
> +	rcu_assign_pointer(newid->css, rootcss);
> +	rcu_assign_pointer(rootcss->id, newid);
>  	return 0;
>  }
> 
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index ac8bcbd..e1b65b2 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -1223,8 +1223,8 @@ void perf_event_task_sched_out(struct task_struct *task,
>  			 * XXX do we need a memory barrier of sorts
>  			 * wrt to rcu_dereference() of perf_event_ctxp
>  			 */
> -			__rcu_assign_pointer(task->perf_event_ctxp, next_ctx);
> -			__rcu_assign_pointer(next->perf_event_ctxp, ctx);
> +			rcu_assign_pointer(task->perf_event_ctxp, next_ctx);
> +			rcu_assign_pointer(next->perf_event_ctxp, ctx);
>  			ctx->task = next;
>  			next_ctx->task = task;
>  			do_switch = 0;
> @@ -5376,10 +5376,10 @@ int perf_event_init_task(struct task_struct *child)
>  		 */
>  		cloned_ctx = rcu_dereference(parent_ctx->parent_ctx);
>  		if (cloned_ctx) {
> -			__rcu_assign_pointer(child_ctx->parent_ctx, cloned_ctx);
> +			rcu_assign_pointer(child_ctx->parent_ctx, cloned_ctx);
>  			child_ctx->parent_gen = parent_ctx->parent_gen;
>  		} else {
> -			__rcu_assign_pointer(child_ctx->parent_ctx, parent_ctx);
> +			rcu_assign_pointer(child_ctx->parent_ctx, parent_ctx);
>  			child_ctx->parent_gen = parent_ctx->generation;
>  		}
>  		get_ctx(rcu_dereference_const(child_ctx->parent_ctx));
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 05fd61e..83744d6 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -528,7 +528,7 @@ struct rq {
> 
>  #ifdef CONFIG_SMP
>  	struct root_domain *rd;
> -	struct sched_domain __rcu *sd;
> +	struct sched_domain __rcu_sched *sd;
> 
>  	unsigned char idle_at_tick;
>  	/* For active balancing */
> @@ -603,7 +603,7 @@ static inline int cpu_of(struct rq *rq)
>  }
> 
>  #define rcu_dereference_check_sched_domain(p) \
> -	rcu_dereference_check((p), \
> +	rcu_dereference_sched_check((p), \
>  			      rcu_read_lock_sched_held() || \
>  			      lockdep_is_held(&sched_domains_mutex))

Given your definition, the "rcu_read_lock_sched_held() || \" should now
be able to be deleted, correct?

> 
> @@ -6323,7 +6323,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>  	sched_domain_debug(sd, cpu);
> 
>  	rq_attach_root(rq, rd);
> -	rcu_assign_pointer(rq->sd, sd);
> +	rcu_assign_pointer_sched(rq->sd, sd);
>  }
> 
>  /* cpus with isolated domains */
> diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
> index 3e1fd96..5a5ea2c 100644
> --- a/kernel/sched_fair.c
> +++ b/kernel/sched_fair.c
> @@ -3476,7 +3476,7 @@ static void run_rebalance_domains(struct softirq_action *h)
> 
>  static inline int on_null_domain(int cpu)
>  {
> -	return !rcu_dereference(cpu_rq(cpu)->sd);
> +	return !rcu_dereference_sched(cpu_rq(cpu)->sd);
>  }
> 
>  /*
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index f6ae74c..4c6f149 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -264,7 +264,7 @@ static int radix_tree_extend(struct radix_tree_root *root, unsigned long index)
>  			return -ENOMEM;
> 
>  		/* Increase the height.  */
> -		__rcu_assign_pointer(node->slots[0],
> +		rcu_assign_pointer(node->slots[0],
>  			radix_tree_indirect_to_ptr(rcu_dereference_const(root->rnode)));
> 
>  		/* Propagate the aggregated tag info into the new root */
> @@ -1090,7 +1090,7 @@ static inline void radix_tree_shrink(struct radix_tree_root *root)
>  		newptr = rcu_dereference_const(to_free->slots[0]);
>  		if (root->height > 1)
>  			newptr = radix_tree_ptr_to_indirect(newptr);
> -		__rcu_assign_pointer(root->rnode, newptr);
> +		rcu_assign_pointer(root->rnode, newptr);
>  		root->height--;
>  		radix_tree_node_free(to_free);
>  	}
> @@ -1125,7 +1125,7 @@ void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
>  	slot = rcu_dereference_const(root->rnode);
>  	if (height == 0) {
>  		root_tag_clear_all(root);
> -		__rcu_assign_pointer(root->rnode, NULL);
> +		rcu_assign_pointer(root->rnode, NULL);
>  		goto out;
>  	}
>  	slot = radix_tree_indirect_to_ptr(slot);
> @@ -1183,7 +1183,7 @@ void *radix_tree_delete(struct radix_tree_root *root, unsigned long index)
>  	}
>  	root_tag_clear_all(root);
>  	root->height = 0;
> -	__rcu_assign_pointer(root->rnode, NULL);
> +	rcu_assign_pointer(root->rnode, NULL);
>  	if (to_free)
>  		radix_tree_node_free(to_free);
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index d38ef7f..b88675b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -522,7 +522,7 @@ int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk)
> 
>  	rcu_read_lock_bh();
>  	old_fp = rcu_dereference_bh(sk->sk_filter);
> -	rcu_assign_pointer(sk->sk_filter, fp);
> +	rcu_assign_pointer_bh(sk->sk_filter, fp);
>  	rcu_read_unlock_bh();
> 
>  	if (old_fp)
> @@ -539,7 +539,7 @@ int sk_detach_filter(struct sock *sk)
>  	rcu_read_lock_bh();
>  	filter = rcu_dereference_bh(sk->sk_filter);
>  	if (filter) {
> -		rcu_assign_pointer(sk->sk_filter, NULL);
> +		rcu_assign_pointer_bh(sk->sk_filter, NULL);
>  		sk_filter_delayed_uncharge(sk, filter);
>  		ret = 0;
>  	}
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 74242e2..8549387 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1073,11 +1073,11 @@ static void __sk_free(struct sock *sk)
>  	if (sk->sk_destruct)
>  		sk->sk_destruct(sk);
> 
> -	filter = rcu_dereference_check(sk->sk_filter,
> +	filter = rcu_dereference_bh_check(sk->sk_filter,
>  				       atomic_read(&sk->sk_wmem_alloc) == 0);
>  	if (filter) {
>  		sk_filter_uncharge(sk, filter);
> -		rcu_assign_pointer(sk->sk_filter, NULL);
> +		rcu_assign_pointer_bh(sk->sk_filter, NULL);
>  	}
> 
>  	sock_disable_timestamp(sk, SOCK_TIMESTAMP);
> @@ -1167,7 +1167,7 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
>  		sock_reset_flag(newsk, SOCK_DONE);
>  		skb_queue_head_init(&newsk->sk_error_queue);
> 
> -		filter = rcu_dereference_const(newsk->sk_filter);
> +		filter = rcu_dereference_bh_const(newsk->sk_filter);
>  		if (filter != NULL)
>  			sk_filter_charge(newsk, filter);
> 
> diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
> index a7bf03c..22ec1d1 100644
> --- a/net/decnet/dn_route.c
> +++ b/net/decnet/dn_route.c
> @@ -92,7 +92,7 @@
> 
>  struct dn_rt_hash_bucket
>  {
> -	struct dn_route *chain;
> +	struct dn_route __rcu_bh *chain;
>  	spinlock_t lock;
>  };
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 37bf0d9..99cef80 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -200,7 +200,7 @@ const __u8 ip_tos2prio[16] = {
>   */
> 
>  struct rt_hash_bucket {
> -	struct rtable __rcu *chain;
> +	struct rtable __rcu_bh *chain;
>  };
> 
>  #if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
> @@ -731,26 +731,26 @@ static void rt_do_flush(int process_context)
>  		spin_lock_bh(rt_hash_lock_addr(i));
>  #ifdef CONFIG_NET_NS
>  		{
> -		struct rtable __rcu ** prev;
> +		struct rtable __rcu_bh ** prev;
>  		struct rtable * p;
> 
> -		rth = rcu_dereference_const(rt_hash_table[i].chain);
> +		rth = rcu_dereference_bh(rt_hash_table[i].chain);
> 
>  		/* defer releasing the head of the list after spin_unlock */
> -		for (tail = rth; tail; tail = rcu_dereference_const(tail->u.dst.rt_next))
> +		for (tail = rth; tail; tail = rcu_dereference_bh(tail->u.dst.rt_next))
>  			if (!rt_is_expired(tail))
>  				break;
>  		if (rth != tail)
> -			__rcu_assign_pointer(rt_hash_table[i].chain, tail);
> +			rcu_assign_pointer_bh(rt_hash_table[i].chain, tail);
> 
>  		/* call rt_free on entries after the tail requiring flush */
>  		prev = &rt_hash_table[i].chain;
> -		for (p = rcu_dereference_const(*prev); p; p = next) {
> -			next = rcu_dereference_const(p->u.dst.rt_next);
> +		for (p = rcu_dereference_bh(*prev); p; p = next) {
> +			next = rcu_dereference_bh(p->u.dst.rt_next);
>  			if (!rt_is_expired(p)) {
>  				prev = &p->u.dst.rt_next;
>  			} else {
> -				__rcu_assign_pointer(*prev, next);
> +				rcu_assign_pointer_bh(*prev, next);
>  				rt_free(p);
>  			}
>  		}
> @@ -763,7 +763,7 @@ static void rt_do_flush(int process_context)
>  		spin_unlock_bh(rt_hash_lock_addr(i));
> 
>  		for (; rth != tail; rth = next) {
> -			next = rcu_dereference_const(rth->u.dst.rt_next);
> +			next = rcu_dereference_bh(rth->u.dst.rt_next);
>  			rt_free(rth);
>  		}
>  	}
> @@ -785,7 +785,7 @@ static void rt_check_expire(void)
>  	static unsigned int rover;
>  	unsigned int i = rover, goal;
>  	struct rtable *rth, *aux;
> -	struct rtable __rcu **rthp;
> +	struct rtable __rcu_bh **rthp;
>  	unsigned long samples = 0;
>  	unsigned long sum = 0, sum2 = 0;
>  	unsigned long delta;
> @@ -815,8 +815,8 @@ static void rt_check_expire(void)
>  			continue;
>  		length = 0;
>  		spin_lock_bh(rt_hash_lock_addr(i));
> -		while ((rth = rcu_dereference_const(*rthp)) != NULL) {
> -			prefetch(rcu_dereference_const(rth->u.dst.rt_next));
> +		while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
> +			prefetch(rcu_dereference_bh(rth->u.dst.rt_next));
>  			if (rt_is_expired(rth)) {
>  				*rthp = rth->u.dst.rt_next;
>  				rt_free(rth);
> @@ -836,14 +836,14 @@ nofree:
>  					 * attributes don't unfairly skew
>  					 * the length computation
>  					 */
> -					for (aux = rcu_dereference_const(rt_hash_table[i].chain);;) {
> +					for (aux = rcu_dereference_bh(rt_hash_table[i].chain);;) {
>  						if (aux == rth) {
>  							length += ONE;
>  							break;
>  						}
>  						if (compare_hash_inputs(&aux->fl, &rth->fl))
>  							break;
> -						aux = rcu_dereference_const(aux->u.dst.rt_next);
> +						aux = rcu_dereference_bh(aux->u.dst.rt_next);
>  					}
>  					continue;
>  				}
> @@ -959,7 +959,7 @@ static int rt_garbage_collect(struct dst_ops *ops)
>  	static int rover;
>  	static int equilibrium;
>  	struct rtable *rth;
> -	struct rtable __rcu **rthp;
> +	struct rtable __rcu_bh **rthp;
>  	unsigned long now = jiffies;
>  	int goal;
> 
> @@ -1012,7 +1012,7 @@ static int rt_garbage_collect(struct dst_ops *ops)
>  			k = (k + 1) & rt_hash_mask;
>  			rthp = &rt_hash_table[k].chain;
>  			spin_lock_bh(rt_hash_lock_addr(k));
> -			while ((rth = rcu_dereference_const(*rthp)) != NULL) {
> +			while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
>  				if (!rt_is_expired(rth) &&
>  					!rt_may_expire(rth, tmo, expire)) {
>  					tmo >>= 1;
> @@ -1079,10 +1079,10 @@ static int rt_intern_hash(unsigned hash, struct rtable *rt,
>  			  struct rtable **rp, struct sk_buff *skb)
>  {
>  	struct rtable	*rth;
> -	struct rtable __rcu **rthp;
> +	struct rtable __rcu_bh **rthp;
>  	unsigned long	now;
>  	struct rtable *cand;
> -	struct rtable __rcu **candp;
> +	struct rtable __rcu_bh **candp;
>  	u32 		min_score;
>  	int		chain_length;
>  	int attempts = !in_softirq();
> @@ -1129,7 +1129,7 @@ restart:
>  	rthp = &rt_hash_table[hash].chain;
> 
>  	spin_lock_bh(rt_hash_lock_addr(hash));
> -	while ((rth = rcu_dereference_const(*rthp)) != NULL) {
> +	while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
>  		if (rt_is_expired(rth)) {
>  			*rthp = rth->u.dst.rt_next;
>  			rt_free(rth);
> @@ -1143,13 +1143,13 @@ restart:
>  			 * must be visible to another weakly ordered CPU before
>  			 * the insertion at the start of the hash chain.
>  			 */
> -			rcu_assign_pointer(rth->u.dst.rt_next,
> +			rcu_assign_pointer_bh(rth->u.dst.rt_next,
>  					   rt_hash_table[hash].chain);
>  			/*
>  			 * Since lookup is lockfree, the update writes
>  			 * must be ordered for consistency on SMP.
>  			 */
> -			rcu_assign_pointer(rt_hash_table[hash].chain, rth);
> +			rcu_assign_pointer_bh(rt_hash_table[hash].chain, rth);
> 
>  			dst_use(&rth->u.dst, now);
>  			spin_unlock_bh(rt_hash_lock_addr(hash));
> @@ -1252,7 +1252,7 @@ restart:
>  	 * previous writes to rt are comitted to memory
>  	 * before making rt visible to other CPUS.
>  	 */
> -	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
> +	rcu_assign_pointer_bh(rt_hash_table[hash].chain, rt);
> 
>  	spin_unlock_bh(rt_hash_lock_addr(hash));
> 
> @@ -1325,13 +1325,13 @@ void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more)
> 
>  static void rt_del(unsigned hash, struct rtable *rt)
>  {
> -	struct rtable __rcu **rthp;
> +	struct rtable __rcu_bh **rthp;
>  	struct rtable *aux;
> 
>  	rthp = &rt_hash_table[hash].chain;
>  	spin_lock_bh(rt_hash_lock_addr(hash));
>  	ip_rt_put(rt);
> -	while ((aux = rcu_dereference_const(*rthp)) != NULL) {
> +	while ((aux = rcu_dereference_bh(*rthp)) != NULL) {
>  		if (aux == rt || rt_is_expired(aux)) {
>  			*rthp = aux->u.dst.rt_next;
>  			rt_free(aux);
> @@ -1348,7 +1348,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
>  	int i, k;
>  	struct in_device *in_dev = in_dev_get(dev);
>  	struct rtable *rth;
> -	struct rtable __rcu **rthp;
> +	struct rtable __rcu_bh **rthp;
>  	__be32  skeys[2] = { saddr, 0 };
>  	int  ikeys[2] = { dev->ifindex, 0 };
>  	struct netevent_redirect netevent;
> @@ -1384,7 +1384,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
>  			rthp=&rt_hash_table[hash].chain;
> 
>  			rcu_read_lock();
> -			while ((rth = rcu_dereference(*rthp)) != NULL) {
> +			while ((rth = rcu_dereference_bh(*rthp)) != NULL) {
>  				struct rtable *rt;
> 
>  				if (rth->fl.fl4_dst != daddr ||
> @@ -1646,8 +1646,8 @@ unsigned short ip_rt_frag_needed(struct net *net, struct iphdr *iph,
>  						rt_genid(net));
> 
>  			rcu_read_lock();
> -			for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
> -			     rth = rcu_dereference(rth->u.dst.rt_next)) {
> +			for (rth = rcu_dereference_bh(rt_hash_table[hash].chain); rth;
> +			     rth = rcu_dereference_bh(rth->u.dst.rt_next)) {
>  				unsigned short mtu = new_mtu;
> 
>  				if (rth->fl.fl4_dst != daddr ||
> @@ -2287,8 +2287,8 @@ int ip_route_input(struct sk_buff *skb, __be32 daddr, __be32 saddr,
>  	hash = rt_hash(daddr, saddr, iif, rt_genid(net));
> 
>  	rcu_read_lock();
> -	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
> -	     rth = rcu_dereference(rth->u.dst.rt_next)) {
> +	for (rth = rcu_dereference_bh(rt_hash_table[hash].chain); rth;
> +	     rth = rcu_dereference_bh(rth->u.dst.rt_next)) {
>  		if (((rth->fl.fl4_dst ^ daddr) |
>  		     (rth->fl.fl4_src ^ saddr) |
>  		     (rth->fl.iif ^ iif) |
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index d8ce05b..003d54f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3252,9 +3252,9 @@ void __init tcp_init(void)
>  	memset(&tcp_secret_two.secrets[0], 0, sizeof(tcp_secret_two.secrets));
>  	tcp_secret_one.expires = jiffy; /* past due */
>  	tcp_secret_two.expires = jiffy; /* past due */
> -	__rcu_assign_pointer(tcp_secret_generating, &tcp_secret_one);
> +	rcu_assign_pointer(tcp_secret_generating, &tcp_secret_one);
>  	tcp_secret_primary = &tcp_secret_one;
> -	__rcu_assign_pointer(tcp_secret_retiring, &tcp_secret_two);
> +	rcu_assign_pointer(tcp_secret_retiring, &tcp_secret_two);
>  	tcp_secret_secondary = &tcp_secret_two;
>  }
> 
> diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
> index ed7f424..8696677 100644
> --- a/net/llc/llc_core.c
> +++ b/net/llc/llc_core.c
> @@ -50,7 +50,7 @@ static struct llc_sap *__llc_sap_find(unsigned char sap_value)
>  {
>  	struct llc_sap* sap;
> 
> -	list_for_each_entry(sap, &llc_sap_list, node)
> +	list_for_each_entry_rcu(sap, &llc_sap_list, node)
>  		if (sap->laddr.lsap == sap_value)
>  			goto out;
>  	sap = NULL;
> @@ -103,7 +103,7 @@ struct llc_sap *llc_sap_open(unsigned char lsap,
>  	if (!sap)
>  		goto out;
>  	sap->laddr.lsap = lsap;
> -	rcu_assign_pointer(sap->rcv_func, func);
> +	rcu_assign_pointer_bh(sap->rcv_func, func);
>  	list_add_tail_rcu(&sap->node, &llc_sap_list);
>  out:
>  	spin_unlock_bh(&llc_sap_list_lock);
> @@ -127,7 +127,7 @@ void llc_sap_close(struct llc_sap *sap)
>  	list_del_rcu(&sap->node);
>  	spin_unlock_bh(&llc_sap_list_lock);
> 
> -	synchronize_rcu();
> +	synchronize_rcu_bh();
> 
>  	kfree(sap);
>  }
> diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
> index 57ad974..b775530 100644
> --- a/net/llc/llc_input.c
> +++ b/net/llc/llc_input.c
> @@ -179,7 +179,7 @@ int llc_rcv(struct sk_buff *skb, struct net_device *dev,
>  	 * First the upper layer protocols that don't need the full
>  	 * LLC functionality
>  	 */
> -	rcv = rcu_dereference(sap->rcv_func);
> +	rcv = rcu_dereference_bh(sap->rcv_func);
>  	if (rcv) {
>  		struct sk_buff *cskb = skb_clone(skb, GFP_ATOMIC);
>  		if (cskb)
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index 80fd3ad..2ba7048 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -78,7 +78,7 @@ static int kvm_iommu_map_memslots(struct kvm *kvm)
>  	int i, r = 0;
>  	struct kvm_memslots *slots;
> 
> -	slots = rcu_dereference(kvm->memslots);
> +	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
> 
>  	for (i = 0; i < slots->nmemslots; i++) {
>  		r = kvm_iommu_map_pages(kvm, &slots->memslots[i]);
> @@ -217,7 +217,7 @@ static int kvm_iommu_unmap_memslots(struct kvm *kvm)
>  	int i;
>  	struct kvm_memslots *slots;
> 
> -	slots = rcu_dereference(kvm->memslots);
> +	slots = srcu_dereference(kvm->memslots, &kvm->srcu);
> 
>  	for (i = 0; i < slots->nmemslots; i++) {
>  		kvm_iommu_put_pages(kvm, slots->memslots[i].base_gfn,
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 548f925..ae28c71 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -372,6 +372,8 @@ static struct kvm *kvm_create_vm(void)
>  {
>  	int r = 0, i;
>  	struct kvm *kvm = kvm_arch_create_vm();
> +	struct kvm_io_bus *buses[KVM_NR_BUSES];
> +	struct kvm_memslots *memslots;

This one looks like more that simply an RCU change...

OK, I get it -- you are creating these temporaries in order to avoid
overflowing the line.  Never mind!!!  ;-)

>  	if (IS_ERR(kvm))
>  		goto out;
> @@ -386,14 +388,15 @@ static struct kvm *kvm_create_vm(void)
>  #endif
> 
>  	r = -ENOMEM;
> -	kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> +	memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
> +	srcu_assign_pointer(kvm->memslots, memslots);
>  	if (!kvm->memslots)
>  		goto out_err;
>  	if (init_srcu_struct(&kvm->srcu))
>  		goto out_err;
>  	for (i = 0; i < KVM_NR_BUSES; i++) {
> -		kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus),
> -					GFP_KERNEL);
> +		buses[i] = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL);
> +		srcu_assign_pointer(kvm->buses[i], buses[i]);

But why do you need an array for "buses" instead of only one variable?

>  		if (!kvm->buses[i]) {
>  			cleanup_srcu_struct(&kvm->srcu);
>  			goto out_err;
> @@ -428,8 +431,8 @@ out_err:
>  	hardware_disable_all();
>  out_err_nodisable:
>  	for (i = 0; i < KVM_NR_BUSES; i++)
> -		kfree(kvm->buses[i]);
> -	kfree(kvm->memslots);
> +		kfree(buses[i]);

OK, I see what you are trying to do.  But why not free all the non-NULL
ones from the kvm-> structure, and then use a single "buses" rather than
an array of them?  Perhaps running "i" down from whereever the earlier
loop left it, in case it is difficult to zero the underlying kvm->
structure?

Just trying to save a bit of stack space...

> +	kfree(memslots);
>  	kfree(kvm);
>  	return ERR_PTR(r);
>  }
> @@ -464,12 +467,12 @@ static void kvm_free_physmem_slot(struct kvm_memory_slot *free,
>  void kvm_free_physmem(struct kvm *kvm)
>  {
>  	int i;
> -	struct kvm_memslots *slots = kvm->memslots;
> +	struct kvm_memslots *slots = srcu_dereference_const(kvm->memslots);
> 
>  	for (i = 0; i < slots->nmemslots; ++i)
>  		kvm_free_physmem_slot(&slots->memslots[i], NULL);
> 
> -	kfree(kvm->memslots);
> +	kfree(slots);
>  }
> 
>  static void kvm_destroy_vm(struct kvm *kvm)
> @@ -483,7 +486,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
>  	spin_unlock(&kvm_lock);
>  	kvm_free_irq_routing(kvm);
>  	for (i = 0; i < KVM_NR_BUSES; i++)
> -		kvm_io_bus_destroy(kvm->buses[i]);
> +		kvm_io_bus_destroy(srcu_dereference_const(kvm->buses[i]));
>  	kvm_coalesced_mmio_free(kvm);
>  #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
>  	mmu_notifier_unregister(&kvm->mmu_notifier, kvm->mm);
> @@ -552,7 +555,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
>  		goto out;
> 
> -	memslot = &kvm->memslots->memslots[mem->slot];
> +	old_memslots = srcu_dereference(kvm->memslots, &kvm->srcu);
> +	memslot = &old_memslots->memslots[mem->slot];
>  	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
>  	npages = mem->memory_size >> PAGE_SHIFT;
> 
> @@ -573,7 +577,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  	/* Check for overlaps */
>  	r = -EEXIST;
>  	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
> -		struct kvm_memory_slot *s = &kvm->memslots->memslots[i];
> +		struct kvm_memory_slot *s = &old_memslots->memslots[i];
> 
>  		if (s == memslot || !s->npages)
>  			continue;
> @@ -669,13 +673,13 @@ skip_lpage:
>  		slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
>  		if (!slots)
>  			goto out_free;
> -		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> +		old_memslots = srcu_dereference_const(kvm->memslots);
> +		memcpy(slots, old_memslots, sizeof(struct kvm_memslots));
>  		if (mem->slot >= slots->nmemslots)
>  			slots->nmemslots = mem->slot + 1;
>  		slots->memslots[mem->slot].flags |= KVM_MEMSLOT_INVALID;
> 
> -		old_memslots = kvm->memslots;
> -		rcu_assign_pointer(kvm->memslots, slots);
> +		srcu_assign_pointer(kvm->memslots, slots);
>  		synchronize_srcu_expedited(&kvm->srcu);
>  		/* From this point no new shadow pages pointing to a deleted
>  		 * memslot will be created.
> @@ -705,7 +709,8 @@ skip_lpage:
>  	slots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
>  	if (!slots)
>  		goto out_free;
> -	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> +	old_memslots = srcu_dereference_const(kvm->memslots);
> +	memcpy(slots, old_memslots, sizeof(struct kvm_memslots));
>  	if (mem->slot >= slots->nmemslots)
>  		slots->nmemslots = mem->slot + 1;
> 
> @@ -718,8 +723,7 @@ skip_lpage:
>  	}
> 
>  	slots->memslots[mem->slot] = new;
> -	old_memslots = kvm->memslots;
> -	rcu_assign_pointer(kvm->memslots, slots);
> +	srcu_assign_pointer(kvm->memslots, slots);
>  	synchronize_srcu_expedited(&kvm->srcu);
> 
>  	kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
> @@ -775,7 +779,7 @@ int kvm_get_dirty_log(struct kvm *kvm,
>  	if (log->slot >= KVM_MEMORY_SLOTS)
>  		goto out;
> 
> -	memslot = &kvm->memslots->memslots[log->slot];
> +	memslot = &srcu_dereference(kvm->memslots, &kvm->srcu)->memslots[log->slot];
>  	r = -ENOENT;
>  	if (!memslot->dirty_bitmap)
>  		goto out;
> @@ -829,7 +833,7 @@ EXPORT_SYMBOL_GPL(kvm_is_error_hva);
>  struct kvm_memory_slot *gfn_to_memslot_unaliased(struct kvm *kvm, gfn_t gfn)
>  {
>  	int i;
> -	struct kvm_memslots *slots = rcu_dereference(kvm->memslots);
> +	struct kvm_memslots *slots = srcu_dereference(kvm->memslots, &kvm->srcu);
> 
>  	for (i = 0; i < slots->nmemslots; ++i) {
>  		struct kvm_memory_slot *memslot = &slots->memslots[i];
> @@ -851,7 +855,7 @@ struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn)
>  int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn)
>  {
>  	int i;
> -	struct kvm_memslots *slots = rcu_dereference(kvm->memslots);
> +	struct kvm_memslots *slots = srcu_dereference(kvm->memslots, &kvm->srcu);
> 
>  	gfn = unalias_gfn_instantiation(kvm, gfn);
>  	for (i = 0; i < KVM_MEMORY_SLOTS; ++i) {
> @@ -895,7 +899,7 @@ out:
>  int memslot_id(struct kvm *kvm, gfn_t gfn)
>  {
>  	int i;
> -	struct kvm_memslots *slots = rcu_dereference(kvm->memslots);
> +	struct kvm_memslots *slots = srcu_dereference(kvm->memslots, &kvm->srcu);
>  	struct kvm_memory_slot *memslot = NULL;
> 
>  	gfn = unalias_gfn(kvm, gfn);
> @@ -1984,7 +1988,7 @@ int kvm_io_bus_write(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>  		     int len, const void *val)
>  {
>  	int i;
> -	struct kvm_io_bus *bus = rcu_dereference(kvm->buses[bus_idx]);
> +	struct kvm_io_bus *bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
>  	for (i = 0; i < bus->dev_count; i++)
>  		if (!kvm_iodevice_write(bus->devs[i], addr, len, val))
>  			return 0;
> @@ -1996,7 +2000,7 @@ int kvm_io_bus_read(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr,
>  		    int len, void *val)
>  {
>  	int i;
> -	struct kvm_io_bus *bus = rcu_dereference(kvm->buses[bus_idx]);
> +	struct kvm_io_bus *bus = srcu_dereference(kvm->buses[bus_idx], &kvm->srcu);
> 
>  	for (i = 0; i < bus->dev_count; i++)
>  		if (!kvm_iodevice_read(bus->devs[i], addr, len, val))
> @@ -2010,7 +2014,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  {
>  	struct kvm_io_bus *new_bus, *bus;
> 
> -	bus = kvm->buses[bus_idx];
> +	bus = srcu_dereference_const(kvm->buses[bus_idx]);
>  	if (bus->dev_count > NR_IOBUS_DEVS-1)
>  		return -ENOSPC;
> 
> @@ -2019,7 +2023,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  		return -ENOMEM;
>  	memcpy(new_bus, bus, sizeof(struct kvm_io_bus));
>  	new_bus->devs[new_bus->dev_count++] = dev;
> -	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> +	srcu_assign_pointer(kvm->buses[bus_idx], new_bus);
>  	synchronize_srcu_expedited(&kvm->srcu);
>  	kfree(bus);
> 
> @@ -2037,7 +2041,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  	if (!new_bus)
>  		return -ENOMEM;
> 
> -	bus = kvm->buses[bus_idx];
> +	bus = srcu_dereference_const(kvm->buses[bus_idx]);
>  	memcpy(new_bus, bus, sizeof(struct kvm_io_bus));
> 
>  	r = -ENOENT;
> @@ -2053,7 +2057,7 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx,
>  		return r;
>  	}
> 
> -	rcu_assign_pointer(kvm->buses[bus_idx], new_bus);
> +	srcu_assign_pointer(kvm->buses[bus_idx], new_bus);
>  	synchronize_srcu_expedited(&kvm->srcu);
>  	kfree(bus);
>  	return r;
> 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH net-next] bridge: use is_multicast_ether_addr
  2010-02-28  5:41   ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
@ 2010-04-27 17:13     ` Stephen Hemminger
  2010-04-27 19:53       ` David Miller
  0 siblings, 1 reply; 81+ messages in thread
From: Stephen Hemminger @ 2010-04-27 17:13 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller; +Cc: netdev

Use existing inline function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/net/bridge/br_device.c	2010-04-27 09:49:30.059258391 -0700
+++ b/net/bridge/br_device.c	2010-04-27 09:50:21.439878721 -0700
@@ -36,7 +36,7 @@ netdev_tx_t br_dev_xmit(struct sk_buff *
 	skb_reset_mac_header(skb);
 	skb_pull(skb, ETH_HLEN);
 
-	if (dest[0] & 1) {
+	if (is_multicast_ether_addr(dest)) {
 		if (br_multicast_rcv(br, NULL, skb))
 			goto out;
 

^ permalink raw reply	[flat|nested] 81+ messages in thread

* [PATCH net-next] bridge: multicast router list manipulation
  2010-02-28  5:41   ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
@ 2010-04-27 17:13     ` Stephen Hemminger
  2010-04-27 19:54       ` David Miller
                         ` (2 more replies)
  0 siblings, 3 replies; 81+ messages in thread
From: Stephen Hemminger @ 2010-04-27 17:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, netdev

I prefer that the hlist be only accessed through the hlist macro
objects. Explicit twiddling of links (especially with RCU) exposes
the code to future bugs.

Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>


--- a/net/bridge/br_multicast.c	2010-04-27 09:54:02.180531924 -0700
+++ b/net/bridge/br_multicast.c	2010-04-27 10:07:19.188688664 -0700
@@ -1041,21 +1041,21 @@ static int br_ip6_multicast_mld2_report(
 static void br_multicast_add_router(struct net_bridge *br,
 				    struct net_bridge_port *port)
 {
-	struct hlist_node *p;
-	struct hlist_node **h;
+	struct net_bridge_port *p;
+	struct hlist_node *n, *last = NULL;
 
-	for (h = &br->router_list.first;
-	     (p = *h) &&
-	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
-	     (unsigned long)port;
-	     h = &p->next)
-		;
-
-	port->rlist.pprev = h;
-	port->rlist.next = p;
-	rcu_assign_pointer(*h, &port->rlist);
-	if (p)
-		p->pprev = &port->rlist.next;
+	hlist_for_each_entry(p, n, &br->router_list, rlist) {
+		if ((unsigned long) port >= (unsigned long) p) {
+			hlist_add_before_rcu(n, &port->rlist);
+			return;
+		}
+		last = n;
+	}
+
+	if (last)
+		hlist_add_after_rcu(last, &port->rlist);
+	else
+		hlist_add_head_rcu(&port->rlist, &br->router_list);
 }
 
 static void br_multicast_mark_router(struct net_bridge *br,


^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: use is_multicast_ether_addr
  2010-04-27 17:13     ` [PATCH net-next] bridge: use is_multicast_ether_addr Stephen Hemminger
@ 2010-04-27 19:53       ` David Miller
  0 siblings, 0 replies; 81+ messages in thread
From: David Miller @ 2010-04-27 19:53 UTC (permalink / raw)
  To: shemminger; +Cc: herbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 27 Apr 2010 10:13:06 -0700

> Use existing inline function.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 17:13     ` [PATCH net-next] bridge: multicast router list manipulation Stephen Hemminger
@ 2010-04-27 19:54       ` David Miller
  2010-04-27 23:11       ` Michał Mirosław
  2010-04-28  1:51       ` Herbert Xu
  2 siblings, 0 replies; 81+ messages in thread
From: David Miller @ 2010-04-27 19:54 UTC (permalink / raw)
  To: shemminger; +Cc: herbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 27 Apr 2010 10:13:11 -0700

> I prefer that the hlist be only accessed through the hlist macro
> objects. Explicit twiddling of links (especially with RCU) exposes
> the code to future bugs.
> 
> Compile tested only.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Yes this by-hand stuff was beyond awful, I'm sorry I didn't catch it
when pulling in these changes initially :-)

Applied, thanks Stephen.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 17:13     ` [PATCH net-next] bridge: multicast router list manipulation Stephen Hemminger
  2010-04-27 19:54       ` David Miller
@ 2010-04-27 23:11       ` Michał Mirosław
  2010-04-27 23:25         ` Stephen Hemminger
  2010-04-27 23:27         ` David Miller
  2010-04-28  1:51       ` Herbert Xu
  2 siblings, 2 replies; 81+ messages in thread
From: Michał Mirosław @ 2010-04-27 23:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Herbert Xu, David S. Miller, netdev

2010/4/27 Stephen Hemminger <shemminger@vyatta.com>:
> I prefer that the hlist be only accessed through the hlist macro
> objects. Explicit twiddling of links (especially with RCU) exposes
> the code to future bugs.
[...]
> -       port->rlist.pprev = h;
> -       port->rlist.next = p;
> -       rcu_assign_pointer(*h, &port->rlist);
> -       if (p)
> -               p->pprev = &port->rlist.next;
> +       hlist_for_each_entry(p, n, &br->router_list, rlist) {

Shouldn't this be hlist_for_each_entry_rcu?

> +               if ((unsigned long) port >= (unsigned long) p) {
> +                       hlist_add_before_rcu(n, &port->rlist);
> +                       return;
> +               }
> +               last = n;
> +       }
> +
> +       if (last)
> +               hlist_add_after_rcu(last, &port->rlist);
> +       else
> +               hlist_add_head_rcu(&port->rlist, &br->router_list);
>  }
>
[...]

Best Regards,
Michał Mirosław

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 23:11       ` Michał Mirosław
@ 2010-04-27 23:25         ` Stephen Hemminger
  2010-04-27 23:28           ` David Miller
  2010-04-27 23:27         ` David Miller
  1 sibling, 1 reply; 81+ messages in thread
From: Stephen Hemminger @ 2010-04-27 23:25 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: Herbert Xu, David S. Miller, netdev

On Wed, 28 Apr 2010 01:11:52 +0200
Michał Mirosław <mirqus@gmail.com> wrote:

> 2010/4/27 Stephen Hemminger <shemminger@vyatta.com>:
> > I prefer that the hlist be only accessed through the hlist macro
> > objects. Explicit twiddling of links (especially with RCU) exposes
> > the code to future bugs.
> [...]
> > -       port->rlist.pprev = h;
> > -       port->rlist.next = p;
> > -       rcu_assign_pointer(*h, &port->rlist);
> > -       if (p)
> > -               p->pprev = &port->rlist.next;
> > +       hlist_for_each_entry(p, n, &br->router_list, rlist) {
> 
> Shouldn't this be hlist_for_each_entry_rcu?
> 

This code should already be protected by br->multicast_lock

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 23:11       ` Michał Mirosław
  2010-04-27 23:25         ` Stephen Hemminger
@ 2010-04-27 23:27         ` David Miller
  1 sibling, 0 replies; 81+ messages in thread
From: David Miller @ 2010-04-27 23:27 UTC (permalink / raw)
  To: mirqus; +Cc: shemminger, herbert, netdev

From: Michał Mirosław <mirqus@gmail.com>
Date: Wed, 28 Apr 2010 01:11:52 +0200

> 2010/4/27 Stephen Hemminger <shemminger@vyatta.com>:
>> I prefer that the hlist be only accessed through the hlist macro
>> objects. Explicit twiddling of links (especially with RCU) exposes
>> the code to future bugs.
> [...]
>> -       port->rlist.pprev = h;
>> -       port->rlist.next = p;
>> -       rcu_assign_pointer(*h, &port->rlist);
>> -       if (p)
>> -               p->pprev = &port->rlist.next;
>> +       hlist_for_each_entry(p, n, &br->router_list, rlist) {
> 
> Shouldn't this be hlist_for_each_entry_rcu?

I think so, I've committed the following to net-next-2.6 to fix this,
thanks!

bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()

Noticed by Michał Mirosław.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/bridge/br_multicast.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index fcba313..e29c9b7 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -1045,7 +1045,7 @@ static void br_multicast_add_router(struct net_bridge *br,
 	struct net_bridge_port *p;
 	struct hlist_node *n, *last = NULL;
 
-	hlist_for_each_entry(p, n, &br->router_list, rlist) {
+	hlist_for_each_entry_rcu(p, n, &br->router_list, rlist) {
 		if ((unsigned long) port >= (unsigned long) p) {
 			hlist_add_before_rcu(n, &port->rlist);
 			return;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 23:25         ` Stephen Hemminger
@ 2010-04-27 23:28           ` David Miller
  2010-04-27 23:44             ` Stephen Hemminger
  0 siblings, 1 reply; 81+ messages in thread
From: David Miller @ 2010-04-27 23:28 UTC (permalink / raw)
  To: shemminger; +Cc: mirqus, herbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 27 Apr 2010 16:25:30 -0700

> On Wed, 28 Apr 2010 01:11:52 +0200
> Michał Mirosław <mirqus@gmail.com> wrote:
> 
>> 2010/4/27 Stephen Hemminger <shemminger@vyatta.com>:
>> > I prefer that the hlist be only accessed through the hlist macro
>> > objects. Explicit twiddling of links (especially with RCU) exposes
>> > the code to future bugs.
>> [...]
>> > -       port->rlist.pprev = h;
>> > -       port->rlist.next = p;
>> > -       rcu_assign_pointer(*h, &port->rlist);
>> > -       if (p)
>> > -               p->pprev = &port->rlist.next;
>> > +       hlist_for_each_entry(p, n, &br->router_list, rlist) {
>> 
>> Shouldn't this be hlist_for_each_entry_rcu?
>> 
> 
> This code should already be protected by br->multicast_lock

But the adds et al. use RCU already, I think we should be consistent
one way or another.

I've already made Michał's suggested change to net-next-2.6, if you
think _rcu() isn't necessary then trim it from all the hlist calls
in this function.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 23:28           ` David Miller
@ 2010-04-27 23:44             ` Stephen Hemminger
  2010-04-27 23:51               ` David Miller
  0 siblings, 1 reply; 81+ messages in thread
From: Stephen Hemminger @ 2010-04-27 23:44 UTC (permalink / raw)
  To: David Miller; +Cc: mirqus, herbert, netdev

On Tue, 27 Apr 2010 16:28:11 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Stephen Hemminger <shemminger@vyatta.com>
> Date: Tue, 27 Apr 2010 16:25:30 -0700
> 
> > On Wed, 28 Apr 2010 01:11:52 +0200
> > Michał Mirosław <mirqus@gmail.com> wrote:
> > 
> >> 2010/4/27 Stephen Hemminger <shemminger@vyatta.com>:
> >> > I prefer that the hlist be only accessed through the hlist macro
> >> > objects. Explicit twiddling of links (especially with RCU) exposes
> >> > the code to future bugs.
> >> [...]
> >> > -       port->rlist.pprev = h;
> >> > -       port->rlist.next = p;
> >> > -       rcu_assign_pointer(*h, &port->rlist);
> >> > -       if (p)
> >> > -               p->pprev = &port->rlist.next;
> >> > +       hlist_for_each_entry(p, n, &br->router_list, rlist) {
> >> 
> >> Shouldn't this be hlist_for_each_entry_rcu?
> >> 
> > 
> > This code should already be protected by br->multicast_lock
> 
> But the adds et al. use RCU already, I think we should be consistent
> one way or another.
> 
> I've already made Michał's suggested change to net-next-2.6, if you
> think _rcu() isn't necessary then trim it from all the hlist calls
> in this function.

Code that doesn't need rcu for traversal should not use it.
It just confuses things and implies that rcu_read_lock is held
which it is not in this code.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 23:44             ` Stephen Hemminger
@ 2010-04-27 23:51               ` David Miller
  0 siblings, 0 replies; 81+ messages in thread
From: David Miller @ 2010-04-27 23:51 UTC (permalink / raw)
  To: shemminger; +Cc: mirqus, herbert, netdev

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Tue, 27 Apr 2010 16:44:12 -0700

> Code that doesn't need rcu for traversal should not use it.
> It just confuses things and implies that rcu_read_lock is held
> which it is not in this code.

Fair enough, I've reverted by change.

This would have been helped if either the patch was split up into
two pieces (one straight hlist macro conversion, another removing
the RCU tag) _or_ the commit message explained why the RCU tag
could be elided.

^ permalink raw reply	[flat|nested] 81+ messages in thread

* Re: [PATCH net-next] bridge: multicast router list manipulation
  2010-04-27 17:13     ` [PATCH net-next] bridge: multicast router list manipulation Stephen Hemminger
  2010-04-27 19:54       ` David Miller
  2010-04-27 23:11       ` Michał Mirosław
@ 2010-04-28  1:51       ` Herbert Xu
  2 siblings, 0 replies; 81+ messages in thread
From: Herbert Xu @ 2010-04-28  1:51 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev

On Tue, Apr 27, 2010 at 10:13:11AM -0700, Stephen Hemminger wrote:
> I prefer that the hlist be only accessed through the hlist macro
> objects. Explicit twiddling of links (especially with RCU) exposes
> the code to future bugs.
> 
> Compile tested only.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> 
> --- a/net/bridge/br_multicast.c	2010-04-27 09:54:02.180531924 -0700
> +++ b/net/bridge/br_multicast.c	2010-04-27 10:07:19.188688664 -0700
> @@ -1041,21 +1041,21 @@ static int br_ip6_multicast_mld2_report(
>  static void br_multicast_add_router(struct net_bridge *br,
>  				    struct net_bridge_port *port)
>  {
> -	struct hlist_node *p;
> -	struct hlist_node **h;
> +	struct net_bridge_port *p;
> +	struct hlist_node *n, *last = NULL;
>  
> -	for (h = &br->router_list.first;
> -	     (p = *h) &&
> -	     (unsigned long)container_of(p, struct net_bridge_port, rlist) >
> -	     (unsigned long)port;
> -	     h = &p->next)
> -		;
> -
> -	port->rlist.pprev = h;
> -	port->rlist.next = p;
> -	rcu_assign_pointer(*h, &port->rlist);
> -	if (p)
> -		p->pprev = &port->rlist.next;
> +	hlist_for_each_entry(p, n, &br->router_list, rlist) {
> +		if ((unsigned long) port >= (unsigned long) p) {
> +			hlist_add_before_rcu(n, &port->rlist);
> +			return;
> +		}
> +		last = n;
> +	}
> +
> +	if (last)
> +		hlist_add_after_rcu(last, &port->rlist);
> +	else
> +		hlist_add_head_rcu(&port->rlist, &br->router_list);
>  }

Thanks Stephen, this looks good to me too.
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 81+ messages in thread

end of thread, other threads:[~2010-04-28  1:51 UTC | newest]

Thread overview: 81+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-26 15:34 [RFC] [1/13] bridge: Add IGMP snooping support Herbert Xu
2010-02-26 15:35 ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
2010-02-26 15:35 ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
2010-02-27 11:14   ` David Miller
2010-02-27 15:36     ` Herbert Xu
2010-02-26 15:35 ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
2010-02-26 15:35 ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
2010-02-26 15:35 ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
2010-02-26 15:35 ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
2010-02-26 15:35 ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
2010-02-26 15:35 ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
2010-02-26 15:35 ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
2010-02-26 15:35 ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
2010-02-27  0:42   ` Stephen Hemminger
2010-02-27 11:29     ` David Miller
2010-02-27 15:53       ` Herbert Xu
2010-03-09 12:25     ` Herbert Xu
2010-03-09 12:26       ` Herbert Xu
2010-02-26 15:35 ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
2010-02-26 15:35 ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
2010-02-26 15:35 ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
2010-02-28  5:40 ` [1/13] bridge: Add IGMP snooping support Herbert Xu
2010-02-28  5:41   ` [PATCH 1/13] bridge: Do br_pass_frame_up after other ports Herbert Xu
2010-02-28  5:41   ` [PATCH 2/13] bridge: Allow tail-call on br_pass_frame_up Herbert Xu
2010-02-28  5:41   ` [PATCH 3/13] bridge: Avoid unnecessary clone on forward path Herbert Xu
2010-02-28  5:41   ` [PATCH 4/13] bridge: Use BR_INPUT_SKB_CB on xmit path Herbert Xu
2010-02-28  5:41   ` [PATCH 5/13] bridge: Split may_deliver/deliver_clone out of br_flood Herbert Xu
2010-02-28  5:41   ` [PATCH 6/13] bridge: Add core IGMP snooping support Herbert Xu
2010-03-05 23:43     ` Paul E. McKenney
2010-03-06  1:17       ` Herbert Xu
2010-03-06  5:06         ` Paul E. McKenney
2010-03-06  6:56           ` Herbert Xu
2010-03-06  7:03             ` Herbert Xu
2010-03-07 23:31               ` David Miller
2010-03-06  7:07             ` Herbert Xu
2010-03-07 23:31               ` David Miller
2010-03-06 15:00             ` Paul E. McKenney
2010-03-06 15:19             ` Paul E. McKenney
2010-03-06 19:00               ` Paul E. McKenney
2010-03-07  2:45                 ` Herbert Xu
2010-03-07  3:11                   ` Paul E. McKenney
2010-03-08 18:50                     ` Arnd Bergmann
2010-03-09  3:15                       ` Paul E. McKenney
2010-03-11 18:49                       ` Arnd Bergmann
2010-03-14 23:01                         ` Paul E. McKenney
2010-03-09 21:12                     ` Arnd Bergmann
2010-03-10  2:14                       ` Paul E. McKenney
2010-03-10  9:41                         ` Arnd Bergmann
2010-03-10 10:39                           ` Eric Dumazet
2010-03-10 10:49                             ` Herbert Xu
2010-03-10 13:13                               ` Paul E. McKenney
2010-03-10 14:07                                 ` Herbert Xu
2010-03-10 16:26                                   ` Paul E. McKenney
2010-03-10 16:35                                     ` David Miller
2010-03-10 17:56                                       ` Arnd Bergmann
2010-03-10 21:25                                         ` Paul E. McKenney
2010-03-10 13:27                               ` Arnd Bergmann
2010-03-10 13:39                               ` Arnd Bergmann
2010-03-10 13:19                           ` Paul E. McKenney
2010-03-10 13:30                             ` Arnd Bergmann
2010-03-10 13:57                               ` Paul E. McKenney
2010-02-28  5:41   ` [PATCH 7/13] bridge: Add multicast forwarding functions Herbert Xu
2010-02-28  5:41   ` [PATCH 8/13] bridge: Add multicast start/stop hooks Herbert Xu
2010-02-28  5:41   ` [PATCH 9/13] bridge: Add multicast data-path hooks Herbert Xu
2010-04-27 17:13     ` [PATCH net-next] bridge: use is_multicast_ether_addr Stephen Hemminger
2010-04-27 19:53       ` David Miller
2010-02-28  5:41   ` [PATCH 10/13] bridge: Add multicast_router sysfs entries Herbert Xu
2010-04-27 17:13     ` [PATCH net-next] bridge: multicast router list manipulation Stephen Hemminger
2010-04-27 19:54       ` David Miller
2010-04-27 23:11       ` Michał Mirosław
2010-04-27 23:25         ` Stephen Hemminger
2010-04-27 23:28           ` David Miller
2010-04-27 23:44             ` Stephen Hemminger
2010-04-27 23:51               ` David Miller
2010-04-27 23:27         ` David Miller
2010-04-28  1:51       ` Herbert Xu
2010-02-28  5:41   ` [PATCH 11/13] bridge: Add multicast_snooping sysfs toggle Herbert Xu
2010-02-28  5:41   ` [PATCH 12/13] bridge: Add hash elasticity/max sysfs entries Herbert Xu
2010-02-28  5:41   ` [PATCH 13/13] bridge: Add multicast count/interval " Herbert Xu
2010-02-28  8:52   ` [1/13] bridge: Add IGMP snooping support David Miller
2010-03-01  2:08     ` Herbert Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.