All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries
@ 2013-06-05  4:24 Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 02/10] vxlan: handle skb_clone failure Stephen Hemminger
                   ` (10 more replies)
  0 siblings, 11 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

Only migrate dynamic forwarding table entries, don't modify
static entries. If packet received from incorrect source IP address
assume it is an imposter and drop it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

---
Should go to -stable as well.
---
 drivers/net/vxlan.c |   17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 8111565..536082a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -604,8 +604,8 @@ skip:
 /* Watch incoming packets to learn mapping between Ethernet address
  * and Tunnel endpoint.
  */
-static void vxlan_snoop(struct net_device *dev,
-			__be32 src_ip, const u8 *src_mac)
+static int vxlan_snoop(struct net_device *dev,
+		       __be32 src_ip, const u8 *src_mac)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb *f;
@@ -614,7 +614,11 @@ static void vxlan_snoop(struct net_device *dev,
 	f = vxlan_find_mac(vxlan, src_mac);
 	if (likely(f)) {
 		if (likely(f->remote.remote_ip == src_ip))
-			return;
+			return 0;
+
+		/* Don't migrate static entries, drop packets */
+		if (!(f->flags & NTF_SELF))
+			return 1;
 
 		if (net_ratelimit())
 			netdev_info(dev,
@@ -634,6 +638,8 @@ static void vxlan_snoop(struct net_device *dev,
 				       0, NTF_SELF);
 		spin_unlock(&vxlan->hash_lock);
 	}
+
+	return 0;
 }
 
 
@@ -766,8 +772,9 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 			       vxlan->dev->dev_addr) == 0)
 		goto drop;
 
-	if (vxlan->flags & VXLAN_F_LEARN)
-		vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source);
+	if ((vxlan->flags & VXLAN_F_LEARN) &&
+	    vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source))
+		goto drop;
 
 	__skb_tunnel_rx(skb, vxlan->dev);
 	skb_reset_network_header(skb);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 02/10] vxlan: handle skb_clone failure
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05  6:59   ` Cong Wang
  2013-06-05 12:50   ` David Stevens
  2013-06-05  4:24 ` [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue Stephen Hemminger
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

skb_clone can fail if out of memory. Just skip the fanout.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

---
Does not impact stable, this was introduced by the multiple
destination code.
---
 drivers/net/vxlan.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 536082a..9085c81 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1197,9 +1197,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 		struct sk_buff *skb1;
 
 		skb1 = skb_clone(skb, GFP_ATOMIC);
-		rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
-		if (rc == NETDEV_TX_OK)
-			rc = rc1;
+		if (skb1) {
+			rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
+			if (rc == NETDEV_TX_OK)
+				rc = rc1;
+		}
 	}
 
 	rc1 = vxlan_xmit_one(skb, dev, rdst0, did_rsc);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 02/10] vxlan: handle skb_clone failure Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05  6:47   ` Cong Wang
  2013-06-05 15:37   ` [PATCH net] vxlan: fix crash on module removal Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 04/10] vxlan: send notification when MAC migrates Stephen Hemminger
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

Do join/leave from work queue to avoid lock inversion problems
between normal socket and RTNL. The code comes out cleaner
as well.

Uses Cong Wang's suggestion to turn refcnt into a real atomic
since now need to handle case where last use of socket is IGMP
worker.

Also fixes race where vxlan_stop could be called after
device was deleted on module removal. The call to rtnl_link_unregister
would call dellink while vxlan device was still up. Reordering
the calls fixes it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

---
The last change probably needs a separate patch for -stable
---
 drivers/net/vxlan.c |  105 ++++++++++++++++++++-------------------------------
 1 file changed, 41 insertions(+), 64 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9085c81..f98f459 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -85,7 +85,7 @@ struct vxlan_sock {
 	struct hlist_node hlist;
 	struct rcu_head	  rcu;
 	struct work_struct del_work;
-	unsigned int	  refcnt;
+	atomic_t	  refcnt;
 	struct socket	  *sock;
 	struct hlist_head vni_list[VNI_HASH_SIZE];
 };
@@ -131,6 +131,7 @@ struct vxlan_dev {
 	__u8		  ttl;
 	u32		  flags;	/* VXLAN_F_* below */
 
+	struct work_struct igmp_work;
 	unsigned long	  age_interval;
 	struct timer_list age_timer;
 	spinlock_t	  hash_lock;
@@ -644,76 +645,58 @@ static int vxlan_snoop(struct net_device *dev,
 
 
 /* See if multicast group is already in use by other ID */
-static bool vxlan_group_used(struct vxlan_net *vn,
-			     const struct vxlan_dev *this)
+static bool vxlan_group_used(struct vxlan_net *vn, __be32 remote_ip)
 {
 	struct vxlan_dev *vxlan;
 
 	list_for_each_entry(vxlan, &vn->vxlan_list, next) {
-		if (vxlan == this)
-			continue;
-
 		if (!netif_running(vxlan->dev))
 			continue;
 
-		if (vxlan->default_dst.remote_ip == this->default_dst.remote_ip)
+		if (vxlan->default_dst.remote_ip == remote_ip)
 			return true;
 	}
 
 	return false;
 }
 
-/* kernel equivalent to IP_ADD_MEMBERSHIP */
-static int vxlan_join_group(struct net_device *dev)
+static void vxlan_sock_hold(struct vxlan_sock *vs)
 {
-	struct vxlan_dev *vxlan = netdev_priv(dev);
-	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
-	struct sock *sk = vxlan->vn_sock->sock->sk;
-	struct ip_mreqn mreq = {
-		.imr_multiaddr.s_addr	= vxlan->default_dst.remote_ip,
-		.imr_ifindex		= vxlan->default_dst.remote_ifindex,
-	};
-	int err;
-
-	/* Already a member of group */
-	if (vxlan_group_used(vn, vxlan))
-		return 0;
+	atomic_inc(&vs->refcnt);
+}
 
-	/* Need to drop RTNL to call multicast join */
-	rtnl_unlock();
-	lock_sock(sk);
-	err = ip_mc_join_group(sk, &mreq);
-	release_sock(sk);
-	rtnl_lock();
+static void vxlan_sock_release(struct vxlan_sock *vs)
+{
+	if (!atomic_dec_and_test(&vs->refcnt))
+		return;
 
-	return err;
+	hlist_del_rcu(&vs->hlist);
+	schedule_work(&vs->del_work);
 }
 
-
-/* kernel equivalent to IP_DROP_MEMBERSHIP */
-static int vxlan_leave_group(struct net_device *dev)
+/* Callback to update multicast group membership.
+ * Scheduled when vxlan goes up/down.
+ */
+static void vxlan_igmp_work(struct work_struct *work)
 {
-	struct vxlan_dev *vxlan = netdev_priv(dev);
-	struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
-	int err = 0;
-	struct sock *sk = vxlan->vn_sock->sock->sk;
+	struct vxlan_dev *vxlan = container_of(work, struct vxlan_dev, igmp_work);
+	struct vxlan_net *vn = net_generic(dev_net(vxlan->dev), vxlan_net_id);
+	struct vxlan_sock *vs = vxlan->vn_sock;
+	struct sock *sk = vs->sock->sk;
 	struct ip_mreqn mreq = {
 		.imr_multiaddr.s_addr	= vxlan->default_dst.remote_ip,
 		.imr_ifindex		= vxlan->default_dst.remote_ifindex,
 	};
 
-	/* Only leave group when last vxlan is done. */
-	if (vxlan_group_used(vn, vxlan))
-		return 0;
-
-	/* Need to drop RTNL to call multicast leave */
-	rtnl_unlock();
 	lock_sock(sk);
-	err = ip_mc_leave_group(sk, &mreq);
+	if (vxlan_group_used(vn, vxlan->default_dst.remote_ip))
+		ip_mc_join_group(sk, &mreq);
+	else
+		ip_mc_leave_group(sk, &mreq);
 	release_sock(sk);
-	rtnl_lock();
 
-	return err;
+	vxlan_sock_release(vs);
+	dev_put(vxlan->dev);
 }
 
 /* Callback from net/ipv4/udp.c to receive packets */
@@ -1261,12 +1244,11 @@ static int vxlan_init(struct net_device *dev)
 static int vxlan_open(struct net_device *dev)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
-	int err;
 
 	if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip))) {
-		err = vxlan_join_group(dev);
-		if (err)
-			return err;
+		vxlan_sock_hold(vxlan->vn_sock);
+		dev_hold(dev);
+		schedule_work(&vxlan->igmp_work);
 	}
 
 	if (vxlan->age_interval)
@@ -1297,8 +1279,11 @@ static int vxlan_stop(struct net_device *dev)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 
-	if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip)))
-		vxlan_leave_group(dev);
+	if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip))) {
+		vxlan_sock_hold(vxlan->vn_sock);
+		dev_hold(dev);
+		schedule_work(&vxlan->igmp_work);
+	}
 
 	del_timer_sync(&vxlan->age_timer);
 
@@ -1367,6 +1352,7 @@ static void vxlan_setup(struct net_device *dev)
 
 	INIT_LIST_HEAD(&vxlan->next);
 	spin_lock_init(&vxlan->hash_lock);
+	INIT_WORK(&vxlan->igmp_work, vxlan_igmp_work);
 
 	init_timer_deferrable(&vxlan->age_timer);
 	vxlan->age_timer.function = vxlan_cleanup;
@@ -1510,8 +1496,8 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port)
 	udp_sk(sk)->encap_type = 1;
 	udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv;
 	udp_encap_enable();
+	atomic_set(&vs->refcnt, 1);
 
-	vs->refcnt = 1;
 	return vs;
 }
 
@@ -1601,7 +1587,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 
 	vs = vxlan_find_port(net, vxlan->dst_port);
 	if (vs)
-		++vs->refcnt;
+		atomic_inc(&vs->refcnt);
 	else {
 		/* Drop lock because socket create acquires RTNL lock */
 		rtnl_unlock();
@@ -1618,12 +1604,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 
 	err = register_netdevice(dev);
 	if (err) {
-		if (--vs->refcnt == 0) {
-			rtnl_unlock();
-			sk_release_kernel(vs->sock->sk);
-			kfree(vs);
-			rtnl_lock();
-		}
+		vxlan_sock_release(vs);
 		return err;
 	}
 
@@ -1641,11 +1622,7 @@ static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 	hlist_del_rcu(&vxlan->hlist);
 	list_del(&vxlan->next);
 	unregister_netdevice_queue(dev, head);
-
-	if (--vs->refcnt == 0) {
-		hlist_del_rcu(&vs->hlist);
-		schedule_work(&vs->del_work);
-	}
+	vxlan_sock_release(vs);
 }
 
 static size_t vxlan_get_size(const struct net_device *dev)
@@ -1784,8 +1761,8 @@ late_initcall(vxlan_init_module);
 
 static void __exit vxlan_cleanup_module(void)
 {
-	rtnl_link_unregister(&vxlan_link_ops);
 	unregister_pernet_device(&vxlan_net_ops);
+	rtnl_link_unregister(&vxlan_link_ops);
 	rcu_barrier();
 }
 module_exit(vxlan_cleanup_module);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 04/10] vxlan: send notification when MAC migrates
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 02/10] vxlan: handle skb_clone failure Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 05/10] vxlan: make vxlan_xmit_one void Stephen Hemminger
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

When learned entry migrates to another IP send a notification
that entry has changed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f98f459..25d3e9e 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -628,6 +628,7 @@ static int vxlan_snoop(struct net_device *dev,
 
 		f->remote.remote_ip = src_ip;
 		f->updated = jiffies;
+		vxlan_fdb_notify(vxlan, f, RTM_NEWNEIGH);
 	} else {
 		/* learned new entry */
 		spin_lock(&vxlan->hash_lock);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 05/10] vxlan: make vxlan_xmit_one void
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (2 preceding siblings ...)
  2013-06-05  4:24 ` [PATCH net-next 04/10] vxlan: send notification when MAC migrates Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05 12:54   ` David Stevens
  2013-06-05  4:24 ` [PATCH net-next 06/10] vxlan: convert remotes list to list_rcu Stephen Hemminger
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

The function vxlan_xmit_one always returns NETDEV_TX_OK, so there
is no point in keeping track of return values etc.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |   26 +++++++++-----------------
 1 file changed, 9 insertions(+), 17 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 25d3e9e..f0b7605 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -996,8 +996,8 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
 	}
 }
 
-static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
-				  struct vxlan_rdst *rdst, bool did_rsc)
+static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
+			   struct vxlan_rdst *rdst, bool did_rsc)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct rtable *rt;
@@ -1020,7 +1020,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		if (did_rsc) {
 			/* short-circuited back to local bridge */
 			vxlan_encap_bypass(skb, vxlan, vxlan);
-			return NETDEV_TX_OK;
+			return;
 		}
 		goto drop;
 	}
@@ -1076,7 +1076,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		if (!dst_vxlan)
 			goto tx_error;
 		vxlan_encap_bypass(skb, vxlan, dst_vxlan);
-		return NETDEV_TX_OK;
+		return;
 	}
 
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
@@ -1120,7 +1120,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		goto drop;
 
 	iptunnel_xmit(skb, dev);
-	return NETDEV_TX_OK;
+	return;
 
 drop:
 	dev->stats.tx_dropped++;
@@ -1130,7 +1130,6 @@ tx_error:
 	dev->stats.tx_errors++;
 tx_free:
 	dev_kfree_skb(skb);
-	return NETDEV_TX_OK;
 }
 
 /* Transmit local packets over Vxlan
@@ -1146,7 +1145,6 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool did_rsc = false;
 	struct vxlan_rdst *rdst0, *rdst;
 	struct vxlan_fdb *f;
-	int rc1, rc;
 
 	skb_reset_mac_header(skb);
 	eth = eth_hdr(skb);
@@ -1174,24 +1172,18 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	} else
 		rdst0 = &f->remote;
 
-	rc = NETDEV_TX_OK;
 
 	/* if there are multiple destinations, send copies */
 	for (rdst = rdst0->remote_next; rdst; rdst = rdst->remote_next) {
 		struct sk_buff *skb1;
 
 		skb1 = skb_clone(skb, GFP_ATOMIC);
-		if (skb1) {
-			rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
-			if (rc == NETDEV_TX_OK)
-				rc = rc1;
-		}
+		if (skb1)
+			vxlan_xmit_one(skb1, dev, rdst, did_rsc);
 	}
 
-	rc1 = vxlan_xmit_one(skb, dev, rdst0, did_rsc);
-	if (rc == NETDEV_TX_OK)
-		rc = rc1;
-	return rc;
+	vxlan_xmit_one(skb, dev, rdst0, did_rsc);
+	return NETDEV_TX_OK;
 }
 
 /* Walk the forwarding table and purge stale entries */
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 06/10] vxlan: convert remotes list to list_rcu
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (3 preceding siblings ...)
  2013-06-05  4:24 ` [PATCH net-next 05/10] vxlan: make vxlan_xmit_one void Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 07/10] vxlan: port module param should be ushort Stephen Hemminger
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

Based on initial work by Mike Rapoport <mike.rapoport@ravellosystems.com>
Use list macros and RCU for tracking multiple remotes.

Note: this code assumes list always has at least one entry,
because delete is not supported.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |   99 +++++++++++++++++++++++++++++----------------------
 1 file changed, 56 insertions(+), 43 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f0b7605..593c895 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -101,7 +101,7 @@ struct vxlan_rdst {
 	__be16			 remote_port;
 	u32			 remote_vni;
 	u32			 remote_ifindex;
-	struct vxlan_rdst	*remote_next;
+	struct list_head	 list;
 };
 
 /* Forwarding table entry */
@@ -110,7 +110,7 @@ struct vxlan_fdb {
 	struct rcu_head	  rcu;
 	unsigned long	  updated;	/* jiffies */
 	unsigned long	  used;
-	struct vxlan_rdst remote;
+	struct list_head  remotes;
 	u16		  state;	/* see ndm_state */
 	u8		  flags;	/* see ndm_flags */
 	u8		  eth_addr[ETH_ALEN];
@@ -164,6 +164,14 @@ static inline struct hlist_head *vs_head(struct net *net, __be16 port)
 	return &vn->sock_list[hash_32(ntohs(port), PORT_HASH_BITS)];
 }
 
+/* First remote destination for a forwarding entry.
+ * Guaranteed to be non-NULL because remotes are never deleted.
+ */
+static inline struct vxlan_rdst *first_remote(struct vxlan_fdb *fdb)
+{
+	return list_first_or_null_rcu(&fdb->remotes, struct vxlan_rdst, list);
+}
+
 /* Find VXLAN socket based on network namespace and UDP port */
 static struct vxlan_sock *vxlan_find_port(struct net *net, __be16 port)
 {
@@ -269,7 +277,7 @@ static inline size_t vxlan_nlmsg_size(void)
 }
 
 static void vxlan_fdb_notify(struct vxlan_dev *vxlan,
-			     const struct vxlan_fdb *fdb, int type)
+			     struct vxlan_fdb *fdb, int type)
 {
 	struct net *net = dev_net(vxlan->dev);
 	struct sk_buff *skb;
@@ -279,7 +287,7 @@ static void vxlan_fdb_notify(struct vxlan_dev *vxlan,
 	if (skb == NULL)
 		goto errout;
 
-	err = vxlan_fdb_info(skb, vxlan, fdb, 0, 0, type, 0, &fdb->remote);
+	err = vxlan_fdb_info(skb, vxlan, fdb, 0, 0, type, 0, first_remote(fdb));
 	if (err < 0) {
 		/* -EMSGSIZE implies BUG in vxlan_nlmsg_size() */
 		WARN_ON(err == -EMSGSIZE);
@@ -298,11 +306,16 @@ static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb f;
+	struct vxlan_rdst remote;
 
 	memset(&f, 0, sizeof f);
 	f.state = NUD_STALE;
-	f.remote.remote_ip = ipa; /* goes to NDA_DST */
-	f.remote.remote_vni = VXLAN_N_VID;
+
+	remote.remote_ip = ipa; /* goes to NDA_DST */
+	remote.remote_vni = VXLAN_N_VID;
+
+	INIT_LIST_HEAD(&f.remotes);
+	list_add_rcu(&remote.list, &f.remotes);
 
 	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
 }
@@ -312,6 +325,7 @@ static void vxlan_fdb_miss(struct vxlan_dev *vxlan, const u8 eth_addr[ETH_ALEN])
 	struct vxlan_fdb	f;
 
 	memset(&f, 0, sizeof f);
+	INIT_LIST_HEAD(&f.remotes);
 	f.state = NUD_STALE;
 	memcpy(f.eth_addr, eth_addr, ETH_ALEN);
 
@@ -371,17 +385,17 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 static int vxlan_fdb_append(struct vxlan_fdb *f,
 			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
 {
-	struct vxlan_rdst *rd_prev, *rd;
+	struct vxlan_rdst *rd;
 
-	rd_prev = NULL;
-	for (rd = &f->remote; rd; rd = rd->remote_next) {
+	/* protected by vxlan->hash_lock */
+	list_for_each_entry(rd, &f->remotes, list) {
 		if (rd->remote_ip == ip &&
 		    rd->remote_port == port &&
 		    rd->remote_vni == vni &&
 		    rd->remote_ifindex == ifindex)
 			return 0;
-		rd_prev = rd;
 	}
+
 	rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
 	if (rd == NULL)
 		return -ENOBUFS;
@@ -389,8 +403,9 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	rd->remote_port = port;
 	rd->remote_vni = vni;
 	rd->remote_ifindex = ifindex;
-	rd->remote_next = NULL;
-	rd_prev->remote_next = rd;
+
+	list_add_tail_rcu(&rd->list, &f->remotes);
+
 	return 1;
 }
 
@@ -442,16 +457,14 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 			return -ENOMEM;
 
 		notify = 1;
-		f->remote.remote_ip = ip;
-		f->remote.remote_port = port;
-		f->remote.remote_vni = vni;
-		f->remote.remote_ifindex = ifindex;
-		f->remote.remote_next = NULL;
 		f->state = state;
 		f->flags = ndm_flags;
 		f->updated = f->used = jiffies;
+		INIT_LIST_HEAD(&f->remotes);
 		memcpy(f->eth_addr, mac, ETH_ALEN);
 
+		vxlan_fdb_append(f, ip, port, vni, ifindex);
+
 		++vxlan->addrcnt;
 		hlist_add_head_rcu(&f->hlist,
 				   vxlan_fdb_head(vxlan, mac));
@@ -466,13 +479,10 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 static void vxlan_fdb_free(struct rcu_head *head)
 {
 	struct vxlan_fdb *f = container_of(head, struct vxlan_fdb, rcu);
+	struct vxlan_rdst *rd, *nd;
 
-	while (f->remote.remote_next) {
-		struct vxlan_rdst *rd = f->remote.remote_next;
-
-		f->remote.remote_next = rd->remote_next;
+	list_for_each_entry_safe(rd, nd, &f->remotes, list)
 		kfree(rd);
-	}
 	kfree(f);
 }
 
@@ -582,23 +592,24 @@ static int vxlan_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb,
 
 		hlist_for_each_entry_rcu(f, &vxlan->fdb_head[h], hlist) {
 			struct vxlan_rdst *rd;
-			for (rd = &f->remote; rd; rd = rd->remote_next) {
-				if (idx < cb->args[0])
-					goto skip;
 
+			if (idx < cb->args[0])
+				goto skip;
+
+			list_for_each_entry_rcu(rd, &f->remotes, list) {
 				err = vxlan_fdb_info(skb, vxlan, f,
 						     NETLINK_CB(cb->skb).portid,
 						     cb->nlh->nlmsg_seq,
 						     RTM_NEWNEIGH,
 						     NLM_F_MULTI, rd);
 				if (err < 0)
-					break;
-skip:
-				++idx;
+					goto out;
 			}
+skip:
+			++idx;
 		}
 	}
-
+out:
 	return idx;
 }
 
@@ -614,19 +625,21 @@ static int vxlan_snoop(struct net_device *dev,
 
 	f = vxlan_find_mac(vxlan, src_mac);
 	if (likely(f)) {
-		if (likely(f->remote.remote_ip == src_ip))
-			return 0;
+		struct vxlan_rdst *remote = first_remote(f);
 
 		/* Don't migrate static entries, drop packets */
 		if (!(f->flags & NTF_SELF))
 			return 1;
 
+		if (likely(remote->remote_ip == src_ip))
+			return 0;
+
 		if (net_ratelimit())
 			netdev_info(dev,
 				    "%pM migrated from %pI4 to %pI4\n",
-				    src_mac, &f->remote.remote_ip, &src_ip);
+				    src_mac, &remote->remote_ip, &src_ip);
 
-		f->remote.remote_ip = src_ip;
+		remote->remote_ip = src_ip;
 		f->updated = jiffies;
 		vxlan_fdb_notify(vxlan, f, RTM_NEWNEIGH);
 	} else {
@@ -854,7 +867,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
 		}
 
 		f = vxlan_find_mac(vxlan, n->ha);
-		if (f && f->remote.remote_ip == htonl(INADDR_ANY)) {
+		if (f && first_remote(f)->remote_ip == htonl(INADDR_ANY)) {
 			/* bridge-local neighbor */
 			neigh_release(n);
 			goto out;
@@ -1169,17 +1182,17 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 		    (vxlan->flags & VXLAN_F_L2MISS) &&
 		    !is_multicast_ether_addr(eth->h_dest))
 			vxlan_fdb_miss(vxlan, eth->h_dest);
-	} else
-		rdst0 = &f->remote;
-
+	} else {
+		rdst = rdst0 = first_remote(f);
 
-	/* if there are multiple destinations, send copies */
-	for (rdst = rdst0->remote_next; rdst; rdst = rdst->remote_next) {
-		struct sk_buff *skb1;
+		/* if there are multiple destinations, send copies */
+		list_for_each_entry_continue_rcu(rdst, &f->remotes, list) {
+			struct sk_buff *skb1;
 
-		skb1 = skb_clone(skb, GFP_ATOMIC);
-		if (skb1)
-			vxlan_xmit_one(skb1, dev, rdst, did_rsc);
+			skb1 = skb_clone(skb, GFP_ATOMIC);
+			if (skb1)
+				vxlan_xmit_one(skb1, dev, rdst, did_rsc);
+		}
 	}
 
 	vxlan_xmit_one(skb, dev, rdst0, did_rsc);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 07/10] vxlan: port module param should be ushort
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (4 preceding siblings ...)
  2013-06-05  4:24 ` [PATCH net-next 06/10] vxlan: convert remotes list to list_rcu Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05 13:35   ` Sergei Shtylyov
  2013-06-05  4:24 ` [PATCH net-next 08/10] vxlan: use initializer for dummy structures Stephen Hemminger
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

UDP ports's are limited to 16 bits.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 593c895..57b5274 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -70,8 +70,8 @@ struct vxlanhdr {
  * The IANA assigned port is 4789, but the Linux default is 8472
  * for compatability with early adopters.
  */
-static unsigned int vxlan_port __read_mostly = 8472;
-module_param_named(udp_port, vxlan_port, uint, 0444);
+static unsigned short vxlan_port __read_mostly = 8472;
+module_param_named(udp_port, vxlan_port, ushort, 0444);
 MODULE_PARM_DESC(udp_port, "Destination UDP port");
 
 static bool log_ecn_error = true;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 08/10] vxlan:  use initializer for dummy structures
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (5 preceding siblings ...)
  2013-06-05  4:24 ` [PATCH net-next 07/10] vxlan: port module param should be ushort Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05  4:24 ` [PATCH net-next 09/10] vxlan: whitespace cleanup Stephen Hemminger
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

For the notification code, a couple of places build fdb entries on
the stack, use structure initialization instead and fix formatting.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |   21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 57b5274..664381a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -305,14 +305,13 @@ errout:
 static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
-	struct vxlan_fdb f;
-	struct vxlan_rdst remote;
-
-	memset(&f, 0, sizeof f);
-	f.state = NUD_STALE;
-
-	remote.remote_ip = ipa; /* goes to NDA_DST */
-	remote.remote_vni = VXLAN_N_VID;
+	struct vxlan_fdb f = {
+		.state = NUD_STALE,
+	};
+	struct vxlan_rdst remote = {
+		.remote_ip = ipa, /* goes to NDA_DST */
+		.remote_vni = VXLAN_N_VID,
+	};
 
 	INIT_LIST_HEAD(&f.remotes);
 	list_add_rcu(&remote.list, &f.remotes);
@@ -322,11 +321,11 @@ static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
 
 static void vxlan_fdb_miss(struct vxlan_dev *vxlan, const u8 eth_addr[ETH_ALEN])
 {
-	struct vxlan_fdb	f;
+	struct vxlan_fdb f = {
+		.state = NUD_STALE,
+	};
 
-	memset(&f, 0, sizeof f);
 	INIT_LIST_HEAD(&f.remotes);
-	f.state = NUD_STALE;
 	memcpy(f.eth_addr, eth_addr, ETH_ALEN);
 
 	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 09/10] vxlan: whitespace cleanup
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (6 preceding siblings ...)
  2013-06-05  4:24 ` [PATCH net-next 08/10] vxlan: use initializer for dummy structures Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05 12:55   ` David Stevens
  2013-06-05  4:24 ` [PATCH net-next 10/10] vxlan: version 0.2 Stephen Hemminger
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 664381a..a3690d4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -204,9 +204,9 @@ static struct vxlan_dev *vxlan_find_vni(struct net *net, u32 id, __be16 port)
 
 /* Fill in neighbour message in skbuff. */
 static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
-			   const struct vxlan_fdb *fdb,
-			   u32 portid, u32 seq, int type, unsigned int flags,
-			   const struct vxlan_rdst *rdst)
+			  const struct vxlan_fdb *fdb,
+			  u32 portid, u32 seq, int type, unsigned int flags,
+			  const struct vxlan_rdst *rdst)
 {
 	unsigned long now = jiffies;
 	struct nda_cacheinfo ci;
@@ -1020,7 +1020,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	struct flowi4 fl4;
 	__be32 dst;
 	__be16 src_port, dst_port;
-        u32 vni;
+	u32 vni;
 	__be16 df = 0;
 	__u8 tos, ttl;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH net-next 10/10] vxlan: version 0.2
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (7 preceding siblings ...)
  2013-06-05  4:24 ` [PATCH net-next 09/10] vxlan: whitespace cleanup Stephen Hemminger
@ 2013-06-05  4:24 ` Stephen Hemminger
  2013-06-05  6:23 ` [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Cong Wang
  2013-06-06 23:16 ` David Miller
  10 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05  4:24 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/vxlan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a3690d4..61f04b6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -42,7 +42,7 @@
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
 
-#define VXLAN_VERSION	"0.1"
+#define VXLAN_VERSION	"0.2"
 
 #define PORT_HASH_BITS	8
 #define PORT_HASH_SIZE  (1<<PORT_HASH_BITS)
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (8 preceding siblings ...)
  2013-06-05  4:24 ` [PATCH net-next 10/10] vxlan: version 0.2 Stephen Hemminger
@ 2013-06-05  6:23 ` Cong Wang
  2013-06-06 23:16 ` David Miller
  10 siblings, 0 replies; 30+ messages in thread
From: Cong Wang @ 2013-06-05  6:23 UTC (permalink / raw)
  To: netdev

On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
> Only migrate dynamic forwarding table entries, don't modify
> static entries. If packet received from incorrect source IP address
> assume it is an imposter and drop it.
>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>

Nitpick: return bool instead of int

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05  4:24 ` [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue Stephen Hemminger
@ 2013-06-05  6:47   ` Cong Wang
  2013-06-05  7:29     ` Mike Rapoport
  2013-06-05 15:42     ` Stephen Hemminger
  2013-06-05 15:37   ` [PATCH net] vxlan: fix crash on module removal Stephen Hemminger
  1 sibling, 2 replies; 30+ messages in thread
From: Cong Wang @ 2013-06-05  6:47 UTC (permalink / raw)
  To: netdev

On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
> Do join/leave from work queue to avoid lock inversion problems
> between normal socket and RTNL. The code comes out cleaner
> as well.
>
> Uses Cong Wang's suggestion to turn refcnt into a real atomic
> since now need to handle case where last use of socket is IGMP
> worker.
>
> Also fixes race where vxlan_stop could be called after
> device was deleted on module removal. The call to rtnl_link_unregister
> would call dellink while vxlan device was still up. Reordering
> the calls fixes it.
>

After the first 3 patches applied, I got:

[   55.010954] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[   55.013309] CPU: 1 PID: 163 Comm: kworker/1:2 Not tainted
3.10.0-rc2+ #1150
[   55.013309] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   55.013309] Workqueue: events vxlan_igmp_work
[   55.013309] task: ffff880070eac900 ti: ffff8800716d4000 task.ti:
ffff8800716d4000
[   55.013309] RIP: 0010:[<ffffffff815df704>]  [<ffffffff815df704>]
vxlan_sock_release+0x25/0x55
[   55.013309] RSP: 0018:ffff8800716d5cf8  EFLAGS: 00010246
[   55.013309] RAX: 0000000000000000 RBX: ffff88006fd7c000 RCX:
0000000ccee84d06
[   55.013309] RDX: dead000000200200 RSI: ffff880070ead048 RDI:
ffff88006fd7c070
[   55.013309] RBP: ffff8800716d5d08 R08: 0000000000000000 R09:
ffff8800716d5c48
[   55.013309] R10: 000000000000b6c4 R11: 000000000000b163 R12:
ffff88006ebb7400
[   55.013309] R13: ffff88006fd7c000 R14: ffff8800723b4520 R15:
0000000000000000
[   55.013309] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
knlGS:0000000000000000
[   55.013309] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   55.013309] CR2: 00007fa99dc0a000 CR3: 0000000070c17000 CR4:
00000000000006e0
[   55.013309] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   55.013309] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
0000000000000400
[   55.013309] Stack:
[   55.013309]  ffff880070eac900 ffff880070fc0b58 ffff8800716d5d48
ffffffff815e0a73
[   55.013309]  080808e0716d5d28 0000000f00000000 ffff8800714ef008
ffff880070fc0b58
[   55.013309]  ffff88007f9d6200 ffff88007f9d38c0 ffff8800716d5de8
ffffffff8106872f
[   55.013309] Call Trace:
[   55.013309]  [<ffffffff815e0a73>] vxlan_igmp_work+0xa8/0xcf
[   55.013309]  [<ffffffff8106872f>] process_one_work+0x240/0x408
[   55.013309]  [<ffffffff81068662>] ? process_one_work+0x173/0x408
[   55.013309]  [<ffffffff81068c82>] worker_thread+0x15d/0x1f1
[   55.013309]  [<ffffffff81068b25>] ? rescuer_thread+0x1ff/0x1ff
[   55.013309]  [<ffffffff81070c19>] kthread+0xb1/0xb9
[   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17
[   55.013309]  [<ffffffff81976bdc>] ret_from_fork+0x7c/0xb0
[   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 02/10] vxlan: handle skb_clone failure
  2013-06-05  4:24 ` [PATCH net-next 02/10] vxlan: handle skb_clone failure Stephen Hemminger
@ 2013-06-05  6:59   ` Cong Wang
  2013-06-05 14:05     ` David Stevens
  2013-06-05 12:50   ` David Stevens
  1 sibling, 1 reply; 30+ messages in thread
From: Cong Wang @ 2013-06-05  6:59 UTC (permalink / raw)
  To: netdev

On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
>  		skb1 = skb_clone(skb, GFP_ATOMIC);
> -		rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> -		if (rc == NETDEV_TX_OK)
> -			rc = rc1;
> +		if (skb1) {
> +			rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> +			if (rc == NETDEV_TX_OK)
> +				rc = rc1;
> +		}

If OOM, shouldn't we exit immediately instead of continue handle the
next one?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05  6:47   ` Cong Wang
@ 2013-06-05  7:29     ` Mike Rapoport
  2013-06-05  8:00       ` Cong Wang
  2013-06-05 15:41       ` Stephen Hemminger
  2013-06-05 15:42     ` Stephen Hemminger
  1 sibling, 2 replies; 30+ messages in thread
From: Mike Rapoport @ 2013-06-05  7:29 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Wed, Jun 5, 2013 at 9:47 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
>> Do join/leave from work queue to avoid lock inversion problems
>> between normal socket and RTNL. The code comes out cleaner
>> as well.
>>
>> Uses Cong Wang's suggestion to turn refcnt into a real atomic
>> since now need to handle case where last use of socket is IGMP
>> worker.
>>
>> Also fixes race where vxlan_stop could be called after
>> device was deleted on module removal. The call to rtnl_link_unregister
>> would call dellink while vxlan device was still up. Reordering
>> the calls fixes it.
>>
>
> After the first 3 patches applied, I got:
>
> [   55.010954] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> [   55.013309] CPU: 1 PID: 163 Comm: kworker/1:2 Not tainted
> 3.10.0-rc2+ #1150
> [   55.013309] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [   55.013309] Workqueue: events vxlan_igmp_work

I think the problem happens because vxlan_dellink does
unregister_netdevice_queue and then immediately calls
vxlan_sock_release and thus vs_sock is released before igmp_work
starts

> [   55.013309] task: ffff880070eac900 ti: ffff8800716d4000 task.ti:
> ffff8800716d4000
> [   55.013309] RIP: 0010:[<ffffffff815df704>]  [<ffffffff815df704>]
> vxlan_sock_release+0x25/0x55
> [   55.013309] RSP: 0018:ffff8800716d5cf8  EFLAGS: 00010246
> [   55.013309] RAX: 0000000000000000 RBX: ffff88006fd7c000 RCX:
> 0000000ccee84d06
> [   55.013309] RDX: dead000000200200 RSI: ffff880070ead048 RDI:
> ffff88006fd7c070
> [   55.013309] RBP: ffff8800716d5d08 R08: 0000000000000000 R09:
> ffff8800716d5c48
> [   55.013309] R10: 000000000000b6c4 R11: 000000000000b163 R12:
> ffff88006ebb7400
> [   55.013309] R13: ffff88006fd7c000 R14: ffff8800723b4520 R15:
> 0000000000000000
> [   55.013309] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
> knlGS:0000000000000000
> [   55.013309] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   55.013309] CR2: 00007fa99dc0a000 CR3: 0000000070c17000 CR4:
> 00000000000006e0
> [   55.013309] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [   55.013309] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [   55.013309] Stack:
> [   55.013309]  ffff880070eac900 ffff880070fc0b58 ffff8800716d5d48
> ffffffff815e0a73
> [   55.013309]  080808e0716d5d28 0000000f00000000 ffff8800714ef008
> ffff880070fc0b58
> [   55.013309]  ffff88007f9d6200 ffff88007f9d38c0 ffff8800716d5de8
> ffffffff8106872f
> [   55.013309] Call Trace:
> [   55.013309]  [<ffffffff815e0a73>] vxlan_igmp_work+0xa8/0xcf
> [   55.013309]  [<ffffffff8106872f>] process_one_work+0x240/0x408
> [   55.013309]  [<ffffffff81068662>] ? process_one_work+0x173/0x408
> [   55.013309]  [<ffffffff81068c82>] worker_thread+0x15d/0x1f1
> [   55.013309]  [<ffffffff81068b25>] ? rescuer_thread+0x1ff/0x1ff
> [   55.013309]  [<ffffffff81070c19>] kthread+0xb1/0xb9
> [   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17
> [   55.013309]  [<ffffffff81976bdc>] ret_from_fork+0x7c/0xb0
> [   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05  7:29     ` Mike Rapoport
@ 2013-06-05  8:00       ` Cong Wang
  2013-06-05 15:41       ` Stephen Hemminger
  1 sibling, 0 replies; 30+ messages in thread
From: Cong Wang @ 2013-06-05  8:00 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Linux Kernel Network Developers, Stephen Hemminger

On Wed, Jun 5, 2013 at 3:29 PM, Mike Rapoport
<mike.rapoport@ravellosystems.com> wrote:
> On Wed, Jun 5, 2013 at 9:47 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
>>> Do join/leave from work queue to avoid lock inversion problems
>>> between normal socket and RTNL. The code comes out cleaner
>>> as well.
>>>
>>> Uses Cong Wang's suggestion to turn refcnt into a real atomic
>>> since now need to handle case where last use of socket is IGMP
>>> worker.
>>>
>>> Also fixes race where vxlan_stop could be called after
>>> device was deleted on module removal. The call to rtnl_link_unregister
>>> would call dellink while vxlan device was still up. Reordering
>>> the calls fixes it.
>>>
>>
>> After the first 3 patches applied, I got:
>>
>> [   55.010954] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>> [   55.013309] CPU: 1 PID: 163 Comm: kworker/1:2 Not tainted
>> 3.10.0-rc2+ #1150
>> [   55.013309] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
>> [   55.013309] Workqueue: events vxlan_igmp_work
>
> I think the problem happens because vxlan_dellink does
> unregister_netdevice_queue and then immediately calls
> vxlan_sock_release and thus vs_sock is released before igmp_work
> starts


Yeah, seems something wrong with refcnt.

Thanks!

>
>> [   55.013309] task: ffff880070eac900 ti: ffff8800716d4000 task.ti:
>> ffff8800716d4000
>> [   55.013309] RIP: 0010:[<ffffffff815df704>]  [<ffffffff815df704>]
>> vxlan_sock_release+0x25/0x55
>> [   55.013309] RSP: 0018:ffff8800716d5cf8  EFLAGS: 00010246
>> [   55.013309] RAX: 0000000000000000 RBX: ffff88006fd7c000 RCX:
>> 0000000ccee84d06
>> [   55.013309] RDX: dead000000200200 RSI: ffff880070ead048 RDI:
>> ffff88006fd7c070
>> [   55.013309] RBP: ffff8800716d5d08 R08: 0000000000000000 R09:
>> ffff8800716d5c48
>> [   55.013309] R10: 000000000000b6c4 R11: 000000000000b163 R12:
>> ffff88006ebb7400
>> [   55.013309] R13: ffff88006fd7c000 R14: ffff8800723b4520 R15:
>> 0000000000000000
>> [   55.013309] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
>> knlGS:0000000000000000
>> [   55.013309] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [   55.013309] CR2: 00007fa99dc0a000 CR3: 0000000070c17000 CR4:
>> 00000000000006e0
>> [   55.013309] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>> 0000000000000000
>> [   55.013309] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
>> 0000000000000400
>> [   55.013309] Stack:
>> [   55.013309]  ffff880070eac900 ffff880070fc0b58 ffff8800716d5d48
>> ffffffff815e0a73
>> [   55.013309]  080808e0716d5d28 0000000f00000000 ffff8800714ef008
>> ffff880070fc0b58
>> [   55.013309]  ffff88007f9d6200 ffff88007f9d38c0 ffff8800716d5de8
>> ffffffff8106872f
>> [   55.013309] Call Trace:
>> [   55.013309]  [<ffffffff815e0a73>] vxlan_igmp_work+0xa8/0xcf
>> [   55.013309]  [<ffffffff8106872f>] process_one_work+0x240/0x408
>> [   55.013309]  [<ffffffff81068662>] ? process_one_work+0x173/0x408
>> [   55.013309]  [<ffffffff81068c82>] worker_thread+0x15d/0x1f1
>> [   55.013309]  [<ffffffff81068b25>] ? rescuer_thread+0x1ff/0x1ff
>> [   55.013309]  [<ffffffff81070c19>] kthread+0xb1/0xb9
>> [   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17
>> [   55.013309]  [<ffffffff81976bdc>] ret_from_fork+0x7c/0xb0
>> [   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Sincerely yours,
> Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 02/10] vxlan: handle skb_clone failure
  2013-06-05  4:24 ` [PATCH net-next 02/10] vxlan: handle skb_clone failure Stephen Hemminger
  2013-06-05  6:59   ` Cong Wang
@ 2013-06-05 12:50   ` David Stevens
  1 sibling, 0 replies; 30+ messages in thread
From: David Stevens @ 2013-06-05 12:50 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, netdev-owner, Stephen Hemminger

Acked-by: David L Stevens <dlstevens@us.ibm.com>

> From: Stephen Hemminger <stephen@networkplumber.org>
 
> skb_clone can fail if out of memory. Just skip the fanout.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> ---
> Does not impact stable, this was introduced by the multiple
> destination code.
> ---
>  drivers/net/vxlan.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 536082a..9085c81 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1197,9 +1197,11 @@ static netdev_tx_t vxlan_xmit(struct sk_buff 
> *skb, struct net_device *dev)
>        struct sk_buff *skb1;
> 
>        skb1 = skb_clone(skb, GFP_ATOMIC);
> -      rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> -      if (rc == NETDEV_TX_OK)
> -         rc = rc1;
> +      if (skb1) {
> +         rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> +         if (rc == NETDEV_TX_OK)
> +            rc = rc1;
> +      }
>     }
> 
>     rc1 = vxlan_xmit_one(skb, dev, rdst0, did_rsc);
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 05/10] vxlan: make vxlan_xmit_one void
  2013-06-05  4:24 ` [PATCH net-next 05/10] vxlan: make vxlan_xmit_one void Stephen Hemminger
@ 2013-06-05 12:54   ` David Stevens
  0 siblings, 0 replies; 30+ messages in thread
From: David Stevens @ 2013-06-05 12:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, netdev-owner, Stephen Hemminger

Acked-by: David L Stevens <dlstevens@us.ibm.com>

netdev-owner@vger.kernel.org wrote on 06/05/2013 12:24:09 AM:

> From: Stephen Hemminger <stephen@networkplumber.org>

> 
> The function vxlan_xmit_one always returns NETDEV_TX_OK, so there
> is no point in keeping track of return values etc.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  drivers/net/vxlan.c |   26 +++++++++-----------------
>  1 file changed, 9 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 25d3e9e..f0b7605 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -996,8 +996,8 @@ static void vxlan_encap_bypass(struct sk_buff 
> *skb, struct vxlan_dev *src_vxlan,
>     }
>  }
> 
> -static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct 
> net_device *dev,
> -              struct vxlan_rdst *rdst, bool did_rsc)
> +static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> +            struct vxlan_rdst *rdst, bool did_rsc)
>  {
>     struct vxlan_dev *vxlan = netdev_priv(dev);
>     struct rtable *rt;
> @@ -1020,7 +1020,7 @@ static netdev_tx_t vxlan_xmit_one(struct 
> sk_buff *skb, struct net_device *dev,
>        if (did_rsc) {
>           /* short-circuited back to local bridge */
>           vxlan_encap_bypass(skb, vxlan, vxlan);
> -         return NETDEV_TX_OK;
> +         return;
>        }
>        goto drop;
>     }
> @@ -1076,7 +1076,7 @@ static netdev_tx_t vxlan_xmit_one(struct 
> sk_buff *skb, struct net_device *dev,
>        if (!dst_vxlan)
>           goto tx_error;
>        vxlan_encap_bypass(skb, vxlan, dst_vxlan);
> -      return NETDEV_TX_OK;
> +      return;
>     }
> 
>     memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
> @@ -1120,7 +1120,7 @@ static netdev_tx_t vxlan_xmit_one(struct 
> sk_buff *skb, struct net_device *dev,
>        goto drop;
> 
>     iptunnel_xmit(skb, dev);
> -   return NETDEV_TX_OK;
> +   return;
> 
>  drop:
>     dev->stats.tx_dropped++;
> @@ -1130,7 +1130,6 @@ tx_error:
>     dev->stats.tx_errors++;
>  tx_free:
>     dev_kfree_skb(skb);
> -   return NETDEV_TX_OK;
>  }
> 
>  /* Transmit local packets over Vxlan
> @@ -1146,7 +1145,6 @@ static netdev_tx_t vxlan_xmit(struct sk_buff 
> *skb, struct net_device *dev)
>     bool did_rsc = false;
>     struct vxlan_rdst *rdst0, *rdst;
>     struct vxlan_fdb *f;
> -   int rc1, rc;
> 
>     skb_reset_mac_header(skb);
>     eth = eth_hdr(skb);
> @@ -1174,24 +1172,18 @@ static netdev_tx_t vxlan_xmit(struct sk_buff
> *skb, struct net_device *dev)
>     } else
>        rdst0 = &f->remote;
> 
> -   rc = NETDEV_TX_OK;
> 
>     /* if there are multiple destinations, send copies */
>     for (rdst = rdst0->remote_next; rdst; rdst = rdst->remote_next) {
>        struct sk_buff *skb1;
> 
>        skb1 = skb_clone(skb, GFP_ATOMIC);
> -      if (skb1) {
> -         rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> -         if (rc == NETDEV_TX_OK)
> -            rc = rc1;
> -      }
> +      if (skb1)
> +         vxlan_xmit_one(skb1, dev, rdst, did_rsc);
>     }
> 
> -   rc1 = vxlan_xmit_one(skb, dev, rdst0, did_rsc);
> -   if (rc == NETDEV_TX_OK)
> -      rc = rc1;
> -   return rc;
> +   vxlan_xmit_one(skb, dev, rdst0, did_rsc);
> +   return NETDEV_TX_OK;
>  }
> 
>  /* Walk the forwarding table and purge stale entries */
> -- 
> 1.7.10.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 09/10] vxlan: whitespace cleanup
  2013-06-05  4:24 ` [PATCH net-next 09/10] vxlan: whitespace cleanup Stephen Hemminger
@ 2013-06-05 12:55   ` David Stevens
  0 siblings, 0 replies; 30+ messages in thread
From: David Stevens @ 2013-06-05 12:55 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev, netdev-owner, Stephen Hemminger

Acked-by: David L Stevens <dlstevens@us.ibm.com>


> From: Stephen Hemminger <stephen@networkplumber.org>
> To: davem@davemloft.net
> Cc: netdev@vger.kernel.org, Stephen Hemminger 
<stephen@networkplumber.org>
> Date: 06/05/2013 12:25 AM
> Subject: [PATCH net-next 09/10] vxlan: whitespace cleanup
> Sent by: netdev-owner@vger.kernel.org
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 07/10] vxlan: port module param should be ushort
  2013-06-05  4:24 ` [PATCH net-next 07/10] vxlan: port module param should be ushort Stephen Hemminger
@ 2013-06-05 13:35   ` Sergei Shtylyov
  0 siblings, 0 replies; 30+ messages in thread
From: Sergei Shtylyov @ 2013-06-05 13:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev

Hello.

On 05-06-2013 8:24, Stephen Hemminger wrote:

> UDP ports's are limited to 16 bits.

    s/ports's/ports/. Maybe a committer can fix.

> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

WBR, Sergei

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 02/10] vxlan: handle skb_clone failure
  2013-06-05  6:59   ` Cong Wang
@ 2013-06-05 14:05     ` David Stevens
  2013-06-06  0:47       ` Cong Wang
  0 siblings, 1 reply; 30+ messages in thread
From: David Stevens @ 2013-06-05 14:05 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, netdev-owner

netdev-owner@vger.kernel.org wrote on 06/05/2013 02:59:56 AM:

> From: Cong Wang <xiyou.wangcong@gmail.com>
> To: netdev@vger.kernel.org
> Date: 06/05/2013 03:00 AM
> Subject: Re: [PATCH net-next 02/10] vxlan: handle skb_clone failure
> Sent by: netdev-owner@vger.kernel.org
> 
> On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger 
> <stephen@networkplumber.org> wrote:
> >        skb1 = skb_clone(skb, GFP_ATOMIC);
> > -      rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> > -      if (rc == NETDEV_TX_OK)
> > -         rc = rc1;
> > +      if (skb1) {
> > +         rc1 = vxlan_xmit_one(skb1, dev, rdst, did_rsc);
> > +         if (rc == NETDEV_TX_OK)
> > +            rc = rc1;
> > +      }
> 
> If OOM, shouldn't we exit immediately instead of continue handle the
> next one?

It could be a temporary condition; trying again doesn't really hurt,
in case some do go through. So, I prefer this, but could live with
bailing for all destinations, too.

                                                        +-DLS

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH net] vxlan: fix crash on module removal
  2013-06-05  4:24 ` [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue Stephen Hemminger
  2013-06-05  6:47   ` Cong Wang
@ 2013-06-05 15:37   ` Stephen Hemminger
  2013-06-06  1:11     ` Cong Wang
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05 15:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: davem, netdev

If vxlan is removed with active vxlan's it would crash because
rtnl_link_unregister (which calls vxlan_dellink), was invoked
before unregister_pernet_device (which calls vxlan_stop).

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
See "vxlan: move IGMP join/leave to work queue" for the net-next
version of this

 drivers/net/vxlan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3b1d2ee..5cc3b48 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1677,8 +1677,8 @@ module_init(vxlan_init_module);
 
 static void __exit vxlan_cleanup_module(void)
 {
-	rtnl_link_unregister(&vxlan_link_ops);
 	unregister_pernet_device(&vxlan_net_ops);
+	rtnl_link_unregister(&vxlan_link_ops);
 	rcu_barrier();
 }
 module_exit(vxlan_cleanup_module);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05  7:29     ` Mike Rapoport
  2013-06-05  8:00       ` Cong Wang
@ 2013-06-05 15:41       ` Stephen Hemminger
  2013-06-08  8:23         ` Mike Rapoport
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05 15:41 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Cong Wang, netdev

On Wed, 5 Jun 2013 10:29:10 +0300
Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:

> On Wed, Jun 5, 2013 at 9:47 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
> >> Do join/leave from work queue to avoid lock inversion problems
> >> between normal socket and RTNL. The code comes out cleaner
> >> as well.
> >>
> >> Uses Cong Wang's suggestion to turn refcnt into a real atomic
> >> since now need to handle case where last use of socket is IGMP
> >> worker.
> >>
> >> Also fixes race where vxlan_stop could be called after
> >> device was deleted on module removal. The call to rtnl_link_unregister
> >> would call dellink while vxlan device was still up. Reordering
> >> the calls fixes it.
> >>
> >
> > After the first 3 patches applied, I got:
> >
> > [   55.010954] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> > [   55.013309] CPU: 1 PID: 163 Comm: kworker/1:2 Not tainted
> > 3.10.0-rc2+ #1150
> > [   55.013309] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> > [   55.013309] Workqueue: events vxlan_igmp_work
> 
> I think the problem happens because vxlan_dellink does
> unregister_netdevice_queue and then immediately calls
> vxlan_sock_release and thus vs_sock is released before igmp_work
> starts

This is handled because a refcount is acquired before the igmp_work
is scheduled.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05  6:47   ` Cong Wang
  2013-06-05  7:29     ` Mike Rapoport
@ 2013-06-05 15:42     ` Stephen Hemminger
  2013-06-06  0:49       ` Cong Wang
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-05 15:42 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Wed, 5 Jun 2013 06:47:52 +0000 (UTC)
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > Do join/leave from work queue to avoid lock inversion problems
> > between normal socket and RTNL. The code comes out cleaner
> > as well.
> >
> > Uses Cong Wang's suggestion to turn refcnt into a real atomic
> > since now need to handle case where last use of socket is IGMP
> > worker.
> >
> > Also fixes race where vxlan_stop could be called after
> > device was deleted on module removal. The call to rtnl_link_unregister
> > would call dellink while vxlan device was still up. Reordering
> > the calls fixes it.
> >
> 
> After the first 3 patches applied, I got:
> 
> [   55.010954] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> [   55.013309] CPU: 1 PID: 163 Comm: kworker/1:2 Not tainted
> 3.10.0-rc2+ #1150
> [   55.013309] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [   55.013309] Workqueue: events vxlan_igmp_work
> [   55.013309] task: ffff880070eac900 ti: ffff8800716d4000 task.ti:
> ffff8800716d4000
> [   55.013309] RIP: 0010:[<ffffffff815df704>]  [<ffffffff815df704>]
> vxlan_sock_release+0x25/0x55
> [   55.013309] RSP: 0018:ffff8800716d5cf8  EFLAGS: 00010246
> [   55.013309] RAX: 0000000000000000 RBX: ffff88006fd7c000 RCX:
> 0000000ccee84d06
> [   55.013309] RDX: dead000000200200 RSI: ffff880070ead048 RDI:
> ffff88006fd7c070
> [   55.013309] RBP: ffff8800716d5d08 R08: 0000000000000000 R09:
> ffff8800716d5c48
> [   55.013309] R10: 000000000000b6c4 R11: 000000000000b163 R12:
> ffff88006ebb7400
> [   55.013309] R13: ffff88006fd7c000 R14: ffff8800723b4520 R15:
> 0000000000000000
> [   55.013309] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
> knlGS:0000000000000000
> [   55.013309] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   55.013309] CR2: 00007fa99dc0a000 CR3: 0000000070c17000 CR4:
> 00000000000006e0
> [   55.013309] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [   55.013309] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [   55.013309] Stack:
> [   55.013309]  ffff880070eac900 ffff880070fc0b58 ffff8800716d5d48
> ffffffff815e0a73
> [   55.013309]  080808e0716d5d28 0000000f00000000 ffff8800714ef008
> ffff880070fc0b58
> [   55.013309]  ffff88007f9d6200 ffff88007f9d38c0 ffff8800716d5de8
> ffffffff8106872f
> [   55.013309] Call Trace:
> [   55.013309]  [<ffffffff815e0a73>] vxlan_igmp_work+0xa8/0xcf
> [   55.013309]  [<ffffffff8106872f>] process_one_work+0x240/0x408
> [   55.013309]  [<ffffffff81068662>] ? process_one_work+0x173/0x408
> [   55.013309]  [<ffffffff81068c82>] worker_thread+0x15d/0x1f1
> [   55.013309]  [<ffffffff81068b25>] ? rescuer_thread+0x1ff/0x1ff
> [   55.013309]  [<ffffffff81070c19>] kthread+0xb1/0xb9
> [   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17
> [   55.013309]  [<ffffffff81976bdc>] ret_from_fork+0x7c/0xb0
> [   55.013309]  [<ffffffff81070b68>] ? freezing+0x17/0x17
> 

What was the order of events (ie. what did you do to make it happen)?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 02/10] vxlan: handle skb_clone failure
  2013-06-05 14:05     ` David Stevens
@ 2013-06-06  0:47       ` Cong Wang
  2013-06-06  1:31         ` Stephen Hemminger
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2013-06-06  0:47 UTC (permalink / raw)
  To: David Stevens; +Cc: Linux Kernel Network Developers, netdev-owner

On Wed, Jun 5, 2013 at 10:05 PM, David Stevens <dlstevens@us.ibm.com> wrote:
> netdev-owner@vger.kernel.org wrote on 06/05/2013 02:59:56 AM:
>
>> From: Cong Wang <xiyou.wangcong@gmail.com>
>> If OOM, shouldn't we exit immediately instead of continue handle the
>> next one?
>
> It could be a temporary condition; trying again doesn't really hurt,
> in case some do go through. So, I prefer this, but could live with
> bailing for all destinations, too.
>

The problem we don't know if it is temporary or not, unless mm provides
EAGAIN?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05 15:42     ` Stephen Hemminger
@ 2013-06-06  0:49       ` Cong Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Cong Wang @ 2013-06-06  0:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Linux Kernel Network Developers

On Wed, Jun 5, 2013 at 11:42 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> What was the order of events (ie. what did you do to make it happen)?

1. generate some traffic via vxlan0 with ping
2. at the same time delete vxlan0

this is 100% reproducible.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] vxlan: fix crash on module removal
  2013-06-05 15:37   ` [PATCH net] vxlan: fix crash on module removal Stephen Hemminger
@ 2013-06-06  1:11     ` Cong Wang
  2013-06-06  1:32       ` Stephen Hemminger
  0 siblings, 1 reply; 30+ messages in thread
From: Cong Wang @ 2013-06-06  1:11 UTC (permalink / raw)
  To: netdev

On Wed, 05 Jun 2013 at 15:37 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
> If vxlan is removed with active vxlan's it would crash because
> rtnl_link_unregister (which calls vxlan_dellink), was invoked
> before unregister_pernet_device (which calls vxlan_stop).
>

Two points:

1. There are still many others drivers calling them in such order...
2. The removal should be in a reverse order of creation, at least in
theory.

I mean this *might* indicate something wrong.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 02/10] vxlan: handle skb_clone failure
  2013-06-06  0:47       ` Cong Wang
@ 2013-06-06  1:31         ` Stephen Hemminger
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-06  1:31 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Stevens, Linux Kernel Network Developers, netdev-owner

On Thu, 6 Jun 2013 08:47:49 +0800
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, Jun 5, 2013 at 10:05 PM, David Stevens <dlstevens@us.ibm.com> wrote:
> > netdev-owner@vger.kernel.org wrote on 06/05/2013 02:59:56 AM:
> >
> >> From: Cong Wang <xiyou.wangcong@gmail.com>
> >> If OOM, shouldn't we exit immediately instead of continue handle the
> >> next one?
> >
> > It could be a temporary condition; trying again doesn't really hurt,
> > in case some do go through. So, I prefer this, but could live with
> > bailing for all destinations, too.
> >
> 
> The problem we don't know if it is temporary or not, unless mm provides
> EAGAIN?
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

There is no EAGAIN in the network transmit path, it is not called from
user context.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net] vxlan: fix crash on module removal
  2013-06-06  1:11     ` Cong Wang
@ 2013-06-06  1:32       ` Stephen Hemminger
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2013-06-06  1:32 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Thu, 6 Jun 2013 01:11:11 +0000 (UTC)
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> On Wed, 05 Jun 2013 at 15:37 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
> > If vxlan is removed with active vxlan's it would crash because
> > rtnl_link_unregister (which calls vxlan_dellink), was invoked
> > before unregister_pernet_device (which calls vxlan_stop).
> >
> 
> Two points:
> 
> 1. There are still many others drivers calling them in such order...
> 2. The removal should be in a reverse order of creation, at least in
> theory.
> 
> I mean this *might* indicate something wrong.

1. Other drivers either don't have the dependency or are broken.
   It makes no sense to stop a device that is deleted.

2. The normal creation order is
   vxlan_setup
   vxlan_newlink
   vxlan_open

     

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries
  2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
                   ` (9 preceding siblings ...)
  2013-06-05  6:23 ` [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Cong Wang
@ 2013-06-06 23:16 ` David Miller
  10 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2013-06-06 23:16 UTC (permalink / raw)
  To: stephen; +Cc: netdev


Stephen please resolve Cong Wang's crashes and then take care of the "return
bool" and typo nit picks while you're at it, and I'll apply this series it
looks good otherwise.

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue
  2013-06-05 15:41       ` Stephen Hemminger
@ 2013-06-08  8:23         ` Mike Rapoport
  0 siblings, 0 replies; 30+ messages in thread
From: Mike Rapoport @ 2013-06-08  8:23 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Cong Wang, netdev

On Wed, Jun 5, 2013 at 6:41 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Wed, 5 Jun 2013 10:29:10 +0300
> Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>
>> On Wed, Jun 5, 2013 at 9:47 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> > On Wed, 05 Jun 2013 at 04:24 GMT, Stephen Hemminger <stephen@networkplumber.org> wrote:
>> >> Do join/leave from work queue to avoid lock inversion problems
>> >> between normal socket and RTNL. The code comes out cleaner
>> >> as well.
>> >>
>> >> Uses Cong Wang's suggestion to turn refcnt into a real atomic
>> >> since now need to handle case where last use of socket is IGMP
>> >> worker.
>> >>
>> >> Also fixes race where vxlan_stop could be called after
>> >> device was deleted on module removal. The call to rtnl_link_unregister
>> >> would call dellink while vxlan device was still up. Reordering
>> >> the calls fixes it.
>> >>
>> >
>> > After the first 3 patches applied, I got:
>> >
>> > [   55.010954] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>> > [   55.013309] CPU: 1 PID: 163 Comm: kworker/1:2 Not tainted
>> > 3.10.0-rc2+ #1150
>> > [   55.013309] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
>> > [   55.013309] Workqueue: events vxlan_igmp_work
>>
>> I think the problem happens because vxlan_dellink does
>> unregister_netdevice_queue and then immediately calls
>> vxlan_sock_release and thus vs_sock is released before igmp_work
>> starts
>
> This is handled because a refcount is acquired before the igmp_work
> is scheduled.

As far as I can tell, the vxlan_sock_release in vxlan_dellink call
occurs before vxlan_stop is invoked, and therefore before refcount is
acquired for igmp_work.
When I tried to use unregister_netdevice_queue(dev, NULL) I haven't
seen crashes.


--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-06-08  8:23 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-05  4:24 [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Stephen Hemminger
2013-06-05  4:24 ` [PATCH net-next 02/10] vxlan: handle skb_clone failure Stephen Hemminger
2013-06-05  6:59   ` Cong Wang
2013-06-05 14:05     ` David Stevens
2013-06-06  0:47       ` Cong Wang
2013-06-06  1:31         ` Stephen Hemminger
2013-06-05 12:50   ` David Stevens
2013-06-05  4:24 ` [PATCH net-next 03/10] vxlan: move IGMP join/leave to work queue Stephen Hemminger
2013-06-05  6:47   ` Cong Wang
2013-06-05  7:29     ` Mike Rapoport
2013-06-05  8:00       ` Cong Wang
2013-06-05 15:41       ` Stephen Hemminger
2013-06-08  8:23         ` Mike Rapoport
2013-06-05 15:42     ` Stephen Hemminger
2013-06-06  0:49       ` Cong Wang
2013-06-05 15:37   ` [PATCH net] vxlan: fix crash on module removal Stephen Hemminger
2013-06-06  1:11     ` Cong Wang
2013-06-06  1:32       ` Stephen Hemminger
2013-06-05  4:24 ` [PATCH net-next 04/10] vxlan: send notification when MAC migrates Stephen Hemminger
2013-06-05  4:24 ` [PATCH net-next 05/10] vxlan: make vxlan_xmit_one void Stephen Hemminger
2013-06-05 12:54   ` David Stevens
2013-06-05  4:24 ` [PATCH net-next 06/10] vxlan: convert remotes list to list_rcu Stephen Hemminger
2013-06-05  4:24 ` [PATCH net-next 07/10] vxlan: port module param should be ushort Stephen Hemminger
2013-06-05 13:35   ` Sergei Shtylyov
2013-06-05  4:24 ` [PATCH net-next 08/10] vxlan: use initializer for dummy structures Stephen Hemminger
2013-06-05  4:24 ` [PATCH net-next 09/10] vxlan: whitespace cleanup Stephen Hemminger
2013-06-05 12:55   ` David Stevens
2013-06-05  4:24 ` [PATCH net-next 10/10] vxlan: version 0.2 Stephen Hemminger
2013-06-05  6:23 ` [PATCH net-next 01/10] vxlan: only migrate dynamic FDB entries Cong Wang
2013-06-06 23:16 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.