All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v4 0/2] vxlan: allow specifying multiple default destinations
@ 2013-06-23 16:22 Mike Rapoport
  2013-06-23 16:22 ` [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append Mike Rapoport
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Mike Rapoport @ 2013-06-23 16:22 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Stevens, Thomas Graf, Mike Rapoport

Hi,

Apparently, in my first send attempt I've misspeled netdev mailing address.
Sorry for the noise.

These patches add ability to specify multiple default destinations to
vxlan. This ability is usefull in cases when multicast are disabled on
infrastructure level, for instance in public clouds.

v4 changes:
* rebased on Stephen Hemminger vxlan-next tree
* updates to netlink interface according to Thomas Graf comments

v3 changes:
* make netlink interface for remote destinations management more
  generic and extend vxlan_validate to handle newly added atttributes.
  (as proposed by Thomas Graf)

v2 changes:
* rebased on current net-next
* flush default destinations list at dellink as per Atzm Watanabe comment
* support only IPv4


Mike Rapoport (2):
  vxlan: introduce vxlan_rdst_append
  vxlan: allow specifying multiple default destinations

 drivers/net/vxlan.c          | 282 +++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/if_link.h |  17 +++
 2 files changed, 286 insertions(+), 13 deletions(-)

-- 
1.8.1.5

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append
  2013-06-23 16:22 [PATCH net-next v4 0/2] vxlan: allow specifying multiple default destinations Mike Rapoport
@ 2013-06-23 16:22 ` Mike Rapoport
  2013-06-24  6:02   ` Cong Wang
  2013-06-23 16:22 ` [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Mike Rapoport @ 2013-06-23 16:22 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Stevens, Thomas Graf, Mike Rapoport

to allow remotes list management for both FDB entries and default
destinations

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
---
 drivers/net/vxlan.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 212a256..e5fb6568 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -386,14 +386,13 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 	return f;
 }
 
-/* Add/update destinations for multicast */
-static int vxlan_fdb_append(struct vxlan_fdb *f,
+static int vxlan_rdst_append(struct list_head *remotes,
 			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
 {
 	struct vxlan_rdst *rd;
 
 	/* protected by vxlan->hash_lock */
-	list_for_each_entry(rd, &f->remotes, list) {
+	list_for_each_entry(rd, remotes, list) {
 		if (rd->remote_ip == ip &&
 		    rd->remote_port == port &&
 		    rd->remote_vni == vni &&
@@ -409,11 +408,18 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	rd->remote_vni = vni;
 	rd->remote_ifindex = ifindex;
 
-	list_add_tail_rcu(&rd->list, &f->remotes);
+	list_add_tail_rcu(&rd->list, remotes);
 
 	return 1;
 }
 
+/* Add/update destinations for multicast */
+static int vxlan_fdb_append(struct vxlan_fdb *f,
+			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
+{
+	return vxlan_rdst_append(&f->remotes, ip, port, vni, ifindex);
+}
+
 /* Add new entry to forwarding table -- assumes lock held */
 static int vxlan_fdb_create(struct vxlan_dev *vxlan,
 			    const u8 *mac, __be32 ip,
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-23 16:22 [PATCH net-next v4 0/2] vxlan: allow specifying multiple default destinations Mike Rapoport
  2013-06-23 16:22 ` [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append Mike Rapoport
@ 2013-06-23 16:22 ` Mike Rapoport
  2013-06-24  0:14   ` Stephen Hemminger
  2013-06-24  6:48   ` Cong Wang
  2013-06-23 16:24 ` [PATCH iproute2 1/2] vxlan: introduce vxlan_parse_opt_create Mike Rapoport
  2013-06-23 16:24 ` [PATCH iproute2 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
  3 siblings, 2 replies; 19+ messages in thread
From: Mike Rapoport @ 2013-06-23 16:22 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Stevens, Thomas Graf, Mike Rapoport

A list of multiple default destinations can be used in environments that
disable multicast on the infrastructure level, e.g. public clouds.

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
---
 drivers/net/vxlan.c          | 268 +++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/if_link.h |  17 +++
 2 files changed, 276 insertions(+), 9 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index e5fb6568..f57a0d94 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -103,6 +103,7 @@ struct vxlan_rdst {
 	u32			 remote_vni;
 	u32			 remote_ifindex;
 	struct list_head	 list;
+	struct rcu_head		 rcu;
 };
 
 /* Forwarding table entry */
@@ -141,6 +142,9 @@ struct vxlan_dev {
 	unsigned int	  addrcnt;
 	unsigned int	  addrmax;
 
+	struct list_head  remotes;     /* additional default destinations */
+	unsigned int	  remotes_cnt;
+
 	struct hlist_head fdb_head[FDB_HASH_SIZE];
 };
 
@@ -671,6 +675,105 @@ static bool vxlan_snoop(struct net_device *dev,
 	return false;
 }
 
+/* Add remote to default destinations list */
+static int vxlan_remote_add(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct nlattr *i;
+	__be32 ip = htonl(INADDR_NONE);
+	__be16 port;
+	u32 ifindex, vni;
+	int rem, err;
+
+	port = vxlan->dst_port;
+	vni = vxlan->default_dst.remote_vni;
+	ifindex = vxlan->default_dst.remote_ifindex;
+
+	nla_for_each_nested(i, attr, rem) {
+		switch (nla_type(i)) {
+		case IFLA_VXLAN_REMOTE_ADDR:
+			ip = nla_get_be32(i);
+			break;
+		case IFLA_VXLAN_REMOTE_PORT:
+			port = nla_get_be16(i);
+			break;
+		case IFLA_VXLAN_REMOTE_VNI:
+			vni = nla_get_u32(i);
+			break;
+		case IFLA_VXLAN_REMOTE_IFINDEX:
+			ifindex = nla_get_u32(i);
+			break;
+		default:
+			break;
+		};
+	}
+
+	if (ip == htonl(INADDR_NONE))
+		return -EINVAL;
+
+	spin_lock_bh(&vxlan->hash_lock);
+	err = vxlan_rdst_append(&vxlan->remotes, ip, port, vni, ifindex);
+	spin_unlock_bh(&vxlan->hash_lock);
+
+	if (err < 0)
+		return err;
+
+	if (err == 0)
+		return -EEXIST;
+
+	vxlan->remotes_cnt++;
+
+	return 0;
+}
+
+static void vxlan_remote_free(struct rcu_head *head)
+{
+	struct vxlan_rdst *rd = container_of(head, struct vxlan_rdst, rcu);
+	kfree(rd);
+}
+
+static void vxlan_remote_destroy(struct vxlan_dev *vxlan,
+				 struct vxlan_rdst *rd)
+{
+	vxlan->remotes_cnt--;
+	list_del_rcu(&rd->list);
+	call_rcu(&rd->rcu, vxlan_remote_free);
+}
+
+/* Delete remote from default destinations list */
+static int vxlan_remote_delete(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct vxlan_rdst *rd;
+	struct nlattr *i;
+	__be32 ip = htonl(INADDR_NONE);
+	int rem, err;
+
+	nla_for_each_nested(i, attr, rem) {
+		switch (nla_type(i)) {
+		case IFLA_VXLAN_REMOTE_ADDR:
+			ip = nla_get_be32(i);
+			break;
+		default:
+			break;
+		}
+	}
+
+	if (ip == htonl(INADDR_NONE) || ip == vxlan->default_dst.remote_ip)
+		return -EINVAL;
+
+	err = -ENOENT;
+
+	spin_lock_bh(&vxlan->hash_lock);
+	list_for_each_entry_rcu(rd, &vxlan->remotes, list) {
+		if (rd->remote_ip == ip) {
+			vxlan_remote_destroy(vxlan, rd);
+			err = 0;
+			break;
+		}
+	}
+	spin_unlock_bh(&vxlan->hash_lock);
+
+	return err;
+}
 
 /* See if multicast group is already in use by other ID */
 static bool vxlan_group_used(struct vxlan_net *vn, __be32 remote_ip)
@@ -1159,6 +1262,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	bool did_rsc = false;
 	struct vxlan_rdst *rdst0, *rdst;
 	struct vxlan_fdb *f;
+	struct list_head *remotes;
 
 	skb_reset_mac_header(skb);
 	eth = eth_hdr(skb);
@@ -1183,20 +1287,22 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 		    (vxlan->flags & VXLAN_F_L2MISS) &&
 		    !is_multicast_ether_addr(eth->h_dest))
 			vxlan_fdb_miss(vxlan, eth->h_dest);
+
+		remotes = &vxlan->remotes;
 	} else {
-		rdst = rdst0 = first_remote(f);
+		remotes = &f->remotes;
+	}
 
-		/* if there are multiple destinations, send copies */
-		list_for_each_entry_continue_rcu(rdst, &f->remotes, list) {
-			struct sk_buff *skb1;
+	/* if there are multiple destinations, send copies */
+	list_for_each_entry_rcu(rdst, remotes, list) {
+		struct sk_buff *skb1;
 
-			skb1 = skb_clone(skb, GFP_ATOMIC);
-			if (skb1)
-				vxlan_xmit_one(skb1, dev, rdst, did_rsc);
-		}
+		skb1 = skb_clone(skb, GFP_ATOMIC);
+		if (skb1)
+			vxlan_xmit_one(skb1, dev, rdst, did_rsc);
 	}
 
-	vxlan_xmit_one(skb, dev, rdst0, did_rsc);
+	dev_kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
 
@@ -1389,6 +1495,7 @@ static void vxlan_setup(struct net_device *dev)
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
 
 	INIT_LIST_HEAD(&vxlan->next);
+	INIT_LIST_HEAD(&vxlan->remotes);
 	spin_lock_init(&vxlan->hash_lock);
 	INIT_WORK(&vxlan->igmp_work, vxlan_igmp_work);
 	INIT_WORK(&vxlan->sock_work, vxlan_sock_work);
@@ -1408,6 +1515,13 @@ static void vxlan_setup(struct net_device *dev)
 		INIT_HLIST_HEAD(&vxlan->fdb_head[h]);
 }
 
+static const struct nla_policy vxlan_remotes_policy[IFLA_VXLAN_REMOTE_MAX + 1] = {
+	[IFLA_VXLAN_REMOTE_ADDR]	= { .type = NLA_U32 },
+	[IFLA_VXLAN_REMOTE_IFINDEX]	= { .type = NLA_U32 },
+	[IFLA_VXLAN_REMOTE_PORT]	= { .type = NLA_U16 },
+	[IFLA_VXLAN_REMOTE_VNI]		= { .type = NLA_U32 },
+};
+
 static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_ID]		= { .type = NLA_U32 },
 	[IFLA_VXLAN_GROUP]	= { .len = FIELD_SIZEOF(struct iphdr, daddr) },
@@ -1424,10 +1538,35 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_L2MISS]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_L3MISS]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_PORT]	= { .type = NLA_U16 },
+	[IFLA_VXLAN_REMOTES]	= { .type = NLA_NESTED },
 };
 
+static int vxlan_validate_remotes(struct nlattr *data)
+{
+	struct nlattr *attr;
+	int rem, err;
+
+	if (!data)
+		return 0;
+
+	nla_for_each_nested(attr, data, rem) {
+		if ((nla_type(attr) != IFLA_VXLAN_REMOTE_NEW) &&
+		    (nla_type(attr) != IFLA_VXLAN_REMOTE_DEL))
+			return -EINVAL;
+
+		err = nla_validate_nested(attr, IFLA_VXLAN_REMOTE_MAX,
+					  vxlan_remotes_policy);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 {
+	int err;
+
 	if (tb[IFLA_ADDRESS]) {
 		if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) {
 			pr_debug("invalid link address (not ethernet)\n");
@@ -1460,6 +1599,10 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 		}
 	}
 
+	err = vxlan_validate_remotes(data[IFLA_VXLAN_REMOTES]);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -1668,19 +1811,81 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		return err;
 
 	list_add(&vxlan->next, &vn->vxlan_list);
+	list_add_tail_rcu(&vxlan->default_dst.list, &vxlan->remotes);
 
 	return 0;
 }
 
+static int vxlan_remotes_update(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct nlattr *i;
+	int rem, err = 0;
+
+	nla_for_each_nested(i, attr, rem) {
+		switch (nla_type(i)) {
+		case IFLA_VXLAN_REMOTE_NEW:
+			err = vxlan_remote_add(vxlan, i);
+			break;
+		case IFLA_VXLAN_REMOTE_DEL:
+			err = vxlan_remote_delete(vxlan, i);
+			break;
+		default:
+			err = -EOPNOTSUPP;
+			break;
+		};
+
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int vxlan_changelink(struct net_device *dev,
+			    struct nlattr *tb[], struct nlattr *data[])
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	int err;
+
+	if (data[IFLA_VXLAN_REMOTES]) {
+		err = vxlan_remotes_update(vxlan, data[IFLA_VXLAN_REMOTES]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static void vxlan_remotes_flush(struct vxlan_dev *vxlan)
+{
+	struct vxlan_rdst *rd, *nd;
+
+	spin_lock_bh(&vxlan->hash_lock);
+	list_for_each_entry_safe(rd, nd, &vxlan->remotes, list)
+		vxlan_remote_destroy(vxlan, rd);
+	spin_unlock_bh(&vxlan->hash_lock);
+}
+
 static void vxlan_dellink(struct net_device *dev, struct list_head *head)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 
+	vxlan_remotes_flush(vxlan);
 	hlist_del_rcu(&vxlan->hlist);
 	list_del(&vxlan->next);
 	unregister_netdevice_queue(dev, head);
 }
 
+static size_t vxlan_remote_list_size(const struct net_device *dev)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+
+	return nla_total_size(sizeof(struct nlattr)) +	/* IFLA_VXLAN_REMOTES */
+		(nla_total_size(sizeof(struct nlattr)) +
+		 nla_total_size(sizeof(__be32)) +	/* IFLA_VXLAN_REMOTE_ADDR */
+		 0) * vxlan->remotes_cnt;
+}
+
 static size_t vxlan_get_size(const struct net_device *dev)
 {
 
@@ -1699,9 +1904,50 @@ static size_t vxlan_get_size(const struct net_device *dev)
 		nla_total_size(sizeof(__u32)) +	/* IFLA_VXLAN_LIMIT */
 		nla_total_size(sizeof(struct ifla_vxlan_port_range)) +
 		nla_total_size(sizeof(__be16))+ /* IFLA_VXLAN_PORT */
+		vxlan_remote_list_size(dev) +
 		0;
 }
 
+static int vxlan_fill_remotes_info(struct sk_buff *skb,
+				   const struct vxlan_dev *vxlan)
+{
+	struct vxlan_rdst *rd;
+	struct nlattr *nest, *rdst_nest;
+	__be32 ip;
+	int i = 1;
+
+	if (!vxlan->remotes_cnt)
+		return 0;
+
+	nest = nla_nest_start(skb, IFLA_VXLAN_REMOTES);
+	if (nest == NULL)
+		goto nla_put_failure;
+
+	list_for_each_entry_rcu(rd, &vxlan->remotes, list) {
+		ip = rd->remote_ip;
+
+		if (ip == vxlan->default_dst.remote_ip)
+			continue;
+
+		rdst_nest = nla_nest_start(skb, i);
+		if (rdst_nest == NULL)
+			goto nla_put_failure;
+
+		if (nla_put_be32(skb, IFLA_VXLAN_REMOTE_ADDR, ip))
+			goto nla_put_failure;
+
+		nla_nest_end(skb, rdst_nest);
+		i++;
+	}
+
+	nla_nest_end(skb, nest);
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 {
 	const struct vxlan_dev *vxlan = netdev_priv(dev);
@@ -1742,6 +1988,9 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
 	if (nla_put(skb, IFLA_VXLAN_PORT_RANGE, sizeof(ports), &ports))
 		goto nla_put_failure;
 
+	if (vxlan_fill_remotes_info(skb, vxlan))
+		goto nla_put_failure;
+
 	return 0;
 
 nla_put_failure:
@@ -1756,6 +2005,7 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = {
 	.setup		= vxlan_setup,
 	.validate	= vxlan_validate,
 	.newlink	= vxlan_newlink,
+	.changelink	= vxlan_changelink,
 	.dellink	= vxlan_dellink,
 	.get_size	= vxlan_get_size,
 	.fill_info	= vxlan_fill_info,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 03f6170..6ef25c1 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -313,10 +313,27 @@ enum {
 	IFLA_VXLAN_L2MISS,
 	IFLA_VXLAN_L3MISS,
 	IFLA_VXLAN_PORT,	/* destination port */
+	IFLA_VXLAN_REMOTES,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
 
+enum {
+	IFLA_VXLAN_REMOTE_NEW = 1,
+	IFLA_VXLAN_REMOTE_DEL,
+};
+
+enum {
+	IFLA_VXLAN_REMOTE_UNSPEC,
+	IFLA_VXLAN_REMOTE_ADDR,
+	IFLA_VXLAN_REMOTE_IFINDEX,
+	IFLA_VXLAN_REMOTE_PORT,
+	IFLA_VXLAN_REMOTE_VNI,
+	__IFLA_VXLAN_REMOTE_MAX
+};
+
+#define IFLA_VXLAN_REMOTE_MAX	(__IFLA_VXLAN_REMOTE_MAX - 1)
+
 struct ifla_vxlan_port_range {
 	__be16	low;
 	__be16	high;
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH iproute2 1/2] vxlan: introduce vxlan_parse_opt_create
  2013-06-23 16:22 [PATCH net-next v4 0/2] vxlan: allow specifying multiple default destinations Mike Rapoport
  2013-06-23 16:22 ` [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append Mike Rapoport
  2013-06-23 16:22 ` [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
@ 2013-06-23 16:24 ` Mike Rapoport
  2013-06-23 16:24 ` [PATCH iproute2 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
  3 siblings, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2013-06-23 16:24 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Stevens, Thomas Graf, Mike Rapoport

for addition of vxlan_parse_opt_change, which is required to
differentiate between create time only settings and the settings that
can be changed with 'ip link set'

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
---
 ip/iplink_vxlan.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 1025326..92ddfb4 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -35,8 +35,8 @@ static void explain(void)
 	fprintf(stderr, "       TTL  := { 1..255 | inherit }\n");
 }
 
-static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
-			  struct nlmsghdr *n)
+static int vxlan_parse_opt_create(struct link_util *lu, int argc, char **argv,
+				  struct nlmsghdr *n)
 {
 	__u32 vni = 0;
 	int vni_set = 0;
@@ -187,6 +187,12 @@ static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 	return 0;
 }
 
+static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
+			  struct nlmsghdr *n)
+{
+	return vxlan_parse_opt_create(lu, argc, argv, n);
+}
+
 static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 {
 	__u32 vni;
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH iproute2 2/2] vxlan: allow specifying multiple default destinations
  2013-06-23 16:22 [PATCH net-next v4 0/2] vxlan: allow specifying multiple default destinations Mike Rapoport
                   ` (2 preceding siblings ...)
  2013-06-23 16:24 ` [PATCH iproute2 1/2] vxlan: introduce vxlan_parse_opt_create Mike Rapoport
@ 2013-06-23 16:24 ` Mike Rapoport
  2013-06-24  0:32   ` Cong Wang
  3 siblings, 1 reply; 19+ messages in thread
From: Mike Rapoport @ 2013-06-23 16:24 UTC (permalink / raw)
  To: netdev; +Cc: Stephen Hemminger, David Stevens, Thomas Graf, Mike Rapoport

Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
---
 ip/iplink_vxlan.c | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 166 insertions(+), 1 deletion(-)

diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 92ddfb4..ba9a5d8 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -35,6 +35,130 @@ static void explain(void)
 	fprintf(stderr, "       TTL  := { 1..255 | inherit }\n");
 }
 
+static void explain_change(void)
+{
+	fprintf(stderr, "Usage: ... vxlan id VNI\n");
+	fprintf(stderr, "                 [ dstadd DST ]\n");
+	fprintf(stderr, "                 [ dstdel ADDR ]\n");
+	fprintf(stderr, "\n");
+	fprintf(stderr, "Where: VNI := 0-16777215\n");
++	fprintf(stderr, "       DST  := [ ADDR [port PORT] [vni VNI] [via DEV]]\n");
+}
+
+static int vxlan_parse_dstadd(int *argcp, char ***argvp, struct nlmsghdr *n)
+{
+	int argc = *argcp;
+	char **argv = *argvp;
+	__u32 vni, ifindex;
+	__u16 port;
+	struct rtattr *nest;
+	int addr_set = 0;
+
+	nest = addattr_nest(n, 1024, IFLA_VXLAN_REMOTE_NEW);
+
+	while (argc > 0) {
+		if (!matches(*argv, "vni")) {
+			NEXT_ARG();
+			if (get_u32(&vni, *argv, 0) ||
+			    vni >= 1u << 24)
+				invarg("invalid id", *argv);
+			addattr32(n, 1024, IFLA_VXLAN_REMOTE_VNI, vni);
+		} else if (!matches(*argv, "port")) {
+			NEXT_ARG();
+			if (get_u16(&port, *argv, 0))
+				invarg("port", *argv);
+			addattr32(n, 1024, IFLA_VXLAN_REMOTE_PORT, htons(port));
+		} else if (!matches(*argv, "via")) {
+			NEXT_ARG();
+			ifindex = if_nametoindex(*argv);
+			addattr32(n, 1024, IFLA_VXLAN_REMOTE_IFINDEX, ifindex);
+		} else {
+			inet_prefix addr;
+			get_prefix(&addr, *argv, AF_UNSPEC);
+			addattr_l(n, 1024, IFLA_VXLAN_REMOTE_ADDR,
+				  &addr.data, addr.bytelen);
+			addr_set = 1;
+		}
+		argc--, argv++;
+	}
+
+	if (!addr_set)
+		incomplete_command();
+
+	addattr_nest_end(n, nest);
+
+	*argcp = argc;
+	*argvp = argv;
+	return 0;
+}
+
+static int vxlan_parse_dstdel(int *argcp, char ***argvp, struct nlmsghdr *n)
+{
+	int argc = *argcp;
+	char **argv = *argvp;
+	struct rtattr *nest;
+
+	nest = addattr_nest(n, 1024, IFLA_VXLAN_REMOTE_DEL);
+
+	while (argc > 0) {
+		inet_prefix addr;
+		get_prefix(&addr, *argv, AF_UNSPEC);
+		addattr_l(n, 1024, IFLA_VXLAN_REMOTE_ADDR,
+			  &addr.data, addr.bytelen);
+		argc--, argv++;
+	}
+
+	if (argc == *argcp)
+		incomplete_command();
+
+	addattr_nest_end(n, nest);
+
+	*argcp = argc;
+	*argvp = argv;
+	return 0;
+}
+
+static int vxlan_parse_opt_change(struct link_util *lu, int argc, char **argv,
+				  struct nlmsghdr *n)
+{
+	__u32 vni = 0;
+	int vni_set = 0;
+	struct rtattr *remotes;
+
+	while (argc > 0) {
+		if (!matches(*argv, "id") ||
+		    !matches(*argv, "vni")) {
+			NEXT_ARG();
+			if (get_u32(&vni, *argv, 0) ||
+			    vni >= 1u << 24)
+				invarg("invalid id", *argv);
+			vni_set = 1;
+		} else if (!matches(*argv, "dstadd")) {
+			NEXT_ARG();
+			remotes = addattr_nest(n, 1024, IFLA_VXLAN_REMOTES);
+			vxlan_parse_dstadd(&argc, &argv, n);
+			addattr_nest_end(n, remotes);
+		} else if (!matches(*argv, "dstdel")) {
+			NEXT_ARG();
+			remotes = addattr_nest(n, 1024, IFLA_VXLAN_REMOTES);
+			vxlan_parse_dstdel(&argc, &argv, n);
+			addattr_nest_end(n, remotes);
+		} else {
+			fprintf(stderr, "vxlan: unknown command \"%s\"?\n", *argv);
+			explain_change();
+			return -1;
+		}
+		argc--, argv++;
+	}
+
+	if (!vni_set) {
+		fprintf(stderr, "vxlan: missing virtual network identifier\n");
+		return -1;
+	}
+
+	return 0;
+}
+
 static int vxlan_parse_opt_create(struct link_util *lu, int argc, char **argv,
 				  struct nlmsghdr *n)
 {
@@ -190,7 +314,45 @@ static int vxlan_parse_opt_create(struct link_util *lu, int argc, char **argv,
 static int vxlan_parse_opt(struct link_util *lu, int argc, char **argv,
 			  struct nlmsghdr *n)
 {
-	return vxlan_parse_opt_create(lu, argc, argv, n);
+	if (n->nlmsg_flags & NLM_F_CREATE)
+		return vxlan_parse_opt_create(lu, argc, argv, n);
+
+	return vxlan_parse_opt_change(lu, argc, argv, n);
+}
+
+static void vxlan_print_remote(FILE *f, struct rtattr *attr)
+{
+	struct rtattr *tb[IFLA_VXLAN_REMOTE_MAX];
+	char s1[1024];
+
+	parse_rtattr_nested(tb, IFLA_VXLAN_REMOTE_MAX, attr);
+
+	if (tb[IFLA_VXLAN_REMOTE_ADDR]) {
+		struct rtattr *i = tb[IFLA_VXLAN_REMOTE_ADDR];
+		if (RTA_PAYLOAD(i) >= sizeof(struct in6_addr)) {
+			struct in6_addr addr;
+			memcpy(&addr, RTA_DATA(i), sizeof(struct in6_addr));
+			fprintf(f, "        %s\n",
+				format_host(AF_INET6, sizeof(struct in6_addr),
+					    &addr, s1, sizeof(s1)));
+		} else if (RTA_PAYLOAD(i) >= sizeof(__be32)) {
+			__be32 addr = rta_getattr_u32(i);
+			fprintf(f, "        %s\n",
+				format_host(AF_INET, 4, &addr, s1, sizeof(s1)));
+		}
+	}
+}
+
+static void vxlan_print_remotes(FILE *f, struct rtattr *attr)
+{
+	struct rtattr *i;
+	int rem, n = 0;
+
+	fprintf(f, "\n      default destinations :\n");
+
+	rem = RTA_PAYLOAD(attr);
+	for (i = RTA_DATA(attr); RTA_OK(i, rem); i = RTA_NEXT(i, rem), n++)
+		vxlan_print_remote(f, i);
 }
 
 static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
@@ -283,6 +445,9 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 	if (tb[IFLA_VXLAN_LIMIT] &&
 	    (maxaddr = rta_getattr_u32(tb[IFLA_VXLAN_LIMIT]) != 0))
 		    fprintf(f, "maxaddr %u ", maxaddr);
+
+	if (tb[IFLA_VXLAN_REMOTES])
+		vxlan_print_remotes(f, tb[IFLA_VXLAN_REMOTES]);
 }
 
 struct link_util vxlan_link_util = {
-- 
1.8.1.5

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-23 16:22 ` [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
@ 2013-06-24  0:14   ` Stephen Hemminger
  2013-06-24  5:57     ` Mike Rapoport
  2013-06-24  6:48   ` Cong Wang
  1 sibling, 1 reply; 19+ messages in thread
From: Stephen Hemminger @ 2013-06-24  0:14 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: netdev, David Stevens, Thomas Graf

On Sun, 23 Jun 2013 19:22:23 +0300
Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:

> A list of multiple default destinations can be used in environments that
> disable multicast on the infrastructure level, e.g. public clouds.
> 
> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
> ---
>  drivers/net/vxlan.c          | 268 +++++++++++++++++++++++++++++++++++++++++--
>  include/uapi/linux/if_link.h |  17 +++
>  2 files changed, 276 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index e5fb6568..f57a0d94 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -103,6 +103,7 @@ struct vxlan_rdst {
>  	u32			 remote_vni;
>  	u32			 remote_ifindex;
>  	struct list_head	 list;
> +	struct rcu_head		 rcu;
>  };

The use of remotes_cnt here is not SMP safe.
You are using remotes_cnt to size the buffer for dumping, but then the list
of remotes might change during the dump.

There a a couple of alternatives here:
1. Put a hard limit on the number of remotes per MAC.
2. When there are multiple destnations, just dump multiple entries, like
   multipath routing does.

I prefer #2 because it also allows for a cleaner API on creation.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH iproute2 2/2] vxlan: allow specifying multiple default destinations
  2013-06-23 16:24 ` [PATCH iproute2 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
@ 2013-06-24  0:32   ` Cong Wang
  2013-06-24  5:40     ` Mike Rapoport
  0 siblings, 1 reply; 19+ messages in thread
From: Cong Wang @ 2013-06-24  0:32 UTC (permalink / raw)
  To: netdev

On Sun, 23 Jun 2013 at 16:24 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
> +static void explain_change(void)
> +{
> +	fprintf(stderr, "Usage: ... vxlan id VNI\n");
> +	fprintf(stderr, "                 [ dstadd DST ]\n");
> +	fprintf(stderr, "                 [ dstdel ADDR ]\n");
> +	fprintf(stderr, "\n");
> +	fprintf(stderr, "Where: VNI := 0-16777215\n");
> ++	fprintf(stderr, "       DST  := [ ADDR [port PORT] [vni VNI] [via DEV]]\n");

This '++' line does't look correct, does it?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH iproute2 2/2] vxlan: allow specifying multiple default destinations
  2013-06-24  0:32   ` Cong Wang
@ 2013-06-24  5:40     ` Mike Rapoport
  0 siblings, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2013-06-24  5:40 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Mon, Jun 24, 2013 at 3:32 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sun, 23 Jun 2013 at 16:24 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>> +static void explain_change(void)
>> +{
>> +     fprintf(stderr, "Usage: ... vxlan id VNI\n");
>> +     fprintf(stderr, "                 [ dstadd DST ]\n");
>> +     fprintf(stderr, "                 [ dstdel ADDR ]\n");
>> +     fprintf(stderr, "\n");
>> +     fprintf(stderr, "Where: VNI := 0-16777215\n");
>> ++    fprintf(stderr, "       DST  := [ ADDR [port PORT] [vni VNI] [via DEV]]\n");
>
> This '++' line does't look correct, does it?

Indeed so,  thanks for catching.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-24  0:14   ` Stephen Hemminger
@ 2013-06-24  5:57     ` Mike Rapoport
  2013-06-24 15:35       ` Stephen Hemminger
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Rapoport @ 2013-06-24  5:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David Stevens, Thomas Graf

On Mon, Jun 24, 2013 at 3:14 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Sun, 23 Jun 2013 19:22:23 +0300
> Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>
>> A list of multiple default destinations can be used in environments that
>> disable multicast on the infrastructure level, e.g. public clouds.
>>
>> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
>> ---
>>  drivers/net/vxlan.c          | 268 +++++++++++++++++++++++++++++++++++++++++--
>>  include/uapi/linux/if_link.h |  17 +++
>>  2 files changed, 276 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> index e5fb6568..f57a0d94 100644
>> --- a/drivers/net/vxlan.c
>> +++ b/drivers/net/vxlan.c
>> @@ -103,6 +103,7 @@ struct vxlan_rdst {
>>       u32                      remote_vni;
>>       u32                      remote_ifindex;
>>       struct list_head         list;
>> +     struct rcu_head          rcu;
>>  };
>
> The use of remotes_cnt here is not SMP safe.
> You are using remotes_cnt to size the buffer for dumping, but then the list
> of remotes might change during the dump.

The remotes_cnt is used only in netlink callbacks with rtnl_lock held
and it cannot be modified otherwise, so I don't see why it is not SMP
safe.

> There a a couple of alternatives here:
> 1. Put a hard limit on the number of remotes per MAC.
> 2. When there are multiple destnations, just dump multiple entries, like
>    multipath routing does.
>
> I prefer #2 because it also allows for a cleaner API on creation.
>



--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append
  2013-06-23 16:22 ` [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append Mike Rapoport
@ 2013-06-24  6:02   ` Cong Wang
  2013-06-24  6:54     ` Mike Rapoport
  2013-06-24  7:21     ` Mike Rapoport
  0 siblings, 2 replies; 19+ messages in thread
From: Cong Wang @ 2013-06-24  6:02 UTC (permalink / raw)
  To: netdev

On Sun, 23 Jun 2013 at 16:22 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
> to allow remotes list management for both FDB entries and default
> destinations
>
> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
> ---
>  drivers/net/vxlan.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 212a256..e5fb6568 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -386,14 +386,13 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
>  	return f;
>  }
>  
> -/* Add/update destinations for multicast */
> -static int vxlan_fdb_append(struct vxlan_fdb *f,
> +static int vxlan_rdst_append(struct list_head *remotes,
>  			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)


I think you need to align the above line.


>  {
>  	struct vxlan_rdst *rd;
>  
>  	/* protected by vxlan->hash_lock */
> -	list_for_each_entry(rd, &f->remotes, list) {
> +	list_for_each_entry(rd, remotes, list) {

This patch is based on Stephen's patches which are not yet merged into
net-next, right? If so, I think you have to wait. My IPv6 patches are
pending because of his patches too.


>  		if (rd->remote_ip == ip &&
>  		    rd->remote_port == port &&
>  		    rd->remote_vni == vni &&
> @@ -409,11 +408,18 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
>  	rd->remote_vni = vni;
>  	rd->remote_ifindex = ifindex;
>  
> -	list_add_tail_rcu(&rd->list, &f->remotes);
> +	list_add_tail_rcu(&rd->list, remotes);
>  
>  	return 1;
>  }
>  
> +/* Add/update destinations for multicast */
> +static int vxlan_fdb_append(struct vxlan_fdb *f,
> +			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)

You need to align this line too.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-23 16:22 ` [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
  2013-06-24  0:14   ` Stephen Hemminger
@ 2013-06-24  6:48   ` Cong Wang
  2013-06-24  6:58     ` Mike Rapoport
  2013-06-24 15:18     ` Stephen Hemminger
  1 sibling, 2 replies; 19+ messages in thread
From: Cong Wang @ 2013-06-24  6:48 UTC (permalink / raw)
  To: netdev

On Sun, 23 Jun 2013 at 16:22 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>  static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>  {
> +	int err;
> +
>  	if (tb[IFLA_ADDRESS]) {
>  		if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) {
>  			pr_debug("invalid link address (not ethernet)\n");
> @@ -1460,6 +1599,10 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>  		}
>  	}
>  
> +	err = vxlan_validate_remotes(data[IFLA_VXLAN_REMOTES]);
> +	if (err)
> +		return err;
> +
>  	return 0;
>  }


Seems you can simply return vxlan_validate_remotes(...); here.


> +static int vxlan_fill_remotes_info(struct sk_buff *skb,
> +				   const struct vxlan_dev *vxlan)
> +{
> +	struct vxlan_rdst *rd;
> +	struct nlattr *nest, *rdst_nest;
> +	__be32 ip;
> +	int i = 1;
> +
> +	if (!vxlan->remotes_cnt)
> +		return 0;
> +
> +	nest = nla_nest_start(skb, IFLA_VXLAN_REMOTES);
> +	if (nest == NULL)
> +		goto nla_put_failure;
> +
> +	list_for_each_entry_rcu(rd, &vxlan->remotes, list) {


Need RCU read lock here?


> +		ip = rd->remote_ip;
> +
> +		if (ip == vxlan->default_dst.remote_ip)
> +			continue;
> +
> +		rdst_nest = nla_nest_start(skb, i);
> +		if (rdst_nest == NULL)
> +			goto nla_put_failure;
> +
> +		if (nla_put_be32(skb, IFLA_VXLAN_REMOTE_ADDR, ip))
> +			goto nla_put_failure;
> +
> +		nla_nest_end(skb, rdst_nest);
> +		i++;
> +	}
> +
> +	nla_nest_end(skb, nest);
> +
> +	return 0;
> +
> +nla_put_failure:
> +	return -EMSGSIZE;
> +}
> +


Thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append
  2013-06-24  6:02   ` Cong Wang
@ 2013-06-24  6:54     ` Mike Rapoport
  2013-06-24  8:28       ` David Miller
  2013-06-24  7:21     ` Mike Rapoport
  1 sibling, 1 reply; 19+ messages in thread
From: Mike Rapoport @ 2013-06-24  6:54 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Mon, Jun 24, 2013 at 9:02 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sun, 23 Jun 2013 at 16:22 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>> to allow remotes list management for both FDB entries and default
>> destinations
>>
>> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
>> ---
>>  drivers/net/vxlan.c | 14 ++++++++++----
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> index 212a256..e5fb6568 100644
>> --- a/drivers/net/vxlan.c
>> +++ b/drivers/net/vxlan.c
>> @@ -386,14 +386,13 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
>>       return f;
>>  }
>>
>> -/* Add/update destinations for multicast */
>> -static int vxlan_fdb_append(struct vxlan_fdb *f,
>> +static int vxlan_rdst_append(struct list_head *remotes,
>>                           __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
>
>
> I think you need to align the above line.

I just kept the original indentation

>>
>> +/* Add/update destinations for multicast */
>> +static int vxlan_fdb_append(struct vxlan_fdb *f,
>> +                         __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
>
> You need to align this line too.

And here as well.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-24  6:48   ` Cong Wang
@ 2013-06-24  6:58     ` Mike Rapoport
  2013-06-24 15:18     ` Stephen Hemminger
  1 sibling, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2013-06-24  6:58 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Mon, Jun 24, 2013 at 9:48 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sun, 23 Jun 2013 at 16:22 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>>  static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>>  {
>> +     int err;
>> +
>>       if (tb[IFLA_ADDRESS]) {
>>               if (nla_len(tb[IFLA_ADDRESS]) != ETH_ALEN) {
>>                       pr_debug("invalid link address (not ethernet)\n");
>> @@ -1460,6 +1599,10 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>>               }
>>       }
>>
>> +     err = vxlan_validate_remotes(data[IFLA_VXLAN_REMOTES]);
>> +     if (err)
>> +             return err;
>> +
>>       return 0;
>>  }
>
>
> Seems you can simply return vxlan_validate_remotes(...); here.

Yes, but I believe it's better looking this way.

>> +static int vxlan_fill_remotes_info(struct sk_buff *skb,
>> +                                const struct vxlan_dev *vxlan)
>> +{
>> +     struct vxlan_rdst *rd;
>> +     struct nlattr *nest, *rdst_nest;
>> +     __be32 ip;
>> +     int i = 1;
>> +
>> +     if (!vxlan->remotes_cnt)
>> +             return 0;
>> +
>> +     nest = nla_nest_start(skb, IFLA_VXLAN_REMOTES);
>> +     if (nest == NULL)
>> +             goto nla_put_failure;
>> +
>> +     list_for_each_entry_rcu(rd, &vxlan->remotes, list) {
>
>
> Need RCU read lock here?

Why? The remotes list can be modified only via netlink with rtnl_lock held...

>
>> +             ip = rd->remote_ip;
>> +
>> +             if (ip == vxlan->default_dst.remote_ip)
>> +                     continue;
>> +
>> +             rdst_nest = nla_nest_start(skb, i);
>> +             if (rdst_nest == NULL)
>> +                     goto nla_put_failure;
>> +
>> +             if (nla_put_be32(skb, IFLA_VXLAN_REMOTE_ADDR, ip))
>> +                     goto nla_put_failure;
>> +
>> +             nla_nest_end(skb, rdst_nest);
>> +             i++;
>> +     }
>> +
>> +     nla_nest_end(skb, nest);
>> +
>> +     return 0;
>> +
>> +nla_put_failure:
>> +     return -EMSGSIZE;
>> +}
>> +
>
>
> Thanks!
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append
  2013-06-24  6:02   ` Cong Wang
  2013-06-24  6:54     ` Mike Rapoport
@ 2013-06-24  7:21     ` Mike Rapoport
  1 sibling, 0 replies; 19+ messages in thread
From: Mike Rapoport @ 2013-06-24  7:21 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Mon, Jun 24, 2013 at 9:02 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Sun, 23 Jun 2013 at 16:22 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>>  {
>>       struct vxlan_rdst *rd;
>>
>>       /* protected by vxlan->hash_lock */
>> -     list_for_each_entry(rd, &f->remotes, list) {
>> +     list_for_each_entry(rd, remotes, list) {
>
> This patch is based on Stephen's patches which are not yet merged into
> net-next, right? If so, I think you have to wait. My IPv6 patches are
> pending because of his patches too.

I know that my patches (as well as OVS pacthes) are getting in the way
of your IPv6 work. I've patiently waited at the beginning of the cycle
for you to merge the IPv6 patches. Afterwards I've waited for Stephen
and David to reach an agreement about the list implementation. So I
think I can send the patches for review without waiting, even although
vxlan-net has not been merged into the net-next yet.

--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append
  2013-06-24  6:54     ` Mike Rapoport
@ 2013-06-24  8:28       ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2013-06-24  8:28 UTC (permalink / raw)
  To: mike.rapoport; +Cc: xiyou.wangcong, netdev

From: Mike Rapoport <mike.rapoport@ravellosystems.com>
Date: Mon, 24 Jun 2013 09:54:51 +0300

> On Mon, Jun 24, 2013 at 9:02 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Sun, 23 Jun 2013 at 16:22 GMT, Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>>> to allow remotes list management for both FDB entries and default
>>> destinations
>>>
>>> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
>>> ---
>>>  drivers/net/vxlan.c | 14 ++++++++++----
>>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>>> index 212a256..e5fb6568 100644
>>> --- a/drivers/net/vxlan.c
>>> +++ b/drivers/net/vxlan.c
>>> @@ -386,14 +386,13 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
>>>       return f;
>>>  }
>>>
>>> -/* Add/update destinations for multicast */
>>> -static int vxlan_fdb_append(struct vxlan_fdb *f,
>>> +static int vxlan_rdst_append(struct list_head *remotes,
>>>                           __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
>>
>>
>> I think you need to align the above line.
> 
> I just kept the original indentation

When you change the arguments to a function, you are absolutely obligated
to fix the indentation of any surrounding lines.

In the networking code, functions are to be styled such that the arguments
appearing on the second and subsequent lines are aligned perfectly to the
first column after the openning parenthesis of the first line.

Therefore, if you change where that openning parenthesis is, you must
reindent the subsequent lines.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-24  6:48   ` Cong Wang
  2013-06-24  6:58     ` Mike Rapoport
@ 2013-06-24 15:18     ` Stephen Hemminger
  1 sibling, 0 replies; 19+ messages in thread
From: Stephen Hemminger @ 2013-06-24 15:18 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

On Mon, 24 Jun 2013 06:48:17 +0000 (UTC)
Cong Wang <xiyou.wangcong@gmail.com> wrote:

> > +static int vxlan_fill_remotes_info(struct sk_buff *skb,
> > +				   const struct vxlan_dev *vxlan)
> > +{
> > +	struct vxlan_rdst *rd;
> > +	struct nlattr *nest, *rdst_nest;
> > +	__be32 ip;
> > +	int i = 1;
> > +
> > +	if (!vxlan->remotes_cnt)
> > +		return 0;
> > +
> > +	nest = nla_nest_start(skb, IFLA_VXLAN_REMOTES);
> > +	if (nest == NULL)
> > +		goto nla_put_failure;
> > +
> > +	list_for_each_entry_rcu(rd, &vxlan->remotes, list) {  
> 
> 
> Need RCU read lock here?

RCU is unnecessary here since already protected by RTNL.
 
rtnl_fill_ifinfo
   ASSERT_RTNL()
   rtnl_link_fill
      vxlan_fill_info
         vxlan_fill_remotes_info

Better just to remove the for_each_entry_rcu and use for_each_entry

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-24  5:57     ` Mike Rapoport
@ 2013-06-24 15:35       ` Stephen Hemminger
  2013-06-24 19:52         ` Mike Rapoport
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Hemminger @ 2013-06-24 15:35 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: netdev, David Stevens, Thomas Graf

On Mon, 24 Jun 2013 08:57:55 +0300
Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:

> On Mon, Jun 24, 2013 at 3:14 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Sun, 23 Jun 2013 19:22:23 +0300
> > Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
> >
> >> A list of multiple default destinations can be used in environments that
> >> disable multicast on the infrastructure level, e.g. public clouds.
> >>
> >> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
> >> ---
> >>  drivers/net/vxlan.c          | 268 +++++++++++++++++++++++++++++++++++++++++--
> >>  include/uapi/linux/if_link.h |  17 +++
> >>  2 files changed, 276 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> >> index e5fb6568..f57a0d94 100644
> >> --- a/drivers/net/vxlan.c
> >> +++ b/drivers/net/vxlan.c
> >> @@ -103,6 +103,7 @@ struct vxlan_rdst {
> >>       u32                      remote_vni;
> >>       u32                      remote_ifindex;
> >>       struct list_head         list;
> >> +     struct rcu_head          rcu;
> >>  };
> >
> > The use of remotes_cnt here is not SMP safe.
> > You are using remotes_cnt to size the buffer for dumping, but then the list
> > of remotes might change during the dump.
> 
> The remotes_cnt is used only in netlink callbacks with rtnl_lock held
> and it cannot be modified otherwise, so I don't see why it is not SMP
> safe.
> 
> > There a a couple of alternatives here:
> > 1. Put a hard limit on the number of remotes per MAC.
> > 2. When there are multiple destnations, just dump multiple entries, like
> >    multipath routing does.
> >
> > I prefer #2 because it also allows for a cleaner API on creation.
> >
>

After a few more hours of review, I think the API still needs more work.
The API uses attributes IFLA_VXLAN_REMOTE_NEW and IFLA_VXLAN_REMOTE_DEL to
implement adding and deleting entries. This is contrary to other uses of attributes
in Linux netlink. The convention is that attributes are are descriptors of objects
not verbs. The attributes are reported and used on creation.

The API needs to use the netlink message flags to indicate create, replace and delete
instead. It may mean changes to net/core/rtnetlink.c. I would rather see VXLAN follow
convention as close as possible.

Sorry for being so difficult but once an API is done, it has a long lifetime and other
stuff tends to follow it. I know from experience having made the mistake far
to often..

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-24 15:35       ` Stephen Hemminger
@ 2013-06-24 19:52         ` Mike Rapoport
  2013-06-24 20:24           ` Stephen Hemminger
  0 siblings, 1 reply; 19+ messages in thread
From: Mike Rapoport @ 2013-06-24 19:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, David Stevens, Thomas Graf

On Mon, Jun 24, 2013 at 6:35 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Mon, 24 Jun 2013 08:57:55 +0300
> Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>
>> On Mon, Jun 24, 2013 at 3:14 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>> > On Sun, 23 Jun 2013 19:22:23 +0300
>> > Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
>> >
>> >> A list of multiple default destinations can be used in environments that
>> >> disable multicast on the infrastructure level, e.g. public clouds.
>> >>
>> >> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
>> >> ---
>> >>  drivers/net/vxlan.c          | 268 +++++++++++++++++++++++++++++++++++++++++--
>> >>  include/uapi/linux/if_link.h |  17 +++
>> >>  2 files changed, 276 insertions(+), 9 deletions(-)
>> >>
>> >> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> >> index e5fb6568..f57a0d94 100644
>> >> --- a/drivers/net/vxlan.c
>> >> +++ b/drivers/net/vxlan.c
>> >> @@ -103,6 +103,7 @@ struct vxlan_rdst {
>> >>       u32                      remote_vni;
>> >>       u32                      remote_ifindex;
>> >>       struct list_head         list;
>> >> +     struct rcu_head          rcu;
>> >>  };
>> >
>> > The use of remotes_cnt here is not SMP safe.
>> > You are using remotes_cnt to size the buffer for dumping, but then the list
>> > of remotes might change during the dump.
>>
>> The remotes_cnt is used only in netlink callbacks with rtnl_lock held
>> and it cannot be modified otherwise, so I don't see why it is not SMP
>> safe.
>>
>> > There a a couple of alternatives here:
>> > 1. Put a hard limit on the number of remotes per MAC.
>> > 2. When there are multiple destnations, just dump multiple entries, like
>> >    multipath routing does.
>> >
>> > I prefer #2 because it also allows for a cleaner API on creation.
>> >
>>
>
> After a few more hours of review, I think the API still needs more work.
> The API uses attributes IFLA_VXLAN_REMOTE_NEW and IFLA_VXLAN_REMOTE_DEL to
> implement adding and deleting entries. This is contrary to other uses of attributes
> in Linux netlink. The convention is that attributes are are descriptors of objects
> not verbs. The attributes are reported and used on creation.
>
> The API needs to use the netlink message flags to indicate create, replace and delete
> instead. It may mean changes to net/core/rtnetlink.c. I would rather see VXLAN follow
> convention as close as possible.

Just to make sure I've got your point here, the API should use
RTM_NEWSOMETHING, RTM_DELSOMETHING and RTM_GETSOMETHING message types
with attribute SOME_PREFIX_VXLAN_REMOTE, and the attribute itself may
contain sub-attributes, such as remote address, port, vni etc...

If this assumption is correct I could think of the following alternatives:

1) Add RTM_NEWVXLANDST, which seems to me somewhat overkill
2) Add RTA_VXLAN_REMOTE to rtattr_type_t. This way that creation API
will be similar to multipath routing, but I'm not sure that adding
VXLAN specific attribute type to rtattr_type_t is appropriate.
3) Allow zero mac address in rtnl_fdb_{add,del} and than make the
default destinations part of the fdb, as David Stevens suggested (1).
In this case fdb deletion should be reworked so that at least one
default destination will be always kept.

I personally favor (2) because it allows semantic distinction between
fdb entries and default destinations.

> Sorry for being so difficult but once an API is done, it has a long lifetime and other
> stuff tends to follow it. I know from experience having made the mistake far
> to often..

I would prefer to receive such feedback earlier, but I definitely
understand your concern :)

--
[1] http://thread.gmane.org/gmane.linux.network/270969/focus=271791

--
Sincerely yours,
Mike.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations
  2013-06-24 19:52         ` Mike Rapoport
@ 2013-06-24 20:24           ` Stephen Hemminger
  0 siblings, 0 replies; 19+ messages in thread
From: Stephen Hemminger @ 2013-06-24 20:24 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: netdev, David Stevens, Thomas Graf

On Mon, 24 Jun 2013 22:52:09 +0300
Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:

> On Mon, Jun 24, 2013 at 6:35 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Mon, 24 Jun 2013 08:57:55 +0300
> > Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
> >
> >> On Mon, Jun 24, 2013 at 3:14 AM, Stephen Hemminger
> >> <stephen@networkplumber.org> wrote:
> >> > On Sun, 23 Jun 2013 19:22:23 +0300
> >> > Mike Rapoport <mike.rapoport@ravellosystems.com> wrote:
> >> >
> >> >> A list of multiple default destinations can be used in environments that
> >> >> disable multicast on the infrastructure level, e.g. public clouds.
> >> >>
> >> >> Signed-off-by: Mike Rapoport <mike.rapoport@ravellosystems.com>
> >> >> ---
> >> >>  drivers/net/vxlan.c          | 268 +++++++++++++++++++++++++++++++++++++++++--
> >> >>  include/uapi/linux/if_link.h |  17 +++
> >> >>  2 files changed, 276 insertions(+), 9 deletions(-)
> >> >>
> >> >> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> >> >> index e5fb6568..f57a0d94 100644
> >> >> --- a/drivers/net/vxlan.c
> >> >> +++ b/drivers/net/vxlan.c
> >> >> @@ -103,6 +103,7 @@ struct vxlan_rdst {
> >> >>       u32                      remote_vni;
> >> >>       u32                      remote_ifindex;
> >> >>       struct list_head         list;
> >> >> +     struct rcu_head          rcu;
> >> >>  };
> >> >
> >> > The use of remotes_cnt here is not SMP safe.
> >> > You are using remotes_cnt to size the buffer for dumping, but then the list
> >> > of remotes might change during the dump.
> >>
> >> The remotes_cnt is used only in netlink callbacks with rtnl_lock held
> >> and it cannot be modified otherwise, so I don't see why it is not SMP
> >> safe.
> >>
> >> > There a a couple of alternatives here:
> >> > 1. Put a hard limit on the number of remotes per MAC.
> >> > 2. When there are multiple destnations, just dump multiple entries, like
> >> >    multipath routing does.
> >> >
> >> > I prefer #2 because it also allows for a cleaner API on creation.
> >> >
> >>
> >
> > After a few more hours of review, I think the API still needs more work.
> > The API uses attributes IFLA_VXLAN_REMOTE_NEW and IFLA_VXLAN_REMOTE_DEL to
> > implement adding and deleting entries. This is contrary to other uses of attributes
> > in Linux netlink. The convention is that attributes are are descriptors of objects
> > not verbs. The attributes are reported and used on creation.
> >
> > The API needs to use the netlink message flags to indicate create, replace and delete
> > instead. It may mean changes to net/core/rtnetlink.c. I would rather see VXLAN follow
> > convention as close as possible.
> 
> Just to make sure I've got your point here, the API should use
> RTM_NEWSOMETHING, RTM_DELSOMETHING and RTM_GETSOMETHING message types
> with attribute SOME_PREFIX_VXLAN_REMOTE, and the attribute itself may
> contain sub-attributes, such as remote address, port, vni etc...
> 
> If this assumption is correct I could think of the following alternatives:
> 
> 1) Add RTM_NEWVXLANDST, which seems to me somewhat overkill
> 2) Add RTA_VXLAN_REMOTE to rtattr_type_t. This way that creation API
> will be similar to multipath routing, but I'm not sure that adding
> VXLAN specific attribute type to rtattr_type_t is appropriate.
> 3) Allow zero mac address in rtnl_fdb_{add,del} and than make the
> default destinations part of the fdb, as David Stevens suggested (1).
> In this case fdb deletion should be reworked so that at least one
> default destination will be always kept.

API should look like adding, deleting, modifying routes.
Ideally, it should all work using existing tools with out lots of special pain.
An example would be:

# bridge fdb add 6a:ee:bc:af:7e:4a dev vxlan0 dst 172.30.42.11
# bridge fdb append 6a:ee:bc:af:7e:4a dev vxlan0 dst 172.30.42.12
# bridge fdb show dev vxlan0
6a:ee:bc:af:7e:4a dst 172.30.42.11 self permanent
6a:ee:bc:af:7e:4a dst 172.30.42.12 self permanent

# bridge fdb delete 6a:ee:bc:af:7e:4a dev vxlan0 dst 172.30.42.11
# bridge fdb show dev vxlan0
6a:ee:bc:af:7e:4a dst 172.30.42.12 self permanent

Right now the netlink flags for NLM_F_EXCL and NLM_F_APPEND have no
meaning so it doesn't work that way.

If you delete all destinations, then just delete the entry.
No point in keeping a default if all remote hops are gone.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2013-06-24 20:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-23 16:22 [PATCH net-next v4 0/2] vxlan: allow specifying multiple default destinations Mike Rapoport
2013-06-23 16:22 ` [PATCH net-next v4 1/2] vxlan: introduce vxlan_rdst_append Mike Rapoport
2013-06-24  6:02   ` Cong Wang
2013-06-24  6:54     ` Mike Rapoport
2013-06-24  8:28       ` David Miller
2013-06-24  7:21     ` Mike Rapoport
2013-06-23 16:22 ` [PATCH net-next v4 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
2013-06-24  0:14   ` Stephen Hemminger
2013-06-24  5:57     ` Mike Rapoport
2013-06-24 15:35       ` Stephen Hemminger
2013-06-24 19:52         ` Mike Rapoport
2013-06-24 20:24           ` Stephen Hemminger
2013-06-24  6:48   ` Cong Wang
2013-06-24  6:58     ` Mike Rapoport
2013-06-24 15:18     ` Stephen Hemminger
2013-06-23 16:24 ` [PATCH iproute2 1/2] vxlan: introduce vxlan_parse_opt_create Mike Rapoport
2013-06-23 16:24 ` [PATCH iproute2 2/2] vxlan: allow specifying multiple default destinations Mike Rapoport
2013-06-24  0:32   ` Cong Wang
2013-06-24  5:40     ` Mike Rapoport

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.