All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] net: bridge: add support for backup port
@ 2018-07-20 14:48 ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 14:48 UTC (permalink / raw)
  To: netdev
  Cc: roopa, anuradhak, stephen, bridge, wkok, davem, Nikolay Aleksandrov

Hi,
This set introduces a new bridge port option that allows any port to have
any other port (in the same bridge of course) as its backup and traffic
will be forwarded to the backup port when the primary goes down. This is
mainly used in MLAG and EVPN setups where we have peerlink path which is
a backup of many (or even all) ports and is a participating bridge port
itself. There's more detailed information in patch 02. Patch 01 just
prepares the port sysfs code for options that take raw value. The main
issues that this set solves are scalability and fallback latency.

We have used similar code for over 6 months now to bring the fallback
latency of the backup peerlink down and avoid fdb notification storms.
Also due to the nature of master devices such setup is currently not
possible, and last but not least having tens of thousands of fdbs require
thousands of calls to switch.

I've also CCed our MLAG experts that have been using similar option.

Thanks,
 Nik


Nikolay Aleksandrov (2):
  net: bridge: add support for raw sysfs port options
  net: bridge: add support for backup port

 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c      | 16 ++++++++-
 net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++
 net/bridge/br_netlink.c      | 30 +++++++++++++++-
 net/bridge/br_private.h      |  3 ++
 net/bridge/br_sysfs_if.c     | 82 ++++++++++++++++++++++++++++++++++++--------
 6 files changed, 168 insertions(+), 17 deletions(-)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Bridge] [PATCH net-next 0/2] net: bridge: add support for backup port
@ 2018-07-20 14:48 ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 14:48 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov, roopa, bridge, wkok, anuradhak, davem

Hi,
This set introduces a new bridge port option that allows any port to have
any other port (in the same bridge of course) as its backup and traffic
will be forwarded to the backup port when the primary goes down. This is
mainly used in MLAG and EVPN setups where we have peerlink path which is
a backup of many (or even all) ports and is a participating bridge port
itself. There's more detailed information in patch 02. Patch 01 just
prepares the port sysfs code for options that take raw value. The main
issues that this set solves are scalability and fallback latency.

We have used similar code for over 6 months now to bring the fallback
latency of the backup peerlink down and avoid fdb notification storms.
Also due to the nature of master devices such setup is currently not
possible, and last but not least having tens of thousands of fdbs require
thousands of calls to switch.

I've also CCed our MLAG experts that have been using similar option.

Thanks,
 Nik


Nikolay Aleksandrov (2):
  net: bridge: add support for raw sysfs port options
  net: bridge: add support for backup port

 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c      | 16 ++++++++-
 net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++
 net/bridge/br_netlink.c      | 30 +++++++++++++++-
 net/bridge/br_private.h      |  3 ++
 net/bridge/br_sysfs_if.c     | 82 ++++++++++++++++++++++++++++++++++++--------
 6 files changed, 168 insertions(+), 17 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 14:48 ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 14:48   ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 14:48 UTC (permalink / raw)
  To: netdev
  Cc: roopa, anuradhak, stephen, bridge, wkok, davem, Nikolay Aleksandrov

This patch adds a new alternative store callback for port sysfs options
which takes a raw value (buf) and can use it directly. It is needed for the
backup port sysfs support since we have to pass the device by its name.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
There are a few checkpatch warnings here because of the old code, I've
noted them in my todo and will send separate patch for simple_strtoul
and the function prototypes.

 net/bridge/br_sysfs_if.c | 49 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index f99c5bf5c906..38c58879423d 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -25,6 +25,15 @@ struct brport_attribute {
 	struct attribute	attr;
 	ssize_t (*show)(struct net_bridge_port *, char *);
 	int (*store)(struct net_bridge_port *, unsigned long);
+	int (*store_raw)(struct net_bridge_port *, char *);
+};
+
+#define BRPORT_ATTR_RAW(_name, _mode, _show, _store)			\
+const struct brport_attribute brport_attr_##_name = {			\
+	.attr		= {.name = __stringify(_name),			\
+			   .mode = _mode },				\
+	.show		= _show,					\
+	.store_raw	= _store,					\
 };
 
 #define BRPORT_ATTR(_name, _mode, _show, _store)		\
@@ -270,27 +279,37 @@ static ssize_t brport_store(struct kobject *kobj,
 	struct brport_attribute *brport_attr = to_brport_attr(attr);
 	struct net_bridge_port *p = to_brport(kobj);
 	ssize_t ret = -EINVAL;
-	char *endp;
 	unsigned long val;
+	char *endp;
 
 	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
-	val = simple_strtoul(buf, &endp, 0);
-	if (endp != buf) {
-		if (!rtnl_trylock())
-			return restart_syscall();
-		if (p->dev && p->br && brport_attr->store) {
-			spin_lock_bh(&p->br->lock);
-			ret = brport_attr->store(p, val);
-			spin_unlock_bh(&p->br->lock);
-			if (!ret) {
-				br_ifinfo_notify(RTM_NEWLINK, NULL, p);
-				ret = count;
-			}
-		}
-		rtnl_unlock();
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (!p->dev || !p->br)
+		goto out_unlock;
+
+	if (brport_attr->store_raw) {
+		spin_lock_bh(&p->br->lock);
+		ret = brport_attr->store_raw(p, (char *)buf);
+		spin_unlock_bh(&p->br->lock);
+	} else if (brport_attr->store) {
+		val = simple_strtoul(buf, &endp, 0);
+		if (endp == buf)
+			goto out_unlock;
+		spin_lock_bh(&p->br->lock);
+		ret = brport_attr->store(p, val);
+		spin_unlock_bh(&p->br->lock);
+	}
+	if (!ret) {
+		br_ifinfo_notify(RTM_NEWLINK, NULL, p);
+		ret = count;
 	}
+out_unlock:
+	rtnl_unlock();
+
 	return ret;
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-20 14:48   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 14:48 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov, roopa, bridge, wkok, anuradhak, davem

This patch adds a new alternative store callback for port sysfs options
which takes a raw value (buf) and can use it directly. It is needed for the
backup port sysfs support since we have to pass the device by its name.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
There are a few checkpatch warnings here because of the old code, I've
noted them in my todo and will send separate patch for simple_strtoul
and the function prototypes.

 net/bridge/br_sysfs_if.c | 49 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 15 deletions(-)

diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index f99c5bf5c906..38c58879423d 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -25,6 +25,15 @@ struct brport_attribute {
 	struct attribute	attr;
 	ssize_t (*show)(struct net_bridge_port *, char *);
 	int (*store)(struct net_bridge_port *, unsigned long);
+	int (*store_raw)(struct net_bridge_port *, char *);
+};
+
+#define BRPORT_ATTR_RAW(_name, _mode, _show, _store)			\
+const struct brport_attribute brport_attr_##_name = {			\
+	.attr		= {.name = __stringify(_name),			\
+			   .mode = _mode },				\
+	.show		= _show,					\
+	.store_raw	= _store,					\
 };
 
 #define BRPORT_ATTR(_name, _mode, _show, _store)		\
@@ -270,27 +279,37 @@ static ssize_t brport_store(struct kobject *kobj,
 	struct brport_attribute *brport_attr = to_brport_attr(attr);
 	struct net_bridge_port *p = to_brport(kobj);
 	ssize_t ret = -EINVAL;
-	char *endp;
 	unsigned long val;
+	char *endp;
 
 	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
-	val = simple_strtoul(buf, &endp, 0);
-	if (endp != buf) {
-		if (!rtnl_trylock())
-			return restart_syscall();
-		if (p->dev && p->br && brport_attr->store) {
-			spin_lock_bh(&p->br->lock);
-			ret = brport_attr->store(p, val);
-			spin_unlock_bh(&p->br->lock);
-			if (!ret) {
-				br_ifinfo_notify(RTM_NEWLINK, NULL, p);
-				ret = count;
-			}
-		}
-		rtnl_unlock();
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	if (!p->dev || !p->br)
+		goto out_unlock;
+
+	if (brport_attr->store_raw) {
+		spin_lock_bh(&p->br->lock);
+		ret = brport_attr->store_raw(p, (char *)buf);
+		spin_unlock_bh(&p->br->lock);
+	} else if (brport_attr->store) {
+		val = simple_strtoul(buf, &endp, 0);
+		if (endp == buf)
+			goto out_unlock;
+		spin_lock_bh(&p->br->lock);
+		ret = brport_attr->store(p, val);
+		spin_unlock_bh(&p->br->lock);
+	}
+	if (!ret) {
+		br_ifinfo_notify(RTM_NEWLINK, NULL, p);
+		ret = count;
 	}
+out_unlock:
+	rtnl_unlock();
+
 	return ret;
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH net-next 2/2] net: bridge: add support for backup port
  2018-07-20 14:48 ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 14:48   ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 14:48 UTC (permalink / raw)
  To: netdev
  Cc: roopa, anuradhak, stephen, bridge, wkok, davem, Nikolay Aleksandrov

This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
allows to set a backup port to be used for known unicast traffic if the
port has gone carrier down. The backup pointer is rcu protected and set
only under RTNL, a counter is maintained so when deleting a port we know
how many other ports reference it as a backup and we remove it from all.
Also the pointer is in the first cache line which is hot at the time of
the check and thus in the common case we only add one more test.
The backup port will be used only for the non-flooding case since
it's a part of the bridge and the flooded packets will be forwarded to it
anyway. To remove the forwarding just send a 0/non-existing backup port.
This is used to avoid numerous scalability problems when using MLAG most
notably if we have thousands of fdbs one would need to change all of them
on port carrier going down which takes too long and causes a storm of fdb
notifications (and again when the port comes back up). In a Multi-chassis
Link Aggregation setup usually hosts are connected to two different
switches which act as a single logical switch. Those switches usually have
a control and backup link between them called peerlink which might be used
for communication in case a host loses connectivity to one of them.
We need a fast way to failover in case a host port goes down and currently
none of the solutions (like bond) cannot fulfill the requirements because
the participating ports are actually the "master" devices and must have the
same peerlink as their backup interface and at the same time all of them
must participate in the bridge device. As Roopa noted it's normal practice
in routing called fast re-route where a precalculated backup path is used
when the main one is down.
Another use case of this is with EVPN, having a single vxlan device which
is backup of every port. Due to the nature of master devices it's not
currently possible to use one device as a backup for many and still have
all of them participate in the bridge (which is master itself).
More detailed information about MLAG is available at the link below.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
I've also tried a version where the backup_port is checked via a port flag
but this saves us one internal flag and we use the already cache hot ptr
with a simple null check. I don't mind reverting to the flag behaviour if
anyone has strong preference.
This can be optimized further with a offline backup port switching
mechanism. We can remove the additional fast-path test later. Currently
I'd prefer to leave it simpler for easier review and commenting since
that would be an internal change and maybe isn't even worth the small gain.

 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c      | 16 ++++++++++++-
 net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_netlink.c      | 30 ++++++++++++++++++++++++-
 net/bridge/br_private.h      |  3 +++
 net/bridge/br_sysfs_if.c     | 33 +++++++++++++++++++++++++++
 6 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8759cfb8aa2e..01b5069a73a5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -334,6 +334,7 @@ enum {
 	IFLA_BRPORT_GROUP_FWD_MASK,
 	IFLA_BRPORT_NEIGH_SUPPRESS,
 	IFLA_BRPORT_ISOLATED,
+	IFLA_BRPORT_BACKUP_PORT,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 9019f326fe81..5372e2042adf 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -142,7 +142,20 @@ static int deliver_clone(const struct net_bridge_port *prev,
 void br_forward(const struct net_bridge_port *to,
 		struct sk_buff *skb, bool local_rcv, bool local_orig)
 {
-	if (to && should_deliver(to, skb)) {
+	if (unlikely(!to))
+		goto out;
+
+	/* redirect to backup link if the destination port is down */
+	if (rcu_access_pointer(to->backup_port) && !netif_carrier_ok(to->dev)) {
+		struct net_bridge_port *backup_port;
+
+		backup_port = rcu_dereference(to->backup_port);
+		if (unlikely(!backup_port))
+			goto out;
+		to = backup_port;
+	}
+
+	if (should_deliver(to, skb)) {
 		if (local_rcv)
 			deliver_clone(to, skb, local_orig);
 		else
@@ -150,6 +163,7 @@ void br_forward(const struct net_bridge_port *to,
 		return;
 	}
 
+out:
 	if (!local_rcv)
 		kfree_skb(skb);
 }
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 05e42d86882d..365759ac7c73 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -169,6 +169,58 @@ void br_manage_promisc(struct net_bridge *br)
 	}
 }
 
+int nbp_backup_change(struct net_bridge_port *p,
+		      struct net_device *backup_dev)
+{
+	struct net_bridge_port *old_backup = rtnl_dereference(p->backup_port);
+	struct net_bridge_port *backup_p = NULL;
+
+	ASSERT_RTNL();
+
+	if (backup_dev) {
+		if (!br_port_exists(backup_dev))
+			return -ENOENT;
+
+		backup_p = br_port_get_rtnl(backup_dev);
+		if (backup_p->br != p->br)
+			return -EINVAL;
+	}
+
+	if (p == backup_p)
+		return -EINVAL;
+
+	if (old_backup == backup_p)
+		return 0;
+
+	/* if the backup link is already set, clear it */
+	if (old_backup)
+		old_backup->backup_redirected_cnt--;
+
+	if (backup_p)
+		backup_p->backup_redirected_cnt++;
+	rcu_assign_pointer(p->backup_port, backup_p);
+
+	return 0;
+}
+
+static void nbp_backup_clear(struct net_bridge_port *p)
+{
+	nbp_backup_change(p, NULL);
+	if (p->backup_redirected_cnt) {
+		struct net_bridge_port *cur_p;
+
+		list_for_each_entry(cur_p, &p->br->port_list, list) {
+			struct net_bridge_port *backup_p;
+
+			backup_p = rtnl_dereference(cur_p->backup_port);
+			if (backup_p == p)
+				nbp_backup_change(cur_p, NULL);
+		}
+	}
+
+	WARN_ON(rcu_access_pointer(p->backup_port) || p->backup_redirected_cnt);
+}
+
 static void nbp_update_port_count(struct net_bridge *br)
 {
 	struct net_bridge_port *p;
@@ -286,6 +338,7 @@ static void del_nbp(struct net_bridge_port *p)
 	nbp_vlan_flush(p);
 	br_fdb_delete_by_port(br, p, 0, 1);
 	switchdev_deferred_process();
+	nbp_backup_clear(p);
 
 	nbp_update_port_count(br);
 
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 9f5eb05b0373..ec2b58a09f76 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -169,13 +169,15 @@ static inline size_t br_nlmsg_size(struct net_device *dev, u32 filter_mask)
 		+ nla_total_size(1) /* IFLA_OPERSTATE */
 		+ nla_total_size(br_port_info_size()) /* IFLA_PROTINFO */
 		+ nla_total_size(br_get_link_af_size_filtered(dev,
-				 filter_mask)); /* IFLA_AF_SPEC */
+				 filter_mask)) /* IFLA_AF_SPEC */
+		+ nla_total_size(4); /* IFLA_BRPORT_BACKUP_PORT */
 }
 
 static int br_port_fill_attrs(struct sk_buff *skb,
 			      const struct net_bridge_port *p)
 {
 	u8 mode = !!(p->flags & BR_HAIRPIN_MODE);
+	struct net_bridge_port *backup_p;
 	u64 timerval;
 
 	if (nla_put_u8(skb, IFLA_BRPORT_STATE, p->state) ||
@@ -237,6 +239,14 @@ static int br_port_fill_attrs(struct sk_buff *skb,
 		return -EMSGSIZE;
 #endif
 
+	/* we might be called only with br->lock */
+	rcu_read_lock();
+	backup_p = rcu_dereference(p->backup_port);
+	if (backup_p)
+		nla_put_u32(skb, IFLA_BRPORT_BACKUP_PORT,
+			    backup_p->dev->ifindex);
+	rcu_read_unlock();
+
 	return 0;
 }
 
@@ -663,6 +673,7 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
 	[IFLA_BRPORT_GROUP_FWD_MASK] = { .type = NLA_U16 },
 	[IFLA_BRPORT_NEIGH_SUPPRESS] = { .type = NLA_U8 },
 	[IFLA_BRPORT_ISOLATED]	= { .type = NLA_U8 },
+	[IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 },
 };
 
 /* Change the state of the port and notify spanning tree */
@@ -817,6 +828,23 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
 	if (err)
 		return err;
 
+	if (tb[IFLA_BRPORT_BACKUP_PORT]) {
+		struct net_device *backup_dev = NULL;
+		u32 backup_ifindex;
+
+		backup_ifindex = nla_get_u32(tb[IFLA_BRPORT_BACKUP_PORT]);
+		if (backup_ifindex) {
+			backup_dev = __dev_get_by_index(dev_net(p->dev),
+							backup_ifindex);
+			if (!backup_dev)
+				return -ENOENT;
+		}
+
+		err = nbp_backup_change(p, backup_dev);
+		if (err)
+			return err;
+	}
+
 	br_port_flags_change(p, old_flags ^ p->flags);
 	return 0;
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 5216a524b537..ab7db2466c43 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -237,6 +237,7 @@ struct net_bridge_port {
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 	struct net_bridge_vlan_group	__rcu *vlgrp;
 #endif
+	struct net_bridge_port		__rcu *backup_port;
 
 	/* STP */
 	u8				priority;
@@ -281,6 +282,7 @@ struct net_bridge_port {
 	int				offload_fwd_mark;
 #endif
 	u16				group_fwd_mask;
+	u16				backup_redirected_cnt;
 };
 
 #define br_auto_port(p) ((p)->flags & BR_AUTO_MASK)
@@ -595,6 +597,7 @@ netdev_features_t br_features_recompute(struct net_bridge *br,
 					netdev_features_t features);
 void br_port_flags_change(struct net_bridge_port *port, unsigned long mask);
 void br_manage_promisc(struct net_bridge *br);
+int nbp_backup_change(struct net_bridge_port *p, struct net_device *backup_dev);
 
 /* br_input.c */
 int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 38c58879423d..79d48e587278 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -191,6 +191,38 @@ static int store_group_fwd_mask(struct net_bridge_port *p,
 static BRPORT_ATTR(group_fwd_mask, 0644, show_group_fwd_mask,
 		   store_group_fwd_mask);
 
+static ssize_t show_backup_port(struct net_bridge_port *p, char *buf)
+{
+	struct net_bridge_port *backup_p;
+	int ret = 0;
+
+	rcu_read_lock();
+	backup_p = rcu_dereference(p->backup_port);
+	if (backup_p)
+		ret = sprintf(buf, "%s\n", backup_p->dev->name);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int store_backup_port(struct net_bridge_port *p, char *buf)
+{
+	struct net_device *backup_dev = NULL;
+	char *nl = strchr(buf, '\n');
+
+	if (nl)
+		*nl = '\0';
+
+	if (strlen(buf) > 0) {
+		backup_dev = __dev_get_by_name(dev_net(p->dev), buf);
+		if (!backup_dev)
+			return -ENOENT;
+	}
+
+	return nbp_backup_change(p, backup_dev);
+}
+static BRPORT_ATTR_RAW(backup_port, 0644, show_backup_port, store_backup_port);
+
 BRPORT_ATTR_FLAG(hairpin_mode, BR_HAIRPIN_MODE);
 BRPORT_ATTR_FLAG(bpdu_guard, BR_BPDU_GUARD);
 BRPORT_ATTR_FLAG(root_block, BR_ROOT_BLOCK);
@@ -254,6 +286,7 @@ static const struct brport_attribute *brport_attrs[] = {
 	&brport_attr_group_fwd_mask,
 	&brport_attr_neigh_suppress,
 	&brport_attr_isolated,
+	&brport_attr_backup_port,
 	NULL
 };
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Bridge] [PATCH net-next 2/2] net: bridge: add support for backup port
@ 2018-07-20 14:48   ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 14:48 UTC (permalink / raw)
  To: netdev; +Cc: Nikolay Aleksandrov, roopa, bridge, wkok, anuradhak, davem

This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
allows to set a backup port to be used for known unicast traffic if the
port has gone carrier down. The backup pointer is rcu protected and set
only under RTNL, a counter is maintained so when deleting a port we know
how many other ports reference it as a backup and we remove it from all.
Also the pointer is in the first cache line which is hot at the time of
the check and thus in the common case we only add one more test.
The backup port will be used only for the non-flooding case since
it's a part of the bridge and the flooded packets will be forwarded to it
anyway. To remove the forwarding just send a 0/non-existing backup port.
This is used to avoid numerous scalability problems when using MLAG most
notably if we have thousands of fdbs one would need to change all of them
on port carrier going down which takes too long and causes a storm of fdb
notifications (and again when the port comes back up). In a Multi-chassis
Link Aggregation setup usually hosts are connected to two different
switches which act as a single logical switch. Those switches usually have
a control and backup link between them called peerlink which might be used
for communication in case a host loses connectivity to one of them.
We need a fast way to failover in case a host port goes down and currently
none of the solutions (like bond) cannot fulfill the requirements because
the participating ports are actually the "master" devices and must have the
same peerlink as their backup interface and at the same time all of them
must participate in the bridge device. As Roopa noted it's normal practice
in routing called fast re-route where a precalculated backup path is used
when the main one is down.
Another use case of this is with EVPN, having a single vxlan device which
is backup of every port. Due to the nature of master devices it's not
currently possible to use one device as a backup for many and still have
all of them participate in the bridge (which is master itself).
More detailed information about MLAG is available at the link below.
https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
I've also tried a version where the backup_port is checked via a port flag
but this saves us one internal flag and we use the already cache hot ptr
with a simple null check. I don't mind reverting to the flag behaviour if
anyone has strong preference.
This can be optimized further with a offline backup port switching
mechanism. We can remove the additional fast-path test later. Currently
I'd prefer to leave it simpler for easier review and commenting since
that would be an internal change and maybe isn't even worth the small gain.

 include/uapi/linux/if_link.h |  1 +
 net/bridge/br_forward.c      | 16 ++++++++++++-
 net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_netlink.c      | 30 ++++++++++++++++++++++++-
 net/bridge/br_private.h      |  3 +++
 net/bridge/br_sysfs_if.c     | 33 +++++++++++++++++++++++++++
 6 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8759cfb8aa2e..01b5069a73a5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -334,6 +334,7 @@ enum {
 	IFLA_BRPORT_GROUP_FWD_MASK,
 	IFLA_BRPORT_NEIGH_SUPPRESS,
 	IFLA_BRPORT_ISOLATED,
+	IFLA_BRPORT_BACKUP_PORT,
 	__IFLA_BRPORT_MAX
 };
 #define IFLA_BRPORT_MAX (__IFLA_BRPORT_MAX - 1)
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 9019f326fe81..5372e2042adf 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -142,7 +142,20 @@ static int deliver_clone(const struct net_bridge_port *prev,
 void br_forward(const struct net_bridge_port *to,
 		struct sk_buff *skb, bool local_rcv, bool local_orig)
 {
-	if (to && should_deliver(to, skb)) {
+	if (unlikely(!to))
+		goto out;
+
+	/* redirect to backup link if the destination port is down */
+	if (rcu_access_pointer(to->backup_port) && !netif_carrier_ok(to->dev)) {
+		struct net_bridge_port *backup_port;
+
+		backup_port = rcu_dereference(to->backup_port);
+		if (unlikely(!backup_port))
+			goto out;
+		to = backup_port;
+	}
+
+	if (should_deliver(to, skb)) {
 		if (local_rcv)
 			deliver_clone(to, skb, local_orig);
 		else
@@ -150,6 +163,7 @@ void br_forward(const struct net_bridge_port *to,
 		return;
 	}
 
+out:
 	if (!local_rcv)
 		kfree_skb(skb);
 }
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 05e42d86882d..365759ac7c73 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -169,6 +169,58 @@ void br_manage_promisc(struct net_bridge *br)
 	}
 }
 
+int nbp_backup_change(struct net_bridge_port *p,
+		      struct net_device *backup_dev)
+{
+	struct net_bridge_port *old_backup = rtnl_dereference(p->backup_port);
+	struct net_bridge_port *backup_p = NULL;
+
+	ASSERT_RTNL();
+
+	if (backup_dev) {
+		if (!br_port_exists(backup_dev))
+			return -ENOENT;
+
+		backup_p = br_port_get_rtnl(backup_dev);
+		if (backup_p->br != p->br)
+			return -EINVAL;
+	}
+
+	if (p == backup_p)
+		return -EINVAL;
+
+	if (old_backup == backup_p)
+		return 0;
+
+	/* if the backup link is already set, clear it */
+	if (old_backup)
+		old_backup->backup_redirected_cnt--;
+
+	if (backup_p)
+		backup_p->backup_redirected_cnt++;
+	rcu_assign_pointer(p->backup_port, backup_p);
+
+	return 0;
+}
+
+static void nbp_backup_clear(struct net_bridge_port *p)
+{
+	nbp_backup_change(p, NULL);
+	if (p->backup_redirected_cnt) {
+		struct net_bridge_port *cur_p;
+
+		list_for_each_entry(cur_p, &p->br->port_list, list) {
+			struct net_bridge_port *backup_p;
+
+			backup_p = rtnl_dereference(cur_p->backup_port);
+			if (backup_p == p)
+				nbp_backup_change(cur_p, NULL);
+		}
+	}
+
+	WARN_ON(rcu_access_pointer(p->backup_port) || p->backup_redirected_cnt);
+}
+
 static void nbp_update_port_count(struct net_bridge *br)
 {
 	struct net_bridge_port *p;
@@ -286,6 +338,7 @@ static void del_nbp(struct net_bridge_port *p)
 	nbp_vlan_flush(p);
 	br_fdb_delete_by_port(br, p, 0, 1);
 	switchdev_deferred_process();
+	nbp_backup_clear(p);
 
 	nbp_update_port_count(br);
 
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 9f5eb05b0373..ec2b58a09f76 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -169,13 +169,15 @@ static inline size_t br_nlmsg_size(struct net_device *dev, u32 filter_mask)
 		+ nla_total_size(1) /* IFLA_OPERSTATE */
 		+ nla_total_size(br_port_info_size()) /* IFLA_PROTINFO */
 		+ nla_total_size(br_get_link_af_size_filtered(dev,
-				 filter_mask)); /* IFLA_AF_SPEC */
+				 filter_mask)) /* IFLA_AF_SPEC */
+		+ nla_total_size(4); /* IFLA_BRPORT_BACKUP_PORT */
 }
 
 static int br_port_fill_attrs(struct sk_buff *skb,
 			      const struct net_bridge_port *p)
 {
 	u8 mode = !!(p->flags & BR_HAIRPIN_MODE);
+	struct net_bridge_port *backup_p;
 	u64 timerval;
 
 	if (nla_put_u8(skb, IFLA_BRPORT_STATE, p->state) ||
@@ -237,6 +239,14 @@ static int br_port_fill_attrs(struct sk_buff *skb,
 		return -EMSGSIZE;
 #endif
 
+	/* we might be called only with br->lock */
+	rcu_read_lock();
+	backup_p = rcu_dereference(p->backup_port);
+	if (backup_p)
+		nla_put_u32(skb, IFLA_BRPORT_BACKUP_PORT,
+			    backup_p->dev->ifindex);
+	rcu_read_unlock();
+
 	return 0;
 }
 
@@ -663,6 +673,7 @@ static const struct nla_policy br_port_policy[IFLA_BRPORT_MAX + 1] = {
 	[IFLA_BRPORT_GROUP_FWD_MASK] = { .type = NLA_U16 },
 	[IFLA_BRPORT_NEIGH_SUPPRESS] = { .type = NLA_U8 },
 	[IFLA_BRPORT_ISOLATED]	= { .type = NLA_U8 },
+	[IFLA_BRPORT_BACKUP_PORT] = { .type = NLA_U32 },
 };
 
 /* Change the state of the port and notify spanning tree */
@@ -817,6 +828,23 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
 	if (err)
 		return err;
 
+	if (tb[IFLA_BRPORT_BACKUP_PORT]) {
+		struct net_device *backup_dev = NULL;
+		u32 backup_ifindex;
+
+		backup_ifindex = nla_get_u32(tb[IFLA_BRPORT_BACKUP_PORT]);
+		if (backup_ifindex) {
+			backup_dev = __dev_get_by_index(dev_net(p->dev),
+							backup_ifindex);
+			if (!backup_dev)
+				return -ENOENT;
+		}
+
+		err = nbp_backup_change(p, backup_dev);
+		if (err)
+			return err;
+	}
+
 	br_port_flags_change(p, old_flags ^ p->flags);
 	return 0;
 }
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 5216a524b537..ab7db2466c43 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -237,6 +237,7 @@ struct net_bridge_port {
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 	struct net_bridge_vlan_group	__rcu *vlgrp;
 #endif
+	struct net_bridge_port		__rcu *backup_port;
 
 	/* STP */
 	u8				priority;
@@ -281,6 +282,7 @@ struct net_bridge_port {
 	int				offload_fwd_mark;
 #endif
 	u16				group_fwd_mask;
+	u16				backup_redirected_cnt;
 };
 
 #define br_auto_port(p) ((p)->flags & BR_AUTO_MASK)
@@ -595,6 +597,7 @@ netdev_features_t br_features_recompute(struct net_bridge *br,
 					netdev_features_t features);
 void br_port_flags_change(struct net_bridge_port *port, unsigned long mask);
 void br_manage_promisc(struct net_bridge *br);
+int nbp_backup_change(struct net_bridge_port *p, struct net_device *backup_dev);
 
 /* br_input.c */
 int br_handle_frame_finish(struct net *net, struct sock *sk, struct sk_buff *skb);
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 38c58879423d..79d48e587278 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -191,6 +191,38 @@ static int store_group_fwd_mask(struct net_bridge_port *p,
 static BRPORT_ATTR(group_fwd_mask, 0644, show_group_fwd_mask,
 		   store_group_fwd_mask);
 
+static ssize_t show_backup_port(struct net_bridge_port *p, char *buf)
+{
+	struct net_bridge_port *backup_p;
+	int ret = 0;
+
+	rcu_read_lock();
+	backup_p = rcu_dereference(p->backup_port);
+	if (backup_p)
+		ret = sprintf(buf, "%s\n", backup_p->dev->name);
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static int store_backup_port(struct net_bridge_port *p, char *buf)
+{
+	struct net_device *backup_dev = NULL;
+	char *nl = strchr(buf, '\n');
+
+	if (nl)
+		*nl = '\0';
+
+	if (strlen(buf) > 0) {
+		backup_dev = __dev_get_by_name(dev_net(p->dev), buf);
+		if (!backup_dev)
+			return -ENOENT;
+	}
+
+	return nbp_backup_change(p, backup_dev);
+}
+static BRPORT_ATTR_RAW(backup_port, 0644, show_backup_port, store_backup_port);
+
 BRPORT_ATTR_FLAG(hairpin_mode, BR_HAIRPIN_MODE);
 BRPORT_ATTR_FLAG(bpdu_guard, BR_BPDU_GUARD);
 BRPORT_ATTR_FLAG(root_block, BR_ROOT_BLOCK);
@@ -254,6 +286,7 @@ static const struct brport_attribute *brport_attrs[] = {
 	&brport_attr_group_fwd_mask,
 	&brport_attr_neigh_suppress,
 	&brport_attr_isolated,
+	&brport_attr_backup_port,
 	NULL
 };
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 0/2] net: bridge: add support for backup port
  2018-07-20 14:48 ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 15:22   ` Stephen Hemminger
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 15:22 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, anuradhak, bridge, wkok, davem

On Fri, 20 Jul 2018 17:48:24 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> Hi,
> This set introduces a new bridge port option that allows any port to have
> any other port (in the same bridge of course) as its backup and traffic
> will be forwarded to the backup port when the primary goes down. This is
> mainly used in MLAG and EVPN setups where we have peerlink path which is
> a backup of many (or even all) ports and is a participating bridge port
> itself. There's more detailed information in patch 02. Patch 01 just
> prepares the port sysfs code for options that take raw value. The main
> issues that this set solves are scalability and fallback latency.
> 
> We have used similar code for over 6 months now to bring the fallback
> latency of the backup peerlink down and avoid fdb notification storms.
> Also due to the nature of master devices such setup is currently not
> possible, and last but not least having tens of thousands of fdbs require
> thousands of calls to switch.
> 
> I've also CCed our MLAG experts that have been using similar option.
> 
> Thanks,
>  Nik
> 
> 
> Nikolay Aleksandrov (2):
>   net: bridge: add support for raw sysfs port options
>   net: bridge: add support for backup port
> 
>  include/uapi/linux/if_link.h |  1 +
>  net/bridge/br_forward.c      | 16 ++++++++-
>  net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++
>  net/bridge/br_netlink.c      | 30 +++++++++++++++-
>  net/bridge/br_private.h      |  3 ++
>  net/bridge/br_sysfs_if.c     | 82 ++++++++++++++++++++++++++++++++++++--------
>  6 files changed, 168 insertions(+), 17 deletions(-)
> 

Not sure why this has to be built into the bridge.
There already is bonding and teaming, why invent yet another?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 0/2] net: bridge: add support for backup port
@ 2018-07-20 15:22   ` Stephen Hemminger
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 15:22 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On Fri, 20 Jul 2018 17:48:24 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> Hi,
> This set introduces a new bridge port option that allows any port to have
> any other port (in the same bridge of course) as its backup and traffic
> will be forwarded to the backup port when the primary goes down. This is
> mainly used in MLAG and EVPN setups where we have peerlink path which is
> a backup of many (or even all) ports and is a participating bridge port
> itself. There's more detailed information in patch 02. Patch 01 just
> prepares the port sysfs code for options that take raw value. The main
> issues that this set solves are scalability and fallback latency.
> 
> We have used similar code for over 6 months now to bring the fallback
> latency of the backup peerlink down and avoid fdb notification storms.
> Also due to the nature of master devices such setup is currently not
> possible, and last but not least having tens of thousands of fdbs require
> thousands of calls to switch.
> 
> I've also CCed our MLAG experts that have been using similar option.
> 
> Thanks,
>  Nik
> 
> 
> Nikolay Aleksandrov (2):
>   net: bridge: add support for raw sysfs port options
>   net: bridge: add support for backup port
> 
>  include/uapi/linux/if_link.h |  1 +
>  net/bridge/br_forward.c      | 16 ++++++++-
>  net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++
>  net/bridge/br_netlink.c      | 30 +++++++++++++++-
>  net/bridge/br_private.h      |  3 ++
>  net/bridge/br_sysfs_if.c     | 82 ++++++++++++++++++++++++++++++++++++--------
>  6 files changed, 168 insertions(+), 17 deletions(-)
> 

Not sure why this has to be built into the bridge.
There already is bonding and teaming, why invent yet another?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 0/2] net: bridge: add support for backup port
  2018-07-20 15:22   ` [Bridge] " Stephen Hemminger
@ 2018-07-20 15:26     ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 15:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, anuradhak, bridge, wkok, davem

On 20/07/18 18:22, Stephen Hemminger wrote:
> On Fri, 20 Jul 2018 17:48:24 +0300
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> 
>> Hi,
>> This set introduces a new bridge port option that allows any port to have
>> any other port (in the same bridge of course) as its backup and traffic
>> will be forwarded to the backup port when the primary goes down. This is
>> mainly used in MLAG and EVPN setups where we have peerlink path which is
>> a backup of many (or even all) ports and is a participating bridge port
>> itself. There's more detailed information in patch 02. Patch 01 just
>> prepares the port sysfs code for options that take raw value. The main
>> issues that this set solves are scalability and fallback latency.
>>
>> We have used similar code for over 6 months now to bring the fallback
>> latency of the backup peerlink down and avoid fdb notification storms.
>> Also due to the nature of master devices such setup is currently not
>> possible, and last but not least having tens of thousands of fdbs require
>> thousands of calls to switch.
>>
>> I've also CCed our MLAG experts that have been using similar option.
>>
>> Thanks,
>>  Nik
>>
>>
>> Nikolay Aleksandrov (2):
>>   net: bridge: add support for raw sysfs port options
>>   net: bridge: add support for backup port
>>
>>  include/uapi/linux/if_link.h |  1 +
>>  net/bridge/br_forward.c      | 16 ++++++++-
>>  net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++
>>  net/bridge/br_netlink.c      | 30 +++++++++++++++-
>>  net/bridge/br_private.h      |  3 ++
>>  net/bridge/br_sysfs_if.c     | 82 ++++++++++++++++++++++++++++++++++++--------
>>  6 files changed, 168 insertions(+), 17 deletions(-)
>>
> 
> Not sure why this has to be built into the bridge.
> There already is bonding and teaming, why invent yet another?
> 

If you read the longer description in patch 02, I've explained why they
cannot solve such problem. The nature of master devices doesn't allow
such setup. We need the ports to be all primary paths with a single
backup path and also to be bridge ports.
I've stressed this point here in the short and also in the longer
description in patch 02.

We have discussed various possible alternatives but they all fall short
and can't deliver neither the switch to backup latency nor the
scalability, and their complexity is huge (multiple virtual devices with
complex forwarding rules).

We haven't taken this decision lightly.

Thanks,
 Nik

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 0/2] net: bridge: add support for backup port
@ 2018-07-20 15:26     ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 15:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On 20/07/18 18:22, Stephen Hemminger wrote:
> On Fri, 20 Jul 2018 17:48:24 +0300
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> 
>> Hi,
>> This set introduces a new bridge port option that allows any port to have
>> any other port (in the same bridge of course) as its backup and traffic
>> will be forwarded to the backup port when the primary goes down. This is
>> mainly used in MLAG and EVPN setups where we have peerlink path which is
>> a backup of many (or even all) ports and is a participating bridge port
>> itself. There's more detailed information in patch 02. Patch 01 just
>> prepares the port sysfs code for options that take raw value. The main
>> issues that this set solves are scalability and fallback latency.
>>
>> We have used similar code for over 6 months now to bring the fallback
>> latency of the backup peerlink down and avoid fdb notification storms.
>> Also due to the nature of master devices such setup is currently not
>> possible, and last but not least having tens of thousands of fdbs require
>> thousands of calls to switch.
>>
>> I've also CCed our MLAG experts that have been using similar option.
>>
>> Thanks,
>>  Nik
>>
>>
>> Nikolay Aleksandrov (2):
>>   net: bridge: add support for raw sysfs port options
>>   net: bridge: add support for backup port
>>
>>  include/uapi/linux/if_link.h |  1 +
>>  net/bridge/br_forward.c      | 16 ++++++++-
>>  net/bridge/br_if.c           | 53 ++++++++++++++++++++++++++++
>>  net/bridge/br_netlink.c      | 30 +++++++++++++++-
>>  net/bridge/br_private.h      |  3 ++
>>  net/bridge/br_sysfs_if.c     | 82 ++++++++++++++++++++++++++++++++++++--------
>>  6 files changed, 168 insertions(+), 17 deletions(-)
>>
> 
> Not sure why this has to be built into the bridge.
> There already is bonding and teaming, why invent yet another?
> 

If you read the longer description in patch 02, I've explained why they
cannot solve such problem. The nature of master devices doesn't allow
such setup. We need the ports to be all primary paths with a single
backup path and also to be bridge ports.
I've stressed this point here in the short and also in the longer
description in patch 02.

We have discussed various possible alternatives but they all fall short
and can't deliver neither the switch to backup latency nor the
scalability, and their complexity is huge (multiple virtual devices with
complex forwarding rules).

We haven't taken this decision lightly.

Thanks,
 Nik

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 14:48   ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 15:57     ` Stephen Hemminger
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 15:57 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, anuradhak, bridge, wkok, davem

On Fri, 20 Jul 2018 17:48:25 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> This patch adds a new alternative store callback for port sysfs options
> which takes a raw value (buf) and can use it directly. It is needed for the
> backup port sysfs support since we have to pass the device by its name.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> There are a few checkpatch warnings here because of the old code, I've
> noted them in my todo and will send separate patch for simple_strtoul
> and the function prototypes.
> 
>  net/bridge/br_sysfs_if.c | 49 +++++++++++++++++++++++++++++++++---------------
>  1 file changed, 34 insertions(+), 15 deletions(-)
> 
> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> index f99c5bf5c906..38c58879423d 100644
> --- a/net/bridge/br_sysfs_if.c
> +++ b/net/bridge/br_sysfs_if.c
> @@ -25,6 +25,15 @@ struct brport_attribute {
>  	struct attribute	attr;
>  	ssize_t (*show)(struct net_bridge_port *, char *);
>  	int (*store)(struct net_bridge_port *, unsigned long);
> +	int (*store_raw)(struct net_bridge_port *, char *);
> +};
> +
> +#define BRPORT_ATTR_RAW(_name, _mode, _show, _store)			\
> +const struct brport_attribute brport_attr_##_name = {			\
> +	.attr		= {.name = __stringify(_name),			\
> +			   .mode = _mode },				\
> +	.show		= _show,					\
> +	.store_raw	= _store,					\
>  };
>  
>  #define BRPORT_ATTR(_name, _mode, _show, _store)		\
> @@ -270,27 +279,37 @@ static ssize_t brport_store(struct kobject *kobj,
>  	struct brport_attribute *brport_attr = to_brport_attr(attr);
>  	struct net_bridge_port *p = to_brport(kobj);
>  	ssize_t ret = -EINVAL;
> -	char *endp;
>  	unsigned long val;
> +	char *endp;
>  
>  	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
>  		return -EPERM;
>  
> -	val = simple_strtoul(buf, &endp, 0);
> -	if (endp != buf) {
> -		if (!rtnl_trylock())
> -			return restart_syscall();
> -		if (p->dev && p->br && brport_attr->store) {
> -			spin_lock_bh(&p->br->lock);
> -			ret = brport_attr->store(p, val);
> -			spin_unlock_bh(&p->br->lock);
> -			if (!ret) {
> -				br_ifinfo_notify(RTM_NEWLINK, NULL, p);
> -				ret = count;
> -			}
> -		}
> -		rtnl_unlock();
> +	if (!rtnl_trylock())
> +		return restart_syscall();
> +
> +	if (!p->dev || !p->br)
> +		goto out_unlock;
> +
> +	if (brport_attr->store_raw) {
> +		spin_lock_bh(&p->br->lock);
> +		ret = brport_attr->store_raw(p, (char *)buf);
> +		spin_unlock_bh(&p->br->lock);
> +	} else if (brport_attr->store) {
> +		val = simple_strtoul(buf, &endp, 0);
> +		if (endp == buf)
> +			goto out_unlock;
> +		spin_lock_bh(&p->br->lock);
> +		ret = brport_attr->store(p, val);
> +		spin_unlock_bh(&p->br->lock);
> +	}
> +	if (!ret) {
> +		br_ifinfo_notify(RTM_NEWLINK, NULL, p);
> +		ret = count;
>  	}
> +out_unlock:
> +	rtnl_unlock();
> +
>  	return ret;
>  }
>  

I see where you are going with this. You want a sysfs attribute where is a string
not a number. This makes sense.

The code might be simpler if you always acquired  the lock. Or maybe
move the locking into the store functions.

Casting away the const on the buf variable is going to cause warnings
and should not be necessary.

You might also want an error case if neither store or store_raw are defined.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-20 15:57     ` Stephen Hemminger
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 15:57 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On Fri, 20 Jul 2018 17:48:25 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> This patch adds a new alternative store callback for port sysfs options
> which takes a raw value (buf) and can use it directly. It is needed for the
> backup port sysfs support since we have to pass the device by its name.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> There are a few checkpatch warnings here because of the old code, I've
> noted them in my todo and will send separate patch for simple_strtoul
> and the function prototypes.
> 
>  net/bridge/br_sysfs_if.c | 49 +++++++++++++++++++++++++++++++++---------------
>  1 file changed, 34 insertions(+), 15 deletions(-)
> 
> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
> index f99c5bf5c906..38c58879423d 100644
> --- a/net/bridge/br_sysfs_if.c
> +++ b/net/bridge/br_sysfs_if.c
> @@ -25,6 +25,15 @@ struct brport_attribute {
>  	struct attribute	attr;
>  	ssize_t (*show)(struct net_bridge_port *, char *);
>  	int (*store)(struct net_bridge_port *, unsigned long);
> +	int (*store_raw)(struct net_bridge_port *, char *);
> +};
> +
> +#define BRPORT_ATTR_RAW(_name, _mode, _show, _store)			\
> +const struct brport_attribute brport_attr_##_name = {			\
> +	.attr		= {.name = __stringify(_name),			\
> +			   .mode = _mode },				\
> +	.show		= _show,					\
> +	.store_raw	= _store,					\
>  };
>  
>  #define BRPORT_ATTR(_name, _mode, _show, _store)		\
> @@ -270,27 +279,37 @@ static ssize_t brport_store(struct kobject *kobj,
>  	struct brport_attribute *brport_attr = to_brport_attr(attr);
>  	struct net_bridge_port *p = to_brport(kobj);
>  	ssize_t ret = -EINVAL;
> -	char *endp;
>  	unsigned long val;
> +	char *endp;
>  
>  	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
>  		return -EPERM;
>  
> -	val = simple_strtoul(buf, &endp, 0);
> -	if (endp != buf) {
> -		if (!rtnl_trylock())
> -			return restart_syscall();
> -		if (p->dev && p->br && brport_attr->store) {
> -			spin_lock_bh(&p->br->lock);
> -			ret = brport_attr->store(p, val);
> -			spin_unlock_bh(&p->br->lock);
> -			if (!ret) {
> -				br_ifinfo_notify(RTM_NEWLINK, NULL, p);
> -				ret = count;
> -			}
> -		}
> -		rtnl_unlock();
> +	if (!rtnl_trylock())
> +		return restart_syscall();
> +
> +	if (!p->dev || !p->br)
> +		goto out_unlock;
> +
> +	if (brport_attr->store_raw) {
> +		spin_lock_bh(&p->br->lock);
> +		ret = brport_attr->store_raw(p, (char *)buf);
> +		spin_unlock_bh(&p->br->lock);
> +	} else if (brport_attr->store) {
> +		val = simple_strtoul(buf, &endp, 0);
> +		if (endp == buf)
> +			goto out_unlock;
> +		spin_lock_bh(&p->br->lock);
> +		ret = brport_attr->store(p, val);
> +		spin_unlock_bh(&p->br->lock);
> +	}
> +	if (!ret) {
> +		br_ifinfo_notify(RTM_NEWLINK, NULL, p);
> +		ret = count;
>  	}
> +out_unlock:
> +	rtnl_unlock();
> +
>  	return ret;
>  }
>  

I see where you are going with this. You want a sysfs attribute where is a string
not a number. This makes sense.

The code might be simpler if you always acquired  the lock. Or maybe
move the locking into the store functions.

Casting away the const on the buf variable is going to cause warnings
and should not be necessary.

You might also want an error case if neither store or store_raw are defined.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 2/2] net: bridge: add support for backup port
  2018-07-20 14:48   ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 16:02     ` Stephen Hemminger
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 16:02 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On Fri, 20 Jul 2018 17:48:26 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
> allows to set a backup port to be used for known unicast traffic if the
> port has gone carrier down. The backup pointer is rcu protected and set
> only under RTNL, a counter is maintained so when deleting a port we know
> how many other ports reference it as a backup and we remove it from all.
> Also the pointer is in the first cache line which is hot at the time of
> the check and thus in the common case we only add one more test.
> The backup port will be used only for the non-flooding case since
> it's a part of the bridge and the flooded packets will be forwarded to it
> anyway. To remove the forwarding just send a 0/non-existing backup port.
> This is used to avoid numerous scalability problems when using MLAG most
> notably if we have thousands of fdbs one would need to change all of them
> on port carrier going down which takes too long and causes a storm of fdb
> notifications (and again when the port comes back up). In a Multi-chassis
> Link Aggregation setup usually hosts are connected to two different
> switches which act as a single logical switch. Those switches usually have
> a control and backup link between them called peerlink which might be used
> for communication in case a host loses connectivity to one of them.
> We need a fast way to failover in case a host port goes down and currently
> none of the solutions (like bond) cannot fulfill the requirements because
> the participating ports are actually the "master" devices and must have the
> same peerlink as their backup interface and at the same time all of them
> must participate in the bridge device. As Roopa noted it's normal practice
> in routing called fast re-route where a precalculated backup path is used
> when the main one is down.
> Another use case of this is with EVPN, having a single vxlan device which
> is backup of every port. Due to the nature of master devices it's not
> currently possible to use one device as a backup for many and still have
> all of them participate in the bridge (which is master itself).
> More detailed information about MLAG is available at the link below.
> https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Trying to understand this.

Is it the case that what you are trying to solve is the way MLAG
and bridging interact on the Linux side or more a limitation of how
switches operate?  Wouldn't this work?
                             
		br0 -- team0 -- eth1
                             +- eth2

The bridge would only have fdb entries for the team device.
Why do eth1 and eth2 have to be master devices?  Why would eth1
and eth2 need to be bridge ports.

This kind of thing in the bridge is most likely inevitable, and
I am guilty of introducing same logic into Hyper-V driver.
But still getting pushback that the multi device model is better.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 2/2] net: bridge: add support for backup port
@ 2018-07-20 16:02     ` Stephen Hemminger
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 16:02 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On Fri, 20 Jul 2018 17:48:26 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
> allows to set a backup port to be used for known unicast traffic if the
> port has gone carrier down. The backup pointer is rcu protected and set
> only under RTNL, a counter is maintained so when deleting a port we know
> how many other ports reference it as a backup and we remove it from all.
> Also the pointer is in the first cache line which is hot at the time of
> the check and thus in the common case we only add one more test.
> The backup port will be used only for the non-flooding case since
> it's a part of the bridge and the flooded packets will be forwarded to it
> anyway. To remove the forwarding just send a 0/non-existing backup port.
> This is used to avoid numerous scalability problems when using MLAG most
> notably if we have thousands of fdbs one would need to change all of them
> on port carrier going down which takes too long and causes a storm of fdb
> notifications (and again when the port comes back up). In a Multi-chassis
> Link Aggregation setup usually hosts are connected to two different
> switches which act as a single logical switch. Those switches usually have
> a control and backup link between them called peerlink which might be used
> for communication in case a host loses connectivity to one of them.
> We need a fast way to failover in case a host port goes down and currently
> none of the solutions (like bond) cannot fulfill the requirements because
> the participating ports are actually the "master" devices and must have the
> same peerlink as their backup interface and at the same time all of them
> must participate in the bridge device. As Roopa noted it's normal practice
> in routing called fast re-route where a precalculated backup path is used
> when the main one is down.
> Another use case of this is with EVPN, having a single vxlan device which
> is backup of every port. Due to the nature of master devices it's not
> currently possible to use one device as a backup for many and still have
> all of them participate in the bridge (which is master itself).
> More detailed information about MLAG is available at the link below.
> https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Trying to understand this.

Is it the case that what you are trying to solve is the way MLAG
and bridging interact on the Linux side or more a limitation of how
switches operate?  Wouldn't this work?
                             
		br0 -- team0 -- eth1
                             +- eth2

The bridge would only have fdb entries for the team device.
Why do eth1 and eth2 have to be master devices?  Why would eth1
and eth2 need to be bridge ports.

This kind of thing in the bridge is most likely inevitable, and
I am guilty of introducing same logic into Hyper-V driver.
But still getting pushback that the multi device model is better.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 2/2] net: bridge: add support for backup port
  2018-07-20 16:02     ` [Bridge] " Stephen Hemminger
@ 2018-07-20 16:41       ` Roopa Prabhu
  -1 siblings, 0 replies; 34+ messages in thread
From: Roopa Prabhu @ 2018-07-20 16:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Nikolay Aleksandrov, netdev, Anuradha Karuppiah, bridge,
	Wilson Kok, David Miller

On Fri, Jul 20, 2018 at 9:02 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri, 20 Jul 2018 17:48:26 +0300
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>
>> This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
>> allows to set a backup port to be used for known unicast traffic if the
>> port has gone carrier down. The backup pointer is rcu protected and set
>> only under RTNL, a counter is maintained so when deleting a port we know
>> how many other ports reference it as a backup and we remove it from all.
>> Also the pointer is in the first cache line which is hot at the time of
>> the check and thus in the common case we only add one more test.
>> The backup port will be used only for the non-flooding case since
>> it's a part of the bridge and the flooded packets will be forwarded to it
>> anyway. To remove the forwarding just send a 0/non-existing backup port.
>> This is used to avoid numerous scalability problems when using MLAG most
>> notably if we have thousands of fdbs one would need to change all of them
>> on port carrier going down which takes too long and causes a storm of fdb
>> notifications (and again when the port comes back up). In a Multi-chassis
>> Link Aggregation setup usually hosts are connected to two different
>> switches which act as a single logical switch. Those switches usually have
>> a control and backup link between them called peerlink which might be used
>> for communication in case a host loses connectivity to one of them.
>> We need a fast way to failover in case a host port goes down and currently
>> none of the solutions (like bond) cannot fulfill the requirements because
>> the participating ports are actually the "master" devices and must have the
>> same peerlink as their backup interface and at the same time all of them
>> must participate in the bridge device. As Roopa noted it's normal practice
>> in routing called fast re-route where a precalculated backup path is used
>> when the main one is down.
>> Another use case of this is with EVPN, having a single vxlan device which
>> is backup of every port. Due to the nature of master devices it's not
>> currently possible to use one device as a backup for many and still have
>> all of them participate in the bridge (which is master itself).
>> More detailed information about MLAG is available at the link below.
>> https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
> Trying to understand this.
>
> Is it the case that what you are trying to solve is the way MLAG
> and bridging interact on the Linux side or more a limitation of how
> switches operate?  Wouldn't this work?

not a limitation. Its the way MLAG works on the switch side

>
>                 br0 -- team0 -- eth1
>                              +- eth2
>
> The bridge would only have fdb entries for the team device.
> Why do eth1 and eth2 have to be master devices?  Why would eth1
> and eth2 need to be bridge ports.


Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.


br0 -- team0---eth0
      |
      -- switch-peerlink

switch-peerlink becomes the failover/backport port when say team0 to
the server goes down.
Today, when team0 goes down, control plane has to withdraw all the fdb
entries pointing to team0
and re-install the fdb entries pointing to switch-peerlink...and
restore the fdb entries when team0 comes back up again.
and  this is the problem we are trying to solve.

This also becomes necessary when multihoming is implemented by a
standard like E-VPN https://tools.ietf.org/html/rfc8365#section-8
where the 'switch-peerlink' is an overlay vxlan port (like nikolay
mentions in his patch commit). In these implementations, the fdb scale
can be much larger.

On why bond failover cannot be used here ?: the point that nikolay was
alluding to is, switch-peerlink in the above example is a bridge port
and is a failover/backport port for more than one or all ports in the
bridge br0. And you cannot enslave switch-peerlink into a second level
team
with other bridge ports. Hence a multi layered team device is not an
option (FWIW, switch-peerlink is also a teamed interface to the peer
switch).

We have also discussed trying to achieve this by creating a fdb dst
failover group at the fdb layer instead of the port layer...for faster
fwding failover.
But nikolay soon recognized that this gets more complicated with
learning etc. Hence this patch keeps it simple and adds it at the port
layer.


>
> This kind of thing in the bridge is most likely inevitable, and
> I am guilty of introducing same logic into Hyper-V driver.
> But still getting pushback that the multi device model is better.
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 2/2] net: bridge: add support for backup port
@ 2018-07-20 16:41       ` Roopa Prabhu
  0 siblings, 0 replies; 34+ messages in thread
From: Roopa Prabhu @ 2018-07-20 16:41 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Nikolay Aleksandrov, netdev, bridge, Wilson Kok,
	Anuradha Karuppiah, David Miller

On Fri, Jul 20, 2018 at 9:02 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri, 20 Jul 2018 17:48:26 +0300
> Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>
>> This patch adds a new port attribute - IFLA_BRPORT_BACKUP_PORT, which
>> allows to set a backup port to be used for known unicast traffic if the
>> port has gone carrier down. The backup pointer is rcu protected and set
>> only under RTNL, a counter is maintained so when deleting a port we know
>> how many other ports reference it as a backup and we remove it from all.
>> Also the pointer is in the first cache line which is hot at the time of
>> the check and thus in the common case we only add one more test.
>> The backup port will be used only for the non-flooding case since
>> it's a part of the bridge and the flooded packets will be forwarded to it
>> anyway. To remove the forwarding just send a 0/non-existing backup port.
>> This is used to avoid numerous scalability problems when using MLAG most
>> notably if we have thousands of fdbs one would need to change all of them
>> on port carrier going down which takes too long and causes a storm of fdb
>> notifications (and again when the port comes back up). In a Multi-chassis
>> Link Aggregation setup usually hosts are connected to two different
>> switches which act as a single logical switch. Those switches usually have
>> a control and backup link between them called peerlink which might be used
>> for communication in case a host loses connectivity to one of them.
>> We need a fast way to failover in case a host port goes down and currently
>> none of the solutions (like bond) cannot fulfill the requirements because
>> the participating ports are actually the "master" devices and must have the
>> same peerlink as their backup interface and at the same time all of them
>> must participate in the bridge device. As Roopa noted it's normal practice
>> in routing called fast re-route where a precalculated backup path is used
>> when the main one is down.
>> Another use case of this is with EVPN, having a single vxlan device which
>> is backup of every port. Due to the nature of master devices it's not
>> currently possible to use one device as a backup for many and still have
>> all of them participate in the bridge (which is master itself).
>> More detailed information about MLAG is available at the link below.
>> https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
> Trying to understand this.
>
> Is it the case that what you are trying to solve is the way MLAG
> and bridging interact on the Linux side or more a limitation of how
> switches operate?  Wouldn't this work?

not a limitation. Its the way MLAG works on the switch side

>
>                 br0 -- team0 -- eth1
>                              +- eth2
>
> The bridge would only have fdb entries for the team device.
> Why do eth1 and eth2 have to be master devices?  Why would eth1
> and eth2 need to be bridge ports.


Two switches acting in a MLAG pair are connected by the peerlink
interface which is a bridge port.

the config on one of the switches looks like the below. The other
switch also has a similar config.
eth0 is connected to one port on the server. And the server is
connected to both switches.


br0 -- team0---eth0
      |
      -- switch-peerlink

switch-peerlink becomes the failover/backport port when say team0 to
the server goes down.
Today, when team0 goes down, control plane has to withdraw all the fdb
entries pointing to team0
and re-install the fdb entries pointing to switch-peerlink...and
restore the fdb entries when team0 comes back up again.
and  this is the problem we are trying to solve.

This also becomes necessary when multihoming is implemented by a
standard like E-VPN https://tools.ietf.org/html/rfc8365#section-8
where the 'switch-peerlink' is an overlay vxlan port (like nikolay
mentions in his patch commit). In these implementations, the fdb scale
can be much larger.

On why bond failover cannot be used here ?: the point that nikolay was
alluding to is, switch-peerlink in the above example is a bridge port
and is a failover/backport port for more than one or all ports in the
bridge br0. And you cannot enslave switch-peerlink into a second level
team
with other bridge ports. Hence a multi layered team device is not an
option (FWIW, switch-peerlink is also a teamed interface to the peer
switch).

We have also discussed trying to achieve this by creating a fdb dst
failover group at the fdb layer instead of the port layer...for faster
fwding failover.
But nikolay soon recognized that this gets more complicated with
learning etc. Hence this patch keeps it simple and adds it at the port
layer.


>
> This kind of thing in the bridge is most likely inevitable, and
> I am guilty of introducing same logic into Hyper-V driver.
> But still getting pushback that the multi device model is better.
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 15:57     ` [Bridge] " Stephen Hemminger
@ 2018-07-20 17:14       ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 17:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, anuradhak, bridge, wkok, davem

On July 20, 2018 6:57:25 PM GMT+03:00, Stephen Hemminger <stephen@networkplumber.org> wrote:
>On Fri, 20 Jul 2018 17:48:25 +0300
>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>
>> This patch adds a new alternative store callback for port sysfs
>options
>> which takes a raw value (buf) and can use it directly. It is needed
>for the
>> backup port sysfs support since we have to pass the device by its
>name.
>> 
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>> There are a few checkpatch warnings here because of the old code,
>I've
>> noted them in my todo and will send separate patch for simple_strtoul
>> and the function prototypes.
>> 
>>  net/bridge/br_sysfs_if.c | 49
>+++++++++++++++++++++++++++++++++---------------
>>  1 file changed, 34 insertions(+), 15 deletions(-)
>> 
>> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
>> index f99c5bf5c906..38c58879423d 100644
>> --- a/net/bridge/br_sysfs_if.c
>> +++ b/net/bridge/br_sysfs_if.c
>> @@ -25,6 +25,15 @@ struct brport_attribute {
>>  	struct attribute	attr;
>>  	ssize_t (*show)(struct net_bridge_port *, char *);
>>  	int (*store)(struct net_bridge_port *, unsigned long);
>> +	int (*store_raw)(struct net_bridge_port *, char *);
>> +};
>> +
>> +#define BRPORT_ATTR_RAW(_name, _mode, _show, _store)			\
>> +const struct brport_attribute brport_attr_##_name = {			\
>> +	.attr		= {.name = __stringify(_name),			\
>> +			   .mode = _mode },				\
>> +	.show		= _show,					\
>> +	.store_raw	= _store,					\
>>  };
>>  
>>  #define BRPORT_ATTR(_name, _mode, _show, _store)		\
>> @@ -270,27 +279,37 @@ static ssize_t brport_store(struct kobject
>*kobj,
>>  	struct brport_attribute *brport_attr = to_brport_attr(attr);
>>  	struct net_bridge_port *p = to_brport(kobj);
>>  	ssize_t ret = -EINVAL;
>> -	char *endp;
>>  	unsigned long val;
>> +	char *endp;
>>  
>>  	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
>>  		return -EPERM;
>>  
>> -	val = simple_strtoul(buf, &endp, 0);
>> -	if (endp != buf) {
>> -		if (!rtnl_trylock())
>> -			return restart_syscall();
>> -		if (p->dev && p->br && brport_attr->store) {
>> -			spin_lock_bh(&p->br->lock);
>> -			ret = brport_attr->store(p, val);
>> -			spin_unlock_bh(&p->br->lock);
>> -			if (!ret) {
>> -				br_ifinfo_notify(RTM_NEWLINK, NULL, p);
>> -				ret = count;
>> -			}
>> -		}
>> -		rtnl_unlock();
>> +	if (!rtnl_trylock())
>> +		return restart_syscall();
>> +
>> +	if (!p->dev || !p->br)
>> +		goto out_unlock;
>> +
>> +	if (brport_attr->store_raw) {
>> +		spin_lock_bh(&p->br->lock);
>> +		ret = brport_attr->store_raw(p, (char *)buf);
>> +		spin_unlock_bh(&p->br->lock);
>> +	} else if (brport_attr->store) {
>> +		val = simple_strtoul(buf, &endp, 0);
>> +		if (endp == buf)
>> +			goto out_unlock;
>> +		spin_lock_bh(&p->br->lock);
>> +		ret = brport_attr->store(p, val);
>> +		spin_unlock_bh(&p->br->lock);
>> +	}
>> +	if (!ret) {
>> +		br_ifinfo_notify(RTM_NEWLINK, NULL, p);
>> +		ret = count;
>>  	}
>> +out_unlock:
>> +	rtnl_unlock();
>> +
>>  	return ret;
>>  }
>>  
>
>I see where you are going with this. You want a sysfs attribute where
>is a string
>not a number. This makes sense.
>
>The code might be simpler if you always acquired  the lock. Or maybe
>move the locking into the store functions.
>

I considered that but wanted to limit the lock because it blocks learning. 
Since the operations are relatively simple I'll move it earlier.


>Casting away the const on the buf variable is going to cause warnings
>and should not be necessary.
>

It doesn't when it's casted like that, the new line is changed to null byte so we need to drop 
the const.

>You might also want an error case if neither store or store_raw are
>defined.

It is already there, just try writing to any other attribute that doesn't have store.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-20 17:14       ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 17:14 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On July 20, 2018 6:57:25 PM GMT+03:00, Stephen Hemminger <stephen@networkplumber.org> wrote:
>On Fri, 20 Jul 2018 17:48:25 +0300
>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>
>> This patch adds a new alternative store callback for port sysfs
>options
>> which takes a raw value (buf) and can use it directly. It is needed
>for the
>> backup port sysfs support since we have to pass the device by its
>name.
>> 
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>> There are a few checkpatch warnings here because of the old code,
>I've
>> noted them in my todo and will send separate patch for simple_strtoul
>> and the function prototypes.
>> 
>>  net/bridge/br_sysfs_if.c | 49
>+++++++++++++++++++++++++++++++++---------------
>>  1 file changed, 34 insertions(+), 15 deletions(-)
>> 
>> diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
>> index f99c5bf5c906..38c58879423d 100644
>> --- a/net/bridge/br_sysfs_if.c
>> +++ b/net/bridge/br_sysfs_if.c
>> @@ -25,6 +25,15 @@ struct brport_attribute {
>>  	struct attribute	attr;
>>  	ssize_t (*show)(struct net_bridge_port *, char *);
>>  	int (*store)(struct net_bridge_port *, unsigned long);
>> +	int (*store_raw)(struct net_bridge_port *, char *);
>> +};
>> +
>> +#define BRPORT_ATTR_RAW(_name, _mode, _show, _store)			\
>> +const struct brport_attribute brport_attr_##_name = {			\
>> +	.attr		= {.name = __stringify(_name),			\
>> +			   .mode = _mode },				\
>> +	.show		= _show,					\
>> +	.store_raw	= _store,					\
>>  };
>>  
>>  #define BRPORT_ATTR(_name, _mode, _show, _store)		\
>> @@ -270,27 +279,37 @@ static ssize_t brport_store(struct kobject
>*kobj,
>>  	struct brport_attribute *brport_attr = to_brport_attr(attr);
>>  	struct net_bridge_port *p = to_brport(kobj);
>>  	ssize_t ret = -EINVAL;
>> -	char *endp;
>>  	unsigned long val;
>> +	char *endp;
>>  
>>  	if (!ns_capable(dev_net(p->dev)->user_ns, CAP_NET_ADMIN))
>>  		return -EPERM;
>>  
>> -	val = simple_strtoul(buf, &endp, 0);
>> -	if (endp != buf) {
>> -		if (!rtnl_trylock())
>> -			return restart_syscall();
>> -		if (p->dev && p->br && brport_attr->store) {
>> -			spin_lock_bh(&p->br->lock);
>> -			ret = brport_attr->store(p, val);
>> -			spin_unlock_bh(&p->br->lock);
>> -			if (!ret) {
>> -				br_ifinfo_notify(RTM_NEWLINK, NULL, p);
>> -				ret = count;
>> -			}
>> -		}
>> -		rtnl_unlock();
>> +	if (!rtnl_trylock())
>> +		return restart_syscall();
>> +
>> +	if (!p->dev || !p->br)
>> +		goto out_unlock;
>> +
>> +	if (brport_attr->store_raw) {
>> +		spin_lock_bh(&p->br->lock);
>> +		ret = brport_attr->store_raw(p, (char *)buf);
>> +		spin_unlock_bh(&p->br->lock);
>> +	} else if (brport_attr->store) {
>> +		val = simple_strtoul(buf, &endp, 0);
>> +		if (endp == buf)
>> +			goto out_unlock;
>> +		spin_lock_bh(&p->br->lock);
>> +		ret = brport_attr->store(p, val);
>> +		spin_unlock_bh(&p->br->lock);
>> +	}
>> +	if (!ret) {
>> +		br_ifinfo_notify(RTM_NEWLINK, NULL, p);
>> +		ret = count;
>>  	}
>> +out_unlock:
>> +	rtnl_unlock();
>> +
>>  	return ret;
>>  }
>>  
>
>I see where you are going with this. You want a sysfs attribute where
>is a string
>not a number. This makes sense.
>
>The code might be simpler if you always acquired  the lock. Or maybe
>move the locking into the store functions.
>

I considered that but wanted to limit the lock because it blocks learning. 
Since the operations are relatively simple I'll move it earlier.


>Casting away the const on the buf variable is going to cause warnings
>and should not be necessary.
>

It doesn't when it's casted like that, the new line is changed to null byte so we need to drop 
the const.

>You might also want an error case if neither store or store_raw are
>defined.

It is already there, just try writing to any other attribute that doesn't have store.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 17:14       ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 17:20         ` Stephen Hemminger
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 17:20 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On Fri, 20 Jul 2018 20:14:43 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> >Casting away the const on the buf variable is going to cause warnings
> >and should not be necessary.
> >  
> 
> It doesn't when it's casted like that, the new line is changed to null byte so we need to drop 
> the const.

Then change store function to take a char *?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-20 17:20         ` Stephen Hemminger
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 17:20 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On Fri, 20 Jul 2018 20:14:43 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> >Casting away the const on the buf variable is going to cause warnings
> >and should not be necessary.
> >  
> 
> It doesn't when it's casted like that, the new line is changed to null byte so we need to drop 
> the const.

Then change store function to take a char *?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 17:20         ` [Bridge] " Stephen Hemminger
@ 2018-07-20 17:26           ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 17:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, anuradhak, bridge, wkok, davem

On July 20, 2018 8:20:44 PM GMT+03:00, Stephen Hemminger <stephen@networkplumber.org> wrote:
>On Fri, 20 Jul 2018 20:14:43 +0300
>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>
>> >Casting away the const on the buf variable is going to cause
>warnings
>> >and should not be necessary.
>> >  
>> 
>> It doesn't when it's casted like that, the new line is changed to
>null byte so we need to drop 
>> the const.
>
>Then change store function to take a char *?

Sure, will do.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-20 17:26           ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 17:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On July 20, 2018 8:20:44 PM GMT+03:00, Stephen Hemminger <stephen@networkplumber.org> wrote:
>On Fri, 20 Jul 2018 20:14:43 +0300
>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>
>> >Casting away the const on the buf variable is going to cause
>warnings
>> >and should not be necessary.
>> >  
>> 
>> It doesn't when it's casted like that, the new line is changed to
>null byte so we need to drop 
>> the const.
>
>Then change store function to take a char *?

Sure, will do.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 17:26           ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 17:47             ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 17:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On July 20, 2018 8:26:36 PM GMT+03:00, Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>On July 20, 2018 8:20:44 PM GMT+03:00, Stephen Hemminger
><stephen@networkplumber.org> wrote:
>>On Fri, 20 Jul 2018 20:14:43 +0300
>>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>
>>> >Casting away the const on the buf variable is going to cause
>>warnings
>>> >and should not be necessary.
>>> >  
>>> 
>>> It doesn't when it's casted like that, the new line is changed to
>>null byte so we need to drop 
>>> the const.
>>
>>Then change store function to take a char *?
>
>Sure, will do.

Acrually I can't change sysfs_ops store prototype. 
That is the reason the bond also casts the ptr.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-20 17:47             ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-20 17:47 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On July 20, 2018 8:26:36 PM GMT+03:00, Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>On July 20, 2018 8:20:44 PM GMT+03:00, Stephen Hemminger
><stephen@networkplumber.org> wrote:
>>On Fri, 20 Jul 2018 20:14:43 +0300
>>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
>>
>>> >Casting away the const on the buf variable is going to cause
>>warnings
>>> >and should not be necessary.
>>> >  
>>> 
>>> It doesn't when it's casted like that, the new line is changed to
>>null byte so we need to drop 
>>> the const.
>>
>>Then change store function to take a char *?
>
>Sure, will do.

Acrually I can't change sysfs_ops store prototype. 
That is the reason the bond also casts the ptr.




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 17:47             ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-20 21:14               ` Stephen Hemminger
  -1 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 21:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, anuradhak, bridge, wkok, davem

On Fri, 20 Jul 2018 20:47:26 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> On July 20, 2018 8:26:36 PM GMT+03:00, Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> >On July 20, 2018 8:20:44 PM GMT+03:00, Stephen Hemminger
> ><stephen@networkplumber.org> wrote:  
> >>On Fri, 20 Jul 2018 20:14:43 +0300
> >>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> >>  
> >>> >Casting away the const on the buf variable is going to cause  
> >>warnings  
> >>> >and should not be necessary.
> >>> >    
> >>> 
> >>> It doesn't when it's casted like that, the new line is changed to  
> >>null byte so we need to drop   
> >>> the const.  
> >>
> >>Then change store function to take a char *?  
> >
> >Sure, will do.  
> 
> Acrually I can't change sysfs_ops store prototype. 
> That is the reason the bond also casts the ptr.
> 
> 
> 

Ok.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-20 21:14               ` Stephen Hemminger
  0 siblings, 0 replies; 34+ messages in thread
From: Stephen Hemminger @ 2018-07-20 21:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, roopa, bridge, wkok, anuradhak, davem

On Fri, 20 Jul 2018 20:47:26 +0300
Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:

> On July 20, 2018 8:26:36 PM GMT+03:00, Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> >On July 20, 2018 8:20:44 PM GMT+03:00, Stephen Hemminger
> ><stephen@networkplumber.org> wrote:  
> >>On Fri, 20 Jul 2018 20:14:43 +0300
> >>Nikolay Aleksandrov <nikolay@cumulusnetworks.com> wrote:
> >>  
> >>> >Casting away the const on the buf variable is going to cause  
> >>warnings  
> >>> >and should not be necessary.
> >>> >    
> >>> 
> >>> It doesn't when it's casted like that, the new line is changed to  
> >>null byte so we need to drop   
> >>> the const.  
> >>
> >>Then change store function to take a char *?  
> >
> >Sure, will do.  
> 
> Acrually I can't change sysfs_ops store prototype. 
> That is the reason the bond also casts the ptr.
> 
> 
> 

Ok.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-20 14:48   ` [Bridge] " Nikolay Aleksandrov
@ 2018-07-22  6:27     ` David Miller
  -1 siblings, 0 replies; 34+ messages in thread
From: David Miller @ 2018-07-22  6:27 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, roopa, bridge, wkok, anuradhak

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Fri, 20 Jul 2018 17:48:25 +0300

> +		spin_lock_bh(&p->br->lock);
> +		ret = brport_attr->store_raw(p, (char *)buf);
> +		spin_unlock_bh(&p->br->lock);

Please respect the const here.

Have the methods do a kstrncup(); ... kfree(); sequence if they have
to mangle the contents when there is a newline inside.

I know the caller is passing in what was a non-const char pointer,
I've looked at the implementation, but it might be that way forever.

I looked at all other sysfs writes that need to do this \n --> \0
mangling and they either copy the string first into a static buffer
or they do the kstrncup() thing.

Thank you.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-22  6:27     ` David Miller
  0 siblings, 0 replies; 34+ messages in thread
From: David Miller @ 2018-07-22  6:27 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, roopa, bridge, wkok, anuradhak

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Fri, 20 Jul 2018 17:48:25 +0300

> +		spin_lock_bh(&p->br->lock);
> +		ret = brport_attr->store_raw(p, (char *)buf);
> +		spin_unlock_bh(&p->br->lock);

Please respect the const here.

Have the methods do a kstrncup(); ... kfree(); sequence if they have
to mangle the contents when there is a newline inside.

I know the caller is passing in what was a non-const char pointer,
I've looked at the implementation, but it might be that way forever.

I looked at all other sysfs writes that need to do this \n --> \0
mangling and they either copy the string first into a static buffer
or they do the kstrncup() thing.

Thank you.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
  2018-07-22  6:27     ` [Bridge] " David Miller
@ 2018-07-22  7:41       ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-22  7:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, roopa, bridge, wkok, anuradhak

On 22/07/18 09:27, David Miller wrote:
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Date: Fri, 20 Jul 2018 17:48:25 +0300
> 
>> +		spin_lock_bh(&p->br->lock);
>> +		ret = brport_attr->store_raw(p, (char *)buf);
>> +		spin_unlock_bh(&p->br->lock);
> 
> Please respect the const here.
> 
> Have the methods do a kstrncup(); ... kfree(); sequence if they have
> to mangle the contents when there is a newline inside.
> 
> I know the caller is passing in what was a non-const char pointer,
> I've looked at the implementation, but it might be that way forever.
> 
> I looked at all other sysfs writes that need to do this \n --> \0
> mangling and they either copy the string first into a static buffer
> or they do the kstrncup() thing.
> 
> Thank you.
> 

Okay, will do. I'll also prepare a separate patch for the bonding, 
it does a cast like this in bonding_sysfs_store_option().

Thanks

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options
@ 2018-07-22  7:41       ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-22  7:41 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, roopa, bridge, wkok, anuradhak

On 22/07/18 09:27, David Miller wrote:
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> Date: Fri, 20 Jul 2018 17:48:25 +0300
> 
>> +		spin_lock_bh(&p->br->lock);
>> +		ret = brport_attr->store_raw(p, (char *)buf);
>> +		spin_unlock_bh(&p->br->lock);
> 
> Please respect the const here.
> 
> Have the methods do a kstrncup(); ... kfree(); sequence if they have
> to mangle the contents when there is a newline inside.
> 
> I know the caller is passing in what was a non-const char pointer,
> I've looked at the implementation, but it might be that way forever.
> 
> I looked at all other sysfs writes that need to do this \n --> \0
> mangling and they either copy the string first into a static buffer
> or they do the kstrncup() thing.
> 
> Thank you.
> 

Okay, will do. I'll also prepare a separate patch for the bonding, 
it does a cast like this in bonding_sysfs_store_option().

Thanks

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 2/2] net: bridge: add support for backup port
  2018-07-20 16:41       ` [Bridge] " Roopa Prabhu
@ 2018-07-23  6:15         ` Toshiaki Makita
  -1 siblings, 0 replies; 34+ messages in thread
From: Toshiaki Makita @ 2018-07-23  6:15 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov
  Cc: Stephen Hemminger, netdev, Anuradha Karuppiah, bridge,
	Wilson Kok, David Miller

On 2018/07/21 1:41, Roopa Prabhu wrote:
> On Fri, Jul 20, 2018 at 9:02 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> Trying to understand this.
>>
>> Is it the case that what you are trying to solve is the way MLAG
>> and bridging interact on the Linux side or more a limitation of how
>> switches operate?  Wouldn't this work?
> 
> not a limitation. Its the way MLAG works on the switch side
> 
>>
>>                 br0 -- team0 -- eth1
>>                              +- eth2
>>
>> The bridge would only have fdb entries for the team device.
>> Why do eth1 and eth2 have to be master devices?  Why would eth1
>> and eth2 need to be bridge ports.
> 
> 
> Two switches acting in a MLAG pair are connected by the peerlink
> interface which is a bridge port.
> 
> the config on one of the switches looks like the below. The other
> switch also has a similar config.
> eth0 is connected to one port on the server. And the server is
> connected to both switches.
> 
> 
> br0 -- team0---eth0
>       |
>       -- switch-peerlink
> 
> switch-peerlink becomes the failover/backport port when say team0 to
> the server goes down.

I feel like this kind of diagram in commitlog would help us understand
what you/Nikolay want to do. I was also not able to get why team/bonding
is not an option reading commitlog. (Now I think I understand it thanks
to Roopa's explanation.)

-- 
Toshiaki Makita

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 2/2] net: bridge: add support for backup port
@ 2018-07-23  6:15         ` Toshiaki Makita
  0 siblings, 0 replies; 34+ messages in thread
From: Toshiaki Makita @ 2018-07-23  6:15 UTC (permalink / raw)
  To: Roopa Prabhu, Nikolay Aleksandrov
  Cc: netdev, bridge, Wilson Kok, Anuradha Karuppiah, David Miller

On 2018/07/21 1:41, Roopa Prabhu wrote:
> On Fri, Jul 20, 2018 at 9:02 AM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> Trying to understand this.
>>
>> Is it the case that what you are trying to solve is the way MLAG
>> and bridging interact on the Linux side or more a limitation of how
>> switches operate?  Wouldn't this work?
> 
> not a limitation. Its the way MLAG works on the switch side
> 
>>
>>                 br0 -- team0 -- eth1
>>                              +- eth2
>>
>> The bridge would only have fdb entries for the team device.
>> Why do eth1 and eth2 have to be master devices?  Why would eth1
>> and eth2 need to be bridge ports.
> 
> 
> Two switches acting in a MLAG pair are connected by the peerlink
> interface which is a bridge port.
> 
> the config on one of the switches looks like the below. The other
> switch also has a similar config.
> eth0 is connected to one port on the server. And the server is
> connected to both switches.
> 
> 
> br0 -- team0---eth0
>       |
>       -- switch-peerlink
> 
> switch-peerlink becomes the failover/backport port when say team0 to
> the server goes down.

I feel like this kind of diagram in commitlog would help us understand
what you/Nikolay want to do. I was also not able to get why team/bonding
is not an option reading commitlog. (Now I think I understand it thanks
to Roopa's explanation.)

-- 
Toshiaki Makita


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next 2/2] net: bridge: add support for backup port
  2018-07-23  6:15         ` [Bridge] " Toshiaki Makita
@ 2018-07-23  8:03           ` Nikolay Aleksandrov
  -1 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-23  8:03 UTC (permalink / raw)
  To: Toshiaki Makita, Roopa Prabhu
  Cc: netdev, bridge, Wilson Kok, Anuradha Karuppiah, David Miller

On 23/07/18 09:15, Toshiaki Makita wrote:
> On 2018/07/21 1:41, Roopa Prabhu wrote:
>> On Fri, Jul 20, 2018 at 9:02 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> Trying to understand this.
>>>
>>> Is it the case that what you are trying to solve is the way MLAG
>>> and bridging interact on the Linux side or more a limitation of how
>>> switches operate?  Wouldn't this work?
>>
>> not a limitation. Its the way MLAG works on the switch side
>>
>>>
>>>                 br0 -- team0 -- eth1
>>>                              +- eth2
>>>
>>> The bridge would only have fdb entries for the team device.
>>> Why do eth1 and eth2 have to be master devices?  Why would eth1
>>> and eth2 need to be bridge ports.
>>
>>
>> Two switches acting in a MLAG pair are connected by the peerlink
>> interface which is a bridge port.
>>
>> the config on one of the switches looks like the below. The other
>> switch also has a similar config.
>> eth0 is connected to one port on the server. And the server is
>> connected to both switches.
>>
>>
>> br0 -- team0---eth0
>>       |
>>       -- switch-peerlink
>>
>> switch-peerlink becomes the failover/backport port when say team0 to
>> the server goes down.
> 
> I feel like this kind of diagram in commitlog would help us understand
> what you/Nikolay want to do. I was also not able to get why team/bonding
> is not an option reading commitlog. (Now I think I understand it thanks
> to Roopa's explanation.)
> 

Fair enough, I'll send v3 with Roopa's explanation and diagram added.

Thanks,
 Nik

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next 2/2] net: bridge: add support for backup port
@ 2018-07-23  8:03           ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2018-07-23  8:03 UTC (permalink / raw)
  To: Toshiaki Makita, Roopa Prabhu
  Cc: netdev, bridge, Wilson Kok, Anuradha Karuppiah, David Miller

On 23/07/18 09:15, Toshiaki Makita wrote:
> On 2018/07/21 1:41, Roopa Prabhu wrote:
>> On Fri, Jul 20, 2018 at 9:02 AM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> Trying to understand this.
>>>
>>> Is it the case that what you are trying to solve is the way MLAG
>>> and bridging interact on the Linux side or more a limitation of how
>>> switches operate?  Wouldn't this work?
>>
>> not a limitation. Its the way MLAG works on the switch side
>>
>>>
>>>                 br0 -- team0 -- eth1
>>>                              +- eth2
>>>
>>> The bridge would only have fdb entries for the team device.
>>> Why do eth1 and eth2 have to be master devices?  Why would eth1
>>> and eth2 need to be bridge ports.
>>
>>
>> Two switches acting in a MLAG pair are connected by the peerlink
>> interface which is a bridge port.
>>
>> the config on one of the switches looks like the below. The other
>> switch also has a similar config.
>> eth0 is connected to one port on the server. And the server is
>> connected to both switches.
>>
>>
>> br0 -- team0---eth0
>>       |
>>       -- switch-peerlink
>>
>> switch-peerlink becomes the failover/backport port when say team0 to
>> the server goes down.
> 
> I feel like this kind of diagram in commitlog would help us understand
> what you/Nikolay want to do. I was also not able to get why team/bonding
> is not an option reading commitlog. (Now I think I understand it thanks
> to Roopa's explanation.)
> 

Fair enough, I'll send v3 with Roopa's explanation and diagram added.

Thanks,
 Nik


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-07-23  8:03 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-20 14:48 [PATCH net-next 0/2] net: bridge: add support for backup port Nikolay Aleksandrov
2018-07-20 14:48 ` [Bridge] " Nikolay Aleksandrov
2018-07-20 14:48 ` [PATCH net-next 1/2] net: bridge: add support for raw sysfs port options Nikolay Aleksandrov
2018-07-20 14:48   ` [Bridge] " Nikolay Aleksandrov
2018-07-20 15:57   ` Stephen Hemminger
2018-07-20 15:57     ` [Bridge] " Stephen Hemminger
2018-07-20 17:14     ` Nikolay Aleksandrov
2018-07-20 17:14       ` [Bridge] " Nikolay Aleksandrov
2018-07-20 17:20       ` Stephen Hemminger
2018-07-20 17:20         ` [Bridge] " Stephen Hemminger
2018-07-20 17:26         ` Nikolay Aleksandrov
2018-07-20 17:26           ` [Bridge] " Nikolay Aleksandrov
2018-07-20 17:47           ` Nikolay Aleksandrov
2018-07-20 17:47             ` [Bridge] " Nikolay Aleksandrov
2018-07-20 21:14             ` Stephen Hemminger
2018-07-20 21:14               ` [Bridge] " Stephen Hemminger
2018-07-22  6:27   ` David Miller
2018-07-22  6:27     ` [Bridge] " David Miller
2018-07-22  7:41     ` Nikolay Aleksandrov
2018-07-22  7:41       ` [Bridge] " Nikolay Aleksandrov
2018-07-20 14:48 ` [PATCH net-next 2/2] net: bridge: add support for backup port Nikolay Aleksandrov
2018-07-20 14:48   ` [Bridge] " Nikolay Aleksandrov
2018-07-20 16:02   ` Stephen Hemminger
2018-07-20 16:02     ` [Bridge] " Stephen Hemminger
2018-07-20 16:41     ` Roopa Prabhu
2018-07-20 16:41       ` [Bridge] " Roopa Prabhu
2018-07-23  6:15       ` Toshiaki Makita
2018-07-23  6:15         ` [Bridge] " Toshiaki Makita
2018-07-23  8:03         ` Nikolay Aleksandrov
2018-07-23  8:03           ` [Bridge] " Nikolay Aleksandrov
2018-07-20 15:22 ` [PATCH net-next 0/2] " Stephen Hemminger
2018-07-20 15:22   ` [Bridge] " Stephen Hemminger
2018-07-20 15:26   ` Nikolay Aleksandrov
2018-07-20 15:26     ` [Bridge] " Nikolay Aleksandrov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.