netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] net: dsa: link aggregation support
@ 2020-10-27 10:51 Tobias Waldekranz
  2020-10-27 10:51 ` [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X Tobias Waldekranz
                   ` (9 more replies)
  0 siblings, 10 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 10:51 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv; +Cc: netdev

This series starts by adding the generic support required to offload
link aggregates to drivers built on top of the DSA subsystem. It then
implements offloading for the mv88e6xxx driver, i.e. Marvell's
LinkStreet family.

Posting this as an RFC as there are some topics that I would like
feedback on before going further with testing. Thus far I've done some
basic tests to verify that:

- A LAG can be used as a regular interface.
- Bridging a LAG with other DSA ports works as expected.
- Load balancing is done by the hardware, both in single- and
  multi-chip systems.
- Load balancing is dynamically reconfigured when the state of
  individual links change.

Testing as been done on two systems:

1. Single-chip system with one Peridot (6390X).
2. Multi-chip system with one Agate (6352) daisy-chained with an Opal
   (6097F).

I would really appreciate feedback on the following:

All LAG configuration is cached in `struct dsa_lag`s. I realize that
the standard M.O. of DSA is to read back information from hardware
when required. With LAGs this becomes very tricky though. For example,
the change of a link state on one switch will require re-balancing of
LAG hash buckets on another one, which in turn depends on the total
number of active links in the LAG. Do you agree that this is
motivated?

The LAG driver ops all receive the LAG netdev as an argument when this
information is already available through the port's lag pointer. This
was done to match the way that the bridge netdev is passed to all VLAN
ops even though it is in the port's bridge_dev. Is there a reason for
this or should I just remove it from the LAG ops?

At least on mv88e6xxx, the exact source port is not available when
packets are received on the CPU. The way I see it, there are two ways
around that problem:

- Inject the packet directly on the LAG device (what this series
  does). Feels right because it matches all that we actually know; the
  packet came in on the LAG. It does complicate dsa_switch_rcv
  somewhat as we can no longer assume that skb->dev is a DSA port.

- Inject the packet on "the designated port", i.e. some port in the
  LAG. This lets us keep the current Rx path untouched. The problem is
  that (a) the port would have to be dynamically updated to match the
  expectations of the LAG driver (team/bond) as links are
  enabled/disabled and (b) we would be presenting a lie because
  packets would appear to ingress on netdevs that they might not in
  fact have been physically received on.

(mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
seems like all chips capable of doing EDSA are using that, except for
the Peridot.

(mv88e6xxx) The cross-chip PVT changes required to allow a LAG to
communicate with the other ports do not feel quite right, but I'm
unsure about what the proper way of doing it would be. Any ideas?

(mv88e6xxx) Marvell has historically used the idiosyncratic term
"trunk" to refer to link aggregates. Somewhere around the Peridot they
have switched and are now referring to the same registers/tables using
the term "LAG". In this series I've stuck to using LAG for all generic
stuff, and only used trunk for driver-internal functions. Do we want
to rename everything to use the LAG nomenclature?

Thanks,
Tobias

Tobias Waldekranz (4):
  net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X
  net: dsa: link aggregation support
  net: dsa: mv88e6xxx: link aggregation support
  net: dsa: tag_edsa: support reception of packets from lag devices

 drivers/net/dsa/mv88e6xxx/chip.c    | 232 +++++++++++++++++++++++++++-
 drivers/net/dsa/mv88e6xxx/chip.h    |   4 +
 drivers/net/dsa/mv88e6xxx/global2.c |   8 +-
 drivers/net/dsa/mv88e6xxx/global2.h |   5 +
 drivers/net/dsa/mv88e6xxx/port.c    |  21 +++
 drivers/net/dsa/mv88e6xxx/port.h    |   5 +
 include/net/dsa.h                   |  68 ++++++++
 net/dsa/dsa.c                       |  23 +--
 net/dsa/dsa2.c                      |   3 +
 net/dsa/dsa_priv.h                  |  16 ++
 net/dsa/port.c                      | 146 +++++++++++++++++
 net/dsa/slave.c                     |  53 ++++++-
 net/dsa/switch.c                    |  64 ++++++++
 net/dsa/tag_edsa.c                  |  12 +-
 14 files changed, 635 insertions(+), 25 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 43+ messages in thread

* [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
@ 2020-10-27 10:51 ` Tobias Waldekranz
  2020-10-27 14:52   ` Marek Behun
  2020-10-27 10:51 ` [RFC PATCH 2/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 10:51 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv; +Cc: netdev

The policy is to use ethertyped DSA for all devices that are capable
of doing so, which the Peridot is.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index bd297ae7cf9e..536ee6cff779 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -5100,7 +5100,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.atu_move_port_mask = 0x1f,
 		.pvt = true,
 		.multi_chip = true,
-		.tag_protocol = DSA_TAG_PROTO_DSA,
+		.tag_protocol = DSA_TAG_PROTO_EDSA,
 		.ptp_support = true,
 		.ops = &mv88e6390_ops,
 	},
@@ -5124,7 +5124,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.atu_move_port_mask = 0x1f,
 		.pvt = true,
 		.multi_chip = true,
-		.tag_protocol = DSA_TAG_PROTO_DSA,
+		.tag_protocol = DSA_TAG_PROTO_EDSA,
 		.ptp_support = true,
 		.ops = &mv88e6390x_ops,
 	},
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 2/4] net: dsa: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
  2020-10-27 10:51 ` [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X Tobias Waldekranz
@ 2020-10-27 10:51 ` Tobias Waldekranz
  2020-10-28  0:58   ` Vladimir Oltean
  2020-10-27 10:51 ` [RFC PATCH 3/4] net: dsa: mv88e6xxx: " Tobias Waldekranz
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 10:51 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv; +Cc: netdev

Monitor the following events and notify the driver when:

- A DSA port joins/leaves a LAG.
- A LAG, made up of DSA ports, joins/leaves a bridge.
- A DSA port in a LAG is enabled/disabled (enabled meaning
  "distributing" in 802.3ad LACP terms).

Each LAG interface to which a DSA port is attached is represented by a
`struct dsa_lag` which is globally reachable from the switch tree and
from each associated port.

When a LAG joins a bridge, the DSA subsystem will treat that as each
individual port joining the bridge. The driver may look at the port's
LAG pointer to see if it is associated with any LAG, if that is
required. This is analogue to how switchdev events are replicated out
to all lower devices when reaching e.g. a LAG.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 include/net/dsa.h  |  68 +++++++++++++++++++++
 net/dsa/dsa2.c     |   3 +
 net/dsa/dsa_priv.h |  16 +++++
 net/dsa/port.c     | 146 +++++++++++++++++++++++++++++++++++++++++++++
 net/dsa/slave.c    |  53 ++++++++++++++--
 net/dsa/switch.c   |  64 ++++++++++++++++++++
 6 files changed, 346 insertions(+), 4 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 35429a140dfa..58d73eafe891 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -145,6 +145,9 @@ struct dsa_switch_tree {
 	/* List of switch ports */
 	struct list_head ports;
 
+	/* List of configured LAGs */
+	struct list_head lags;
+
 	/* List of DSA links composing the routing table */
 	struct list_head rtable;
 };
@@ -178,6 +181,48 @@ struct dsa_mall_tc_entry {
 	};
 };
 
+struct dsa_lag {
+	struct net_device *dev;
+	int id;
+
+	struct list_head ports;
+
+	/* For multichip systems, we must ensure that each hash bucket
+	 * is only enabled on a single egress port throughout the
+	 * whole tree. We must maintain a global list of active tx
+	 * ports, so that each switch can figure out which buckets to
+	 * enable on which ports.
+	 */
+	struct list_head tx_ports;
+	int num_tx;
+
+	struct kref refcount;
+	struct list_head list;
+};
+
+static inline struct dsa_lag *dsa_lag_by_dev(struct dsa_switch_tree *dst,
+					     struct net_device *dev)
+{
+	struct dsa_lag *lag;
+
+	list_for_each_entry(lag, &dst->lags, list)
+		if (lag->dev == dev)
+			return lag;
+
+	return NULL;
+}
+
+static inline struct net_device *dsa_lag_dev_by_id(struct dsa_switch_tree *dst,
+						   int id)
+{
+	struct dsa_lag *lag;
+
+	list_for_each_entry_rcu(lag, &dst->lags, list)
+		if (lag->id == id)
+			return lag->dev;
+
+	return NULL;
+}
 
 struct dsa_port {
 	/* A CPU port is physically connected to a master device.
@@ -218,6 +263,9 @@ struct dsa_port {
 	bool			devlink_port_setup;
 	struct phylink		*pl;
 	struct phylink_config	pl_config;
+	struct dsa_lag		*lag;
+	struct list_head	lag_list;
+	struct list_head	lag_tx_list;
 
 	struct list_head list;
 
@@ -616,6 +664,16 @@ struct dsa_switch_ops {
 	void	(*crosschip_bridge_leave)(struct dsa_switch *ds, int tree_index,
 					  int sw_index, int port,
 					  struct net_device *br);
+	int	(*crosschip_lag_change)(struct dsa_switch *ds, int tree_index,
+					int sw_index, int port,
+					struct net_device *lag_dev,
+					struct netdev_lag_lower_state_info *info);
+	int	(*crosschip_lag_join)(struct dsa_switch *ds, int tree_index,
+				      int sw_index, int port,
+				      struct net_device *lag_dev);
+	void	(*crosschip_lag_leave)(struct dsa_switch *ds, int tree_index,
+				       int sw_index, int port,
+				       struct net_device *lag_dev);
 
 	/*
 	 * PTP functionality
@@ -647,6 +705,16 @@ struct dsa_switch_ops {
 	int	(*port_change_mtu)(struct dsa_switch *ds, int port,
 				   int new_mtu);
 	int	(*port_max_mtu)(struct dsa_switch *ds, int port);
+
+	/*
+	 * LAG integration
+	 */
+	int	(*port_lag_change)(struct dsa_switch *ds, int port,
+				   struct netdev_lag_lower_state_info *info);
+	int	(*port_lag_join)(struct dsa_switch *ds, int port,
+				 struct net_device *lag_dev);
+	void	(*port_lag_leave)(struct dsa_switch *ds, int port,
+				  struct net_device *lag_dev);
 };
 
 #define DSA_DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes)		\
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 183003e45762..708d5a34e150 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -66,6 +66,7 @@ static struct dsa_switch_tree *dsa_tree_alloc(int index)
 	INIT_LIST_HEAD(&dst->rtable);
 
 	INIT_LIST_HEAD(&dst->ports);
+	INIT_LIST_HEAD(&dst->lags);
 
 	INIT_LIST_HEAD(&dst->list);
 	list_add_tail(&dst->list, &dsa_tree_list);
@@ -659,6 +660,8 @@ static struct dsa_port *dsa_port_touch(struct dsa_switch *ds, int index)
 	dp->index = index;
 
 	INIT_LIST_HEAD(&dp->list);
+	INIT_LIST_HEAD(&dp->lag_list);
+	INIT_LIST_HEAD(&dp->lag_tx_list);
 	list_add_tail(&dp->list, &dst->ports);
 
 	return dp;
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 12998bf04e55..341feee3eae5 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -20,6 +20,9 @@ enum {
 	DSA_NOTIFIER_BRIDGE_LEAVE,
 	DSA_NOTIFIER_FDB_ADD,
 	DSA_NOTIFIER_FDB_DEL,
+	DSA_NOTIFIER_LAG_CHANGE,
+	DSA_NOTIFIER_LAG_JOIN,
+	DSA_NOTIFIER_LAG_LEAVE,
 	DSA_NOTIFIER_MDB_ADD,
 	DSA_NOTIFIER_MDB_DEL,
 	DSA_NOTIFIER_VLAN_ADD,
@@ -57,6 +60,15 @@ struct dsa_notifier_mdb_info {
 	int port;
 };
 
+/* DSA_NOTIFIER_LAG_* */
+struct dsa_notifier_lag_info {
+	struct netdev_lag_lower_state_info *info;
+	struct net_device *lag;
+	int tree_index;
+	int sw_index;
+	int port;
+};
+
 /* DSA_NOTIFIER_VLAN_* */
 struct dsa_notifier_vlan_info {
 	const struct switchdev_obj_port_vlan *vlan;
@@ -137,6 +149,10 @@ void dsa_port_disable_rt(struct dsa_port *dp);
 void dsa_port_disable(struct dsa_port *dp);
 int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br);
 void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br);
+int dsa_port_lag_change(struct dsa_port *dp,
+			struct netdev_lag_lower_state_info *linfo);
+int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag_dev);
+void dsa_port_lag_leave(struct dsa_port *dp, struct net_device *lag_dev);
 int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
 			    struct switchdev_trans *trans);
 bool dsa_port_skip_vlan_configuration(struct dsa_port *dp);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 73569c9af3cc..e87fc4765497 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -193,6 +193,152 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
 	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
 }
 
+static struct dsa_lag *dsa_lag_get(struct dsa_switch_tree *dst,
+				   struct net_device *dev)
+{
+	struct dsa_lag *lag;
+	unsigned long busy = 0;
+	int id;
+
+	list_for_each_entry(lag, &dst->lags, list) {
+		set_bit(lag->id, &busy);
+
+		if (lag->dev == dev) {
+			kref_get(&lag->refcount);
+			return lag;
+		}
+	}
+
+	id = find_first_zero_bit(&busy, BITS_PER_LONG);
+	if (id >= BITS_PER_LONG)
+		return ERR_PTR(-ENOSPC);
+
+	lag = kzalloc(sizeof(*lag), GFP_KERNEL);
+	if (!lag)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&lag->refcount);
+	lag->id = id;
+	lag->dev = dev;
+	INIT_LIST_HEAD(&lag->ports);
+	INIT_LIST_HEAD(&lag->tx_ports);
+
+	INIT_LIST_HEAD(&lag->list);
+	list_add_tail_rcu(&lag->list, &dst->lags);
+	return lag;
+}
+
+static void dsa_lag_release(struct kref *refcount)
+{
+	struct dsa_lag *lag = container_of(refcount, struct dsa_lag, refcount);
+
+	list_del_rcu(&lag->list);
+	synchronize_rcu();
+	kfree(lag);
+}
+
+static void dsa_lag_put(struct dsa_lag *lag)
+{
+	kref_put(&lag->refcount, dsa_lag_release);
+}
+
+int dsa_port_lag_change(struct dsa_port *dp,
+			struct netdev_lag_lower_state_info *linfo)
+{
+	struct dsa_notifier_lag_info info = {
+		.tree_index = dp->ds->dst->index,
+		.sw_index = dp->ds->index,
+		.port = dp->index,
+		.info = linfo,
+	};
+	bool old, new;
+
+	if (!dp->lag)
+		return 0;
+
+	info.lag = dp->lag->dev;
+
+	/* If this port is on the tx list, it is already enabled. */
+	old = !list_empty(&dp->lag_tx_list);
+
+	/* On statically configured aggregates (e.g. loadbalance
+	 * without LACP) ports will always be tx_enabled, even if the
+	 * link is down. Thus we require both link_up and tx_enabled
+	 * in order to include it in the tx set.
+	 */
+	new = linfo->link_up && linfo->tx_enabled;
+
+	if (new == old)
+		return 0;
+
+	if (new) {
+		dp->lag->num_tx++;
+		list_add_tail(&dp->lag_tx_list, &dp->lag->tx_ports);
+	} else {
+		list_del_init(&dp->lag_tx_list);
+		dp->lag->num_tx--;
+	}
+
+	return dsa_broadcast(DSA_NOTIFIER_LAG_CHANGE, &info);
+}
+
+int dsa_port_lag_join(struct dsa_port *dp, struct net_device *lag_dev)
+{
+	struct dsa_notifier_lag_info info = {
+		.tree_index = dp->ds->dst->index,
+		.sw_index = dp->ds->index,
+		.port = dp->index,
+		.lag = lag_dev,
+	};
+	struct dsa_lag *lag;
+	int err;
+
+	lag = dsa_lag_get(dp->ds->dst, lag_dev);
+	if (IS_ERR(lag))
+		return PTR_ERR(lag);
+
+	dp->lag = lag;
+	list_add_tail(&dp->lag_list, &lag->ports);
+
+	err = dsa_broadcast(DSA_NOTIFIER_LAG_JOIN, &info);
+	if (err) {
+		dp->lag = NULL;
+		list_del_init(&dp->lag_list);
+		dsa_lag_put(lag);
+	}
+
+	return err;
+}
+
+void dsa_port_lag_leave(struct dsa_port *dp, struct net_device *lag_dev)
+{
+	struct dsa_notifier_lag_info info = {
+		.tree_index = dp->ds->dst->index,
+		.sw_index = dp->ds->index,
+		.port = dp->index,
+		.lag = lag_dev,
+	};
+	int err;
+
+	/* Port might have been part of a LAG that in turn was
+	 * attached to a bridge.
+	 */
+	if (dp->bridge_dev)
+		dsa_port_bridge_leave(dp, dp->bridge_dev);
+
+	list_del_init(&dp->lag_list);
+	list_del_init(&dp->lag_tx_list);
+
+	err = dsa_broadcast(DSA_NOTIFIER_LAG_LEAVE, &info);
+	if (err)
+		pr_err("DSA: failed to notify DSA_NOTIFIER_LAG_LEAVE: %d\n",
+		       err);
+
+	dsa_lag_put(dp->lag);
+
+	dp->lag = NULL;
+}
+
 /* Must be called under rcu_read_lock() */
 static bool dsa_port_can_apply_vlan_filtering(struct dsa_port *dp,
 					      bool vlan_filtering)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 3bc5ca40c9fb..e5e4f3d096c0 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -334,7 +334,8 @@ static int dsa_slave_vlan_add(struct net_device *dev,
 	struct switchdev_obj_port_vlan vlan;
 	int vid, err;
 
-	if (obj->orig_dev != dev)
+	if (!(obj->orig_dev == dev ||
+	      (dp->lag && obj->orig_dev == dp->lag->dev)))
 		return -EOPNOTSUPP;
 
 	if (dsa_port_skip_vlan_configuration(dp))
@@ -421,7 +422,8 @@ static int dsa_slave_vlan_del(struct net_device *dev,
 	struct switchdev_obj_port_vlan *vlan;
 	int vid, err;
 
-	if (obj->orig_dev != dev)
+	if (!(obj->orig_dev == dev ||
+	      (dp->lag && obj->orig_dev == dp->lag->dev)))
 		return -EOPNOTSUPP;
 
 	if (dsa_port_skip_vlan_configuration(dp))
@@ -1911,6 +1913,33 @@ static int dsa_slave_changeupper(struct net_device *dev,
 			dsa_port_bridge_leave(dp, info->upper_dev);
 			err = NOTIFY_OK;
 		}
+	} else if (netif_is_lag_master(info->upper_dev)) {
+		if (info->linking) {
+			err = dsa_port_lag_join(dp, info->upper_dev);
+			err = notifier_from_errno(err);
+		} else {
+			dsa_port_lag_leave(dp, info->upper_dev);
+			err = NOTIFY_OK;
+		}
+	}
+
+	return err;
+}
+
+static int dsa_slave_lag_changeupper(struct net_device *dev,
+				     struct netdev_notifier_changeupper_info *info)
+{
+	struct net_device *lower;
+	struct list_head *iter;
+	int err = NOTIFY_DONE;
+
+	netdev_for_each_lower_dev(dev, lower, iter) {
+		if (!dsa_slave_dev_check(lower))
+			continue;
+
+		err = dsa_slave_changeupper(lower, info);
+		if (notifier_to_errno(err))
+			break;
 	}
 
 	return err;
@@ -1996,10 +2025,26 @@ static int dsa_slave_netdevice_event(struct notifier_block *nb,
 		break;
 	}
 	case NETDEV_CHANGEUPPER:
+		if (dsa_slave_dev_check(dev))
+			return dsa_slave_changeupper(dev, ptr);
+
+		if (netif_is_lag_master(dev))
+			return dsa_slave_lag_changeupper(dev, ptr);
+
+		break;
+	case NETDEV_CHANGELOWERSTATE: {
+		struct netdev_notifier_changelowerstate_info *info = ptr;
+		struct dsa_port *dp;
+		int err;
+
 		if (!dsa_slave_dev_check(dev))
-			return NOTIFY_DONE;
+			break;
 
-		return dsa_slave_changeupper(dev, ptr);
+		dp = dsa_slave_to_port(dev);
+
+		err = dsa_port_lag_change(dp, info->lower_state_info);
+		return notifier_from_errno(err);
+	}
 	}
 
 	return NOTIFY_DONE;
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index 3fb362b6874e..fbf437434e27 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -178,6 +178,61 @@ static int dsa_switch_fdb_del(struct dsa_switch *ds,
 	return ds->ops->port_fdb_del(ds, port, info->addr, info->vid);
 }
 
+static int dsa_switch_lag_change(struct dsa_switch *ds,
+				 struct dsa_notifier_lag_info *info)
+{
+	struct dsa_switch_tree *dst = ds->dst;
+
+	if (dst->index == info->tree_index && ds->index == info->sw_index &&
+	    ds->ops->port_lag_change)
+		return ds->ops->port_lag_change(ds, info->port, info->info);
+
+	if ((dst->index != info->tree_index || ds->index != info->sw_index) &&
+	    ds->ops->crosschip_lag_change)
+		return ds->ops->crosschip_lag_change(ds, info->tree_index,
+						     info->sw_index,
+						     info->port, info->lag,
+						     info->info);
+
+	return 0;
+}
+
+static int dsa_switch_lag_join(struct dsa_switch *ds,
+			       struct dsa_notifier_lag_info *info)
+{
+	struct dsa_switch_tree *dst = ds->dst;
+
+	if (dst->index == info->tree_index && ds->index == info->sw_index &&
+	    ds->ops->port_lag_join)
+		return ds->ops->port_lag_join(ds, info->port, info->lag);
+
+	if ((dst->index != info->tree_index || ds->index != info->sw_index) &&
+	    ds->ops->crosschip_lag_join)
+		return ds->ops->crosschip_lag_join(ds, info->tree_index,
+						   info->sw_index,
+						   info->port, info->lag);
+
+	return 0;
+}
+
+static int dsa_switch_lag_leave(struct dsa_switch *ds,
+				struct dsa_notifier_lag_info *info)
+{
+	struct dsa_switch_tree *dst = ds->dst;
+
+	if (dst->index == info->tree_index && ds->index == info->sw_index &&
+	    ds->ops->port_lag_leave)
+		ds->ops->port_lag_leave(ds, info->port, info->lag);
+
+	if ((dst->index != info->tree_index || ds->index != info->sw_index) &&
+	    ds->ops->crosschip_lag_leave)
+		ds->ops->crosschip_lag_leave(ds, info->tree_index,
+					     info->sw_index,
+					     info->port, info->lag);
+
+	return 0;
+}
+
 static bool dsa_switch_mdb_match(struct dsa_switch *ds, int port,
 				 struct dsa_notifier_mdb_info *info)
 {
@@ -325,6 +380,15 @@ static int dsa_switch_event(struct notifier_block *nb,
 	case DSA_NOTIFIER_FDB_DEL:
 		err = dsa_switch_fdb_del(ds, info);
 		break;
+	case DSA_NOTIFIER_LAG_CHANGE:
+		err = dsa_switch_lag_change(ds, info);
+		break;
+	case DSA_NOTIFIER_LAG_JOIN:
+		err = dsa_switch_lag_join(ds, info);
+		break;
+	case DSA_NOTIFIER_LAG_LEAVE:
+		err = dsa_switch_lag_leave(ds, info);
+		break;
 	case DSA_NOTIFIER_MDB_ADD:
 		err = dsa_switch_mdb_add(ds, info);
 		break;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 3/4] net: dsa: mv88e6xxx: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
  2020-10-27 10:51 ` [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X Tobias Waldekranz
  2020-10-27 10:51 ` [RFC PATCH 2/4] net: dsa: link aggregation support Tobias Waldekranz
@ 2020-10-27 10:51 ` Tobias Waldekranz
  2020-10-27 10:51 ` [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices Tobias Waldekranz
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 10:51 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv; +Cc: netdev

Support offloading of LAGs to hardware. LAGs may be attached to a
bridge in which case VLANs, multicast groups, etc. is also offloaded
as usual.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c    | 228 +++++++++++++++++++++++++++-
 drivers/net/dsa/mv88e6xxx/chip.h    |   4 +
 drivers/net/dsa/mv88e6xxx/global2.c |   8 +-
 drivers/net/dsa/mv88e6xxx/global2.h |   5 +
 drivers/net/dsa/mv88e6xxx/port.c    |  21 +++
 drivers/net/dsa/mv88e6xxx/port.h    |   5 +
 6 files changed, 263 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 536ee6cff779..92874d53ba18 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1189,7 +1189,8 @@ static int mv88e6xxx_set_mac_eee(struct dsa_switch *ds, int port,
 }
 
 /* Mask of the local ports allowed to receive frames from a given fabric port */
-static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
+static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port,
+			       struct dsa_lag **lag)
 {
 	struct dsa_switch *ds = chip->ds;
 	struct dsa_switch_tree *dst = ds->dst;
@@ -1201,6 +1202,9 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
 	list_for_each_entry(dp, &dst->ports, list) {
 		if (dp->ds->index == dev && dp->index == port) {
 			found = true;
+
+			if (dp->lag && lag)
+				*lag = dp->lag;
 			break;
 		}
 	}
@@ -1231,7 +1235,9 @@ static u16 mv88e6xxx_port_vlan(struct mv88e6xxx_chip *chip, int dev, int port)
 
 static int mv88e6xxx_port_vlan_map(struct mv88e6xxx_chip *chip, int port)
 {
-	u16 output_ports = mv88e6xxx_port_vlan(chip, chip->ds->index, port);
+	u16 output_ports;
+
+	output_ports = mv88e6xxx_port_vlan(chip, chip->ds->index, port, NULL);
 
 	/* prevent frames from going back out of the port they came in on */
 	output_ports &= ~BIT(port);
@@ -1389,14 +1395,21 @@ static int mv88e6xxx_mac_setup(struct mv88e6xxx_chip *chip)
 
 static int mv88e6xxx_pvt_map(struct mv88e6xxx_chip *chip, int dev, int port)
 {
+	struct dsa_lag *lag = NULL;
 	u16 pvlan = 0;
 
 	if (!mv88e6xxx_has_pvt(chip))
 		return 0;
 
 	/* Skip the local source device, which uses in-chip port VLAN */
-	if (dev != chip->ds->index)
-		pvlan = mv88e6xxx_port_vlan(chip, dev, port);
+	if (dev != chip->ds->index) {
+		pvlan = mv88e6xxx_port_vlan(chip, dev, port, &lag);
+
+		if (lag) {
+			dev = MV88E6XXX_G2_PVT_ADRR_DEV_TRUNK;
+			port = lag->id;
+		}
+	}
 
 	return mv88e6xxx_g2_pvt_write(chip, dev, port, pvlan);
 }
@@ -5326,6 +5339,207 @@ static int mv88e6xxx_port_egress_floods(struct dsa_switch *ds, int port,
 	return err;
 }
 
+static int mv88e6xxx_lag_sync_map(struct dsa_switch *ds, struct dsa_lag *lag)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	struct dsa_port *dp;
+	u16 map = 0;
+
+	/* Build the map of all ports to distribute flows destined for
+	 * this LAG. This can be either a local user port, or a DSA
+	 * port if the LAG port is on a remote chip.
+	 */
+	list_for_each_entry(dp, &lag->ports, lag_list) {
+		map |= BIT(dsa_towards_port(ds, dp->ds->index, dp->index));
+	}
+
+	return mv88e6xxx_g2_trunk_mapping_write(chip, lag->id, map);
+}
+
+static const u8 mv88e6xxx_lag_mask_table[8][8] = {
+	/* Row number corresponds to the number of active members in a
+	 * LAG. Each column states which of the eight hash buckets are
+	 * mapped to the column:th port in the LAG.
+	 *
+	 * Example: In a LAG with three active ports, the second port
+	 * ([2][1]) would be selected for traffic mapped to buckets
+	 * 3,4,5 (0x38).
+	 */
+	{ 0xff,    0,    0,    0,    0,    0,    0,    0 },
+	{ 0x0f, 0xf0,    0,    0,    0,    0,    0,    0 },
+	{ 0x07, 0x38, 0xc0,    0,    0,    0,    0,    0 },
+	{ 0x03, 0x0c, 0x30, 0xc0,    0,    0,    0,    0 },
+	{ 0x03, 0x0c, 0x30, 0x40, 0x80,    0,    0,    0 },
+	{ 0x03, 0x0c, 0x10, 0x20, 0x40, 0x80,    0,    0 },
+	{ 0x03, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,    0 },
+	{ 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 },
+};
+
+static void mv88e6xxx_lag_set_port_mask(u16 *mask, int port,
+					int num_tx, int nth)
+{
+	u8 active = 0;
+	int i;
+
+	num_tx = num_tx <= 8 ? num_tx : 8;
+	if (nth < num_tx)
+		active = mv88e6xxx_lag_mask_table[num_tx - 1][nth];
+
+	for (i = 0; i < 8; i++) {
+		if (BIT(i) & active)
+			mask[i] |= BIT(port);
+	}
+}
+
+static int mv88e6xxx_lag_sync_masks(struct dsa_switch *ds)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	struct dsa_port *dp;
+	struct dsa_lag *lag;
+	int i, err, nth;
+	u16 mask[8] = { 0 };
+	u16 ivec;
+
+	/* Assume no port is a member of any LAG. */
+	ivec = BIT(mv88e6xxx_num_ports(chip)) - 1;
+
+	/* Disable all masks for ports that _are_ members of a LAG. */
+	list_for_each_entry(lag, &ds->dst->lags, list) {
+		list_for_each_entry(dp, &lag->ports, lag_list) {
+			if (dp->ds != ds)
+				continue;
+
+			ivec &= ~BIT(dp->index);
+		}
+	}
+
+	for (i = 0; i < 8; i++)
+		mask[i] = ivec;
+
+	/* Enable the correct subset of masks for all LAG ports that
+	 * are in the Tx set.
+	 */
+	list_for_each_entry(lag, &ds->dst->lags, list) {
+		if (!lag->num_tx)
+			continue;
+
+		nth = 0;
+		list_for_each_entry(dp, &lag->tx_ports, lag_tx_list) {
+			if (dp->ds == ds)
+				mv88e6xxx_lag_set_port_mask(mask, dp->index,
+							    lag->num_tx, nth);
+
+			nth++;
+		}
+	}
+
+	for (i = 0; i < 8; i++) {
+		err = mv88e6xxx_g2_trunk_mask_write(chip, i, true, mask[i]);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int mv88e6xxx_lag_sync_masks_map(struct dsa_switch *ds,
+					struct dsa_lag *lag)
+{
+	int err;
+
+	err = mv88e6xxx_lag_sync_masks(ds);
+
+	if (!err)
+		err = mv88e6xxx_lag_sync_map(ds, lag);
+
+	return err;
+}
+
+static int mv88e6xxx_port_lag_change(struct dsa_switch *ds, int port,
+				     struct netdev_lag_lower_state_info *info)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	mv88e6xxx_reg_lock(chip);
+	err = mv88e6xxx_lag_sync_masks(ds);
+	mv88e6xxx_reg_unlock(chip);
+	return err;
+}
+
+static int mv88e6xxx_port_lag_join(struct dsa_switch *ds, int port,
+				   struct net_device *lag_dev)
+{
+	struct dsa_lag *lag = dsa_to_port(ds, port)->lag;
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	mv88e6xxx_reg_lock(chip);
+
+	err = mv88e6xxx_port_set_trunk(chip, port, true, lag->id);
+	if (err)
+		return err;
+
+	err = mv88e6xxx_lag_sync_map(ds, lag);
+	if (err)
+		mv88e6xxx_port_set_trunk(chip, port, false, 0);
+
+	mv88e6xxx_reg_unlock(chip);
+	return err;
+}
+
+static void mv88e6xxx_port_lag_leave(struct dsa_switch *ds, int port,
+				     struct net_device *lag_dev)
+{
+	struct dsa_lag *lag = dsa_to_port(ds, port)->lag;
+	struct mv88e6xxx_chip *chip = ds->priv;
+
+	mv88e6xxx_reg_lock(chip);
+	mv88e6xxx_lag_sync_masks_map(ds, lag);
+	mv88e6xxx_port_set_trunk(chip, port, false, 0);
+	mv88e6xxx_reg_unlock(chip);
+}
+
+static int mv88e6xxx_crosschip_lag_change(struct dsa_switch *ds,
+					  int tree_index, int sw_index,
+					  int port, struct net_device *lag_dev,
+					  struct netdev_lag_lower_state_info *info)
+{
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	mv88e6xxx_reg_lock(chip);
+	err = mv88e6xxx_lag_sync_masks(ds);
+	mv88e6xxx_reg_unlock(chip);
+	return err;
+}
+
+static int mv88e6xxx_crosschip_lag_join(struct dsa_switch *ds,
+					int tree_index, int sw_index,
+					int port, struct net_device *lag_dev)
+{
+	struct dsa_lag *lag = dsa_lag_by_dev(ds->dst, lag_dev);
+	struct mv88e6xxx_chip *chip = ds->priv;
+	int err;
+
+	mv88e6xxx_reg_lock(chip);
+	err = mv88e6xxx_lag_sync_map(ds, lag);
+	mv88e6xxx_reg_unlock(chip);
+	return err;
+}
+
+static void mv88e6xxx_crosschip_lag_leave(struct dsa_switch *ds,
+					  int tree_index, int sw_index,
+					  int port, struct net_device *lag_dev)
+{
+	struct dsa_lag *lag = dsa_lag_by_dev(ds->dst, lag_dev);
+	struct mv88e6xxx_chip *chip = ds->priv;
+
+	mv88e6xxx_reg_lock(chip);
+	mv88e6xxx_lag_sync_masks_map(ds, lag);
+	mv88e6xxx_reg_unlock(chip);
+}
+
 static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.get_tag_protocol	= mv88e6xxx_get_tag_protocol,
 	.setup			= mv88e6xxx_setup,
@@ -5380,6 +5594,12 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
 	.devlink_param_get	= mv88e6xxx_devlink_param_get,
 	.devlink_param_set	= mv88e6xxx_devlink_param_set,
 	.devlink_info_get	= mv88e6xxx_devlink_info_get,
+	.port_lag_change	= mv88e6xxx_port_lag_change,
+	.port_lag_join		= mv88e6xxx_port_lag_join,
+	.port_lag_leave		= mv88e6xxx_port_lag_leave,
+	.crosschip_lag_change	= mv88e6xxx_crosschip_lag_change,
+	.crosschip_lag_join	= mv88e6xxx_crosschip_lag_join,
+	.crosschip_lag_leave	= mv88e6xxx_crosschip_lag_leave,
 };
 
 static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 81c244fc0419..c460992166f7 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -572,6 +572,10 @@ struct mv88e6xxx_ops {
 
 	/* Max Frame Size */
 	int (*set_max_frame_size)(struct mv88e6xxx_chip *chip, int mtu);
+
+	/* Link aggregation */
+	int (*lag_set_map)(struct mv88e6xxx_chip *chip, struct dsa_lag *lag);
+	int (*lag_set_masks)(struct mv88e6xxx_chip *chip, struct dsa_lag *lag);
 };
 
 struct mv88e6xxx_irq_ops {
diff --git a/drivers/net/dsa/mv88e6xxx/global2.c b/drivers/net/dsa/mv88e6xxx/global2.c
index 75b227d0f73b..da8bac8813e1 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.c
+++ b/drivers/net/dsa/mv88e6xxx/global2.c
@@ -126,8 +126,8 @@ int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip, int target,
 
 /* Offset 0x07: Trunk Mask Table register */
 
-static int mv88e6xxx_g2_trunk_mask_write(struct mv88e6xxx_chip *chip, int num,
-					 bool hash, u16 mask)
+int mv88e6xxx_g2_trunk_mask_write(struct mv88e6xxx_chip *chip, int num,
+				  bool hash, u16 mask)
 {
 	u16 val = (num << 12) | (mask & mv88e6xxx_port_mask(chip));
 
@@ -140,8 +140,8 @@ static int mv88e6xxx_g2_trunk_mask_write(struct mv88e6xxx_chip *chip, int num,
 
 /* Offset 0x08: Trunk Mapping Table register */
 
-static int mv88e6xxx_g2_trunk_mapping_write(struct mv88e6xxx_chip *chip, int id,
-					    u16 map)
+int mv88e6xxx_g2_trunk_mapping_write(struct mv88e6xxx_chip *chip, int id,
+				     u16 map)
 {
 	const u16 port_mask = BIT(mv88e6xxx_num_ports(chip)) - 1;
 	u16 val = (id << 11) | (map & port_mask);
diff --git a/drivers/net/dsa/mv88e6xxx/global2.h b/drivers/net/dsa/mv88e6xxx/global2.h
index 1f42ee656816..60febaf4da76 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.h
+++ b/drivers/net/dsa/mv88e6xxx/global2.h
@@ -101,6 +101,7 @@
 #define MV88E6XXX_G2_PVT_ADDR_OP_WRITE_PVLAN	0x3000
 #define MV88E6XXX_G2_PVT_ADDR_OP_READ		0x4000
 #define MV88E6XXX_G2_PVT_ADDR_PTR_MASK		0x01ff
+#define MV88E6XXX_G2_PVT_ADRR_DEV_TRUNK		0x1f
 
 /* Offset 0x0C: Cross-chip Port VLAN Data Register */
 #define MV88E6XXX_G2_PVT_DATA		0x0c
@@ -345,6 +346,10 @@ int mv88e6352_g2_mgmt_rsvd2cpu(struct mv88e6xxx_chip *chip);
 
 int mv88e6xxx_g2_pot_clear(struct mv88e6xxx_chip *chip);
 
+int mv88e6xxx_g2_trunk_mask_write(struct mv88e6xxx_chip *chip, int num,
+				  bool hash, u16 mask);
+int mv88e6xxx_g2_trunk_mapping_write(struct mv88e6xxx_chip *chip, int id,
+				     u16 map);
 int mv88e6xxx_g2_trunk_clear(struct mv88e6xxx_chip *chip);
 
 int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip, int target,
diff --git a/drivers/net/dsa/mv88e6xxx/port.c b/drivers/net/dsa/mv88e6xxx/port.c
index 8128dc607cf4..7bf5ba55bf81 100644
--- a/drivers/net/dsa/mv88e6xxx/port.c
+++ b/drivers/net/dsa/mv88e6xxx/port.c
@@ -815,6 +815,27 @@ int mv88e6xxx_port_set_message_port(struct mv88e6xxx_chip *chip, int port,
 	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL1, val);
 }
 
+int mv88e6xxx_port_set_trunk(struct mv88e6xxx_chip *chip, int port,
+			     bool trunk, u8 id)
+{
+	u16 val;
+	int err;
+
+	err = mv88e6xxx_port_read(chip, port, MV88E6XXX_PORT_CTL1, &val);
+	if (err)
+		return err;
+
+	val &= ~MV88E6XXX_PORT_CTL1_TRUNK_ID_MASK;
+
+	if (trunk)
+		val |= MV88E6XXX_PORT_CTL1_TRUNK_PORT |
+			(id << MV88E6XXX_PORT_CTL1_TRUNK_ID_SHIFT);
+	else
+		val &= ~MV88E6XXX_PORT_CTL1_TRUNK_PORT;
+
+	return mv88e6xxx_port_write(chip, port, MV88E6XXX_PORT_CTL1, val);
+}
+
 /* Offset 0x06: Port Based VLAN Map */
 
 int mv88e6xxx_port_set_vlan_map(struct mv88e6xxx_chip *chip, int port, u16 map)
diff --git a/drivers/net/dsa/mv88e6xxx/port.h b/drivers/net/dsa/mv88e6xxx/port.h
index 44d76ac973f6..e6a61be7dff9 100644
--- a/drivers/net/dsa/mv88e6xxx/port.h
+++ b/drivers/net/dsa/mv88e6xxx/port.h
@@ -168,6 +168,9 @@
 /* Offset 0x05: Port Control 1 */
 #define MV88E6XXX_PORT_CTL1			0x05
 #define MV88E6XXX_PORT_CTL1_MESSAGE_PORT	0x8000
+#define MV88E6XXX_PORT_CTL1_TRUNK_PORT		0x4000
+#define MV88E6XXX_PORT_CTL1_TRUNK_ID_MASK	0x0f00
+#define MV88E6XXX_PORT_CTL1_TRUNK_ID_SHIFT	8
 #define MV88E6XXX_PORT_CTL1_FID_11_4_MASK	0x00ff
 
 /* Offset 0x06: Port Based VLAN Map */
@@ -348,6 +351,8 @@ int mv88e6351_port_set_ether_type(struct mv88e6xxx_chip *chip, int port,
 				  u16 etype);
 int mv88e6xxx_port_set_message_port(struct mv88e6xxx_chip *chip, int port,
 				    bool message_port);
+int mv88e6xxx_port_set_trunk(struct mv88e6xxx_chip *chip, int port,
+			     bool trunk, u8 id);
 int mv88e6165_port_set_jumbo_size(struct mv88e6xxx_chip *chip, int port,
 				  size_t size);
 int mv88e6095_port_egress_rate_limiting(struct mv88e6xxx_chip *chip, int port);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (2 preceding siblings ...)
  2020-10-27 10:51 ` [RFC PATCH 3/4] net: dsa: mv88e6xxx: " Tobias Waldekranz
@ 2020-10-27 10:51 ` Tobias Waldekranz
  2020-10-28 12:05   ` Vladimir Oltean
  2020-10-27 12:27 ` [RFC PATCH 0/4] net: dsa: link aggregation support Vladimir Oltean
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 10:51 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, olteanv; +Cc: netdev

Packets ingressing on a LAG that egress on the CPU port, which are not
classified as management, will have a FORWARD tag that does not
contain the normal source device/port tuple. Instead the trunk bit
will be set, and the port field holds the LAG id.

Since the exact source port information is not available in the tag,
frames are injected directly on the LAG interface and thus do never
pass through any DSA port interface on ingress.

Management frames (TO_CPU) are not affected and will pass through the
DSA port interface as usual.

Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
---
 net/dsa/dsa.c      | 23 +++++++++++++----------
 net/dsa/tag_edsa.c | 12 +++++++++++-
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 2131bf2b3a67..b84e5f0be049 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -220,7 +220,6 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
 	}
 
 	skb = nskb;
-	p = netdev_priv(skb->dev);
 	skb_push(skb, ETH_HLEN);
 	skb->pkt_type = PACKET_HOST;
 	skb->protocol = eth_type_trans(skb, skb->dev);
@@ -234,17 +233,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
 		skb = nskb;
 	}
 
-	s = this_cpu_ptr(p->stats64);
-	u64_stats_update_begin(&s->syncp);
-	s->rx_packets++;
-	s->rx_bytes += skb->len;
-	u64_stats_update_end(&s->syncp);
+	if (dsa_slave_dev_check(skb->dev)) {
+		p = netdev_priv(skb->dev);
+		s = this_cpu_ptr(p->stats64);
+		u64_stats_update_begin(&s->syncp);
+		s->rx_packets++;
+		s->rx_bytes += skb->len;
+		u64_stats_update_end(&s->syncp);
 
-	if (dsa_skb_defer_rx_timestamp(p, skb))
-		return 0;
-
-	gro_cells_receive(&p->gcells, skb);
+		if (dsa_skb_defer_rx_timestamp(p, skb))
+			return 0;
 
+		gro_cells_receive(&p->gcells, skb);
+	} else {
+		netif_rx(skb);
+	}
 	return 0;
 }
 
diff --git a/net/dsa/tag_edsa.c b/net/dsa/tag_edsa.c
index 120614240319..800b02f04394 100644
--- a/net/dsa/tag_edsa.c
+++ b/net/dsa/tag_edsa.c
@@ -86,6 +86,7 @@ static struct sk_buff *edsa_xmit(struct sk_buff *skb, struct net_device *dev)
 static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 				struct packet_type *pt)
 {
+	bool trunk = false;
 	u8 *edsa_header;
 	int frame_type;
 	int code;
@@ -120,6 +121,7 @@ static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 		break;
 
 	case FRAME_TYPE_FORWARD:
+		trunk = !!(edsa_header[1] & 7);
 		skb->offload_fwd_mark = 1;
 		break;
 
@@ -133,7 +135,15 @@ static struct sk_buff *edsa_rcv(struct sk_buff *skb, struct net_device *dev,
 	source_device = edsa_header[0] & 0x1f;
 	source_port = (edsa_header[1] >> 3) & 0x1f;
 
-	skb->dev = dsa_master_find_slave(dev, source_device, source_port);
+	if (trunk) {
+		struct dsa_port *cpu_dp = dev->dsa_ptr;
+
+		skb->dev = dsa_lag_dev_by_id(cpu_dp->dst, source_port);
+	} else {
+		skb->dev = dsa_master_find_slave(dev, source_device,
+						 source_port);
+	}
+
 	if (!skb->dev)
 		return NULL;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (3 preceding siblings ...)
  2020-10-27 10:51 ` [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices Tobias Waldekranz
@ 2020-10-27 12:27 ` Vladimir Oltean
  2020-10-27 14:29   ` Andrew Lunn
  2020-10-27 14:59   ` Tobias Waldekranz
  2020-10-27 14:00 ` Andrew Lunn
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-27 12:27 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, netdev

Hi Tobias,

On Tue, Oct 27, 2020 at 11:51:13AM +0100, Tobias Waldekranz wrote:
> I would really appreciate feedback on the following:
> 
> All LAG configuration is cached in `struct dsa_lag`s. I realize that
> the standard M.O. of DSA is to read back information from hardware
> when required. With LAGs this becomes very tricky though. For example,
> the change of a link state on one switch will require re-balancing of
> LAG hash buckets on another one, which in turn depends on the total
> number of active links in the LAG. Do you agree that this is
> motivated?

I don't really have an issue with that.

> The LAG driver ops all receive the LAG netdev as an argument when this
> information is already available through the port's lag pointer. This
> was done to match the way that the bridge netdev is passed to all VLAN
> ops even though it is in the port's bridge_dev. Is there a reason for
> this or should I just remove it from the LAG ops?

Maybe because on "leave", the bridge/LAG net device pointer inside
struct dsa_port is first set to NULL, then the DSA notifier is called?

> At least on mv88e6xxx, the exact source port is not available when
> packets are received on the CPU. The way I see it, there are two ways
> around that problem:
> 
> - Inject the packet directly on the LAG device (what this series
>   does). Feels right because it matches all that we actually know; the
>   packet came in on the LAG. It does complicate dsa_switch_rcv
>   somewhat as we can no longer assume that skb->dev is a DSA port.
> 
> - Inject the packet on "the designated port", i.e. some port in the
>   LAG. This lets us keep the current Rx path untouched. The problem is
>   that (a) the port would have to be dynamically updated to match the
>   expectations of the LAG driver (team/bond) as links are
>   enabled/disabled and (b) we would be presenting a lie because
>   packets would appear to ingress on netdevs that they might not in
>   fact have been physically received on.

Since ocelot/felix does not have this restriction, and supports
individual port addressing even under a LAG, you can imagine I am not
very happy to see the RX data path punishing everyone else that is not
mv88e6xxx.

> (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
> seems like all chips capable of doing EDSA are using that, except for
> the Peridot.

I have no documentation whatsoever for mv88e6xxx, but just wondering,
what is the benefit brought by EDSA here vs DSA? Does DSA have the
same restriction when the ports are in a LAG?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (4 preceding siblings ...)
  2020-10-27 12:27 ` [RFC PATCH 0/4] net: dsa: link aggregation support Vladimir Oltean
@ 2020-10-27 14:00 ` Andrew Lunn
  2020-10-27 15:09   ` Tobias Waldekranz
  2020-10-27 15:05 ` Marek Behun
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2020-10-27 14:00 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: vivien.didelot, f.fainelli, olteanv, netdev

> (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
> seems like all chips capable of doing EDSA are using that, except for
> the Peridot.

Hi Tobias

Marvell removed the ability to use EDSA, in the way we do in Linux
DSA, on Peridot. One of the configuration bits is gone. So i had to
use DSA.  It could be i just don't understand how to configure
Peridot, and EDSA could be used, but i never figured it out.

I will get back to your other questions.

  Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 12:27 ` [RFC PATCH 0/4] net: dsa: link aggregation support Vladimir Oltean
@ 2020-10-27 14:29   ` Andrew Lunn
  2020-10-27 14:59   ` Tobias Waldekranz
  1 sibling, 0 replies; 43+ messages in thread
From: Andrew Lunn @ 2020-10-27 14:29 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: Tobias Waldekranz, vivien.didelot, f.fainelli, netdev

> > (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
> > seems like all chips capable of doing EDSA are using that, except for
> > the Peridot.
> 
> I have no documentation whatsoever for mv88e6xxx, but just wondering,
> what is the benefit brought by EDSA here vs DSA? Does DSA have the
> same restriction when the ports are in a LAG?

Hi Vladimir

One advantage of EDSA is that it uses a well known Ether Type. It was
easy for me to add support to tcpdump to spot this Ether type, decode
the EDSA header, and pass the rest of the frame on for further
processing as normal.

With DSA, you cannot look at the packet and know it is DSA, and then
correctly decode it. So tcpdump just show the packet as undecodable.

Florian fixed this basic problem a while ago, since not being able to
decode packets is a problem for all tagger except EDSA. So now there
is extra meta information inserted into the pcap file, which gives
tcpdump the hint it needs to do the extra decoding of the tagger
header. But before that was added, it was much easier to debug EDSA vs
DSA because of tcpdump decoding.

	Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X
  2020-10-27 10:51 ` [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X Tobias Waldekranz
@ 2020-10-27 14:52   ` Marek Behun
  2020-10-27 14:54     ` Marek Behun
  0 siblings, 1 reply; 43+ messages in thread
From: Marek Behun @ 2020-10-27 14:52 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, olteanv, netdev

On Tue, 27 Oct 2020 11:51:14 +0100
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> The policy is to use ethertyped DSA for all devices that are capable
> of doing so, which the Peridot is.

What is the benefit here?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X
  2020-10-27 14:52   ` Marek Behun
@ 2020-10-27 14:54     ` Marek Behun
  2020-10-27 14:58       ` Marek Behun
  0 siblings, 1 reply; 43+ messages in thread
From: Marek Behun @ 2020-10-27 14:54 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, olteanv, netdev

On Tue, 27 Oct 2020 15:52:13 +0100
Marek Behun <marek.behun@nic.cz> wrote:

> On Tue, 27 Oct 2020 11:51:14 +0100
> Tobias Waldekranz <tobias@waldekranz.com> wrote:
> 
> > The policy is to use ethertyped DSA for all devices that are capable
> > of doing so, which the Peridot is.
> 
> What is the benefit here?

Also, when you are changing something for 6390, please do the same
change for the non-industrial version of Peridot (6190, 6190X), for
6290 and 6191.

And since Topaz (6341 and 6141) are basically smaller Peridot's (with 6
ports instead of 11), such a change should also go there.

But again, what is the benefit here?

Marek

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X
  2020-10-27 14:54     ` Marek Behun
@ 2020-10-27 14:58       ` Marek Behun
  0 siblings, 0 replies; 43+ messages in thread
From: Marek Behun @ 2020-10-27 14:58 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, olteanv, netdev

On Tue, 27 Oct 2020 15:54:36 +0100
Marek Behun <marek.behun@nic.cz> wrote:

> But again, what is the benefit here?

OK, you need this for the LAG support, somehow those emails went to
another folder, sorry :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 12:27 ` [RFC PATCH 0/4] net: dsa: link aggregation support Vladimir Oltean
  2020-10-27 14:29   ` Andrew Lunn
@ 2020-10-27 14:59   ` Tobias Waldekranz
  1 sibling, 0 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 14:59 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 14:27, Vladimir Oltean <olteanv@gmail.com> wrote:
>> The LAG driver ops all receive the LAG netdev as an argument when this
>> information is already available through the port's lag pointer. This
>> was done to match the way that the bridge netdev is passed to all VLAN
>> ops even though it is in the port's bridge_dev. Is there a reason for
>> this or should I just remove it from the LAG ops?
>
> Maybe because on "leave", the bridge/LAG net device pointer inside
> struct dsa_port is first set to NULL, then the DSA notifier is called?

Right, that makes sense. For LAGs I keep ds->lag set until the leave ops
have run. But perhaps I should change it to match the VLAN ops?

> Since ocelot/felix does not have this restriction, and supports
> individual port addressing even under a LAG, you can imagine I am not
> very happy to see the RX data path punishing everyone else that is not
> mv88e6xxx.

I understand that, for sure. Though to be clear, the only penalty in
terms of performance is an extra call to dsa_slave_check, which is just
a load and compare on skb->dev->netdev_ops.

>> (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
>> seems like all chips capable of doing EDSA are using that, except for
>> the Peridot.
>
> I have no documentation whatsoever for mv88e6xxx, but just wondering,
> what is the benefit brought by EDSA here vs DSA? Does DSA have the
> same restriction when the ports are in a LAG?

The same restrictions apply I'm afraid. The only difference is that you
prepend a proper ethertype before the tag.

The idea (as far as I know) is that you can trap control traffic (TO_CPU
in DSA parlance) to the CPU and receive (E)DSA tagged to implement
things like STP and LLDP, but you receive the data traffic (FORWARD)
untagged or with an 802.1Q tag.

This means you can use standard VLAN accelerators on NICs to
remove/insert the 1Q tags. In a routing scenario this can bring a
significant speed-up as you skip two memcpys per packet to remove and
insert the tag.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (5 preceding siblings ...)
  2020-10-27 14:00 ` Andrew Lunn
@ 2020-10-27 15:05 ` Marek Behun
  2020-10-27 15:23   ` Andrew Lunn
  2020-10-27 22:36 ` Andrew Lunn
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Marek Behun @ 2020-10-27 15:05 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, olteanv, netdev

When I first read about port trunking in the Peridot documentation, I
immediately thought that this could be used to transparently offload
that which is called Bonding in Linux...

Is this what you want to eventually do?

BTW, I thought about using port trunking to solve the multi-CPU DSA
issue as well. On Turris Omnia we have 2 switch ports connected to the
CPU. So I could trunk these 2 swtich ports, and on the other side
create a bonding interface from eth0 and eth1.

Andrew, what do you think about this? Is this something that can be
done? Or is it too complicated?

Marek

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 14:00 ` Andrew Lunn
@ 2020-10-27 15:09   ` Tobias Waldekranz
  0 siblings, 0 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 15:09 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: vivien.didelot, f.fainelli, olteanv, netdev

On Tue, Oct 27, 2020 at 15:00, Andrew Lunn <andrew@lunn.ch> wrote:
>> (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
>> seems like all chips capable of doing EDSA are using that, except for
>> the Peridot.
>
> Hi Tobias
>
> Marvell removed the ability to use EDSA, in the way we do in Linux
> DSA, on Peridot. One of the configuration bits is gone. So i had to
> use DSA.  It could be i just don't understand how to configure
> Peridot, and EDSA could be used, but i never figured it out.
>
> I will get back to your other questions.
>
>   Andrew

Hi Andrew,

Very interesting! Which bit is that?

Looking at the datasheets for Agate and Peridot side-by-side, the
sections named "Ether Type DSA Tag" seem to be identical.

Thanks,
Tobias

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 15:05 ` Marek Behun
@ 2020-10-27 15:23   ` Andrew Lunn
  2020-10-27 18:25     ` Tobias Waldekranz
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2020-10-27 15:23 UTC (permalink / raw)
  To: Marek Behun
  Cc: Tobias Waldekranz, vivien.didelot, f.fainelli, olteanv, netdev

On Tue, Oct 27, 2020 at 04:05:30PM +0100, Marek Behun wrote:
> When I first read about port trunking in the Peridot documentation, I
> immediately thought that this could be used to transparently offload
> that which is called Bonding in Linux...
> 
> Is this what you want to eventually do?
> 
> BTW, I thought about using port trunking to solve the multi-CPU DSA
> issue as well. On Turris Omnia we have 2 switch ports connected to the
> CPU. So I could trunk these 2 swtich ports, and on the other side
> create a bonding interface from eth0 and eth1.
> 
> Andrew, what do you think about this? Is this something that can be
> done? Or is it too complicated?
 
Hi Marek

trunking is something i've looked at once, but never had time to work
on. There are three different use cases i thought of:

1) trunk user ports, with team/bonding controlling it
2) trunk DSA ports, i.e. the ports between switches in a D in DSA setup
3) trunk CPU ports.

What Tobias is implementing here is 1). This seems like a good first
step.

I'm not sure 3) is even possible. Or it might depend on the switch
generation. The 6352 for example, the CPU Dest field is a port
number. It does not appear to allow for a trunk. 6390 moved this
register, but as far as i know, it did not add trunk support.  It
might be possible to have multiple SoC interfaces sending frames to
the Switch using DSA tags, but i don't see a way to have the switch
send frames to the SoC using multiple ports.

     Andrew


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 15:23   ` Andrew Lunn
@ 2020-10-27 18:25     ` Tobias Waldekranz
  2020-10-27 18:33       ` Marek Behun
                         ` (2 more replies)
  0 siblings, 3 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 18:25 UTC (permalink / raw)
  To: Andrew Lunn, Marek Behun; +Cc: vivien.didelot, f.fainelli, olteanv, netdev

On Tue, Oct 27, 2020 at 16:23, Andrew Lunn <andrew@lunn.ch> wrote:
> Hi Marek
>
> trunking is something i've looked at once, but never had time to work
> on. There are three different use cases i thought of:
>
> 1) trunk user ports, with team/bonding controlling it
> 2) trunk DSA ports, i.e. the ports between switches in a D in DSA setup
> 3) trunk CPU ports.
>
> What Tobias is implementing here is 1). This seems like a good first
> step.
>
> I'm not sure 3) is even possible. Or it might depend on the switch
> generation. The 6352 for example, the CPU Dest field is a port
> number. It does not appear to allow for a trunk. 6390 moved this
> register, but as far as i know, it did not add trunk support.  It
> might be possible to have multiple SoC interfaces sending frames to
> the Switch using DSA tags, but i don't see a way to have the switch
> send frames to the SoC using multiple ports.

I think that (2) and (3) are essentially the same problem, i.e. creating
LAGs out of DSA links, be they switch-to-switch or switch-to-cpu
connections. I think you are correct that the CPU port can not be a
LAG/trunk, but I believe that limitation only applies to TO_CPU packets.
If the CPU ports configured as a LAG, all FORWARDs, i.e. the bulk
traffic, would benefit from the load-balancing. Something like this:

.-----. TO_CPU, FORWARD .-----. TO_CPU, FORWARD .-----.
|     +-----------------+     +-----------------+     |
| CPU |                 | sw0 |                 | sw1 |
|     +-----------------+     +-----------------+     |
'-----'    FORWARD      '-+-+-'    FORWARD      '-+-+-'
                          | |                     | |
                       swp1 swp2               swp3 swp4

So the links selected as the CPU ports will see a marginally higher load
due to all TO_CPU being sent over it. But the hashing is not that great
on this hardware anyway (DA/SA only) so some imbalance is unavoidable.

In order for this to work on transmit, we need to add forward offloading
to the bridge so that we can, for example, send one FORWARD from the CPU
to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.

	Tobias

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 18:25     ` Tobias Waldekranz
@ 2020-10-27 18:33       ` Marek Behun
  2020-10-27 19:04         ` Vladimir Oltean
  2020-10-27 19:00       ` Vladimir Oltean
  2020-10-28 22:35       ` Marek Behun
  2 siblings, 1 reply; 43+ messages in thread
From: Marek Behun @ 2020-10-27 18:33 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Andrew Lunn, vivien.didelot, f.fainelli, olteanv, netdev

> In order for this to work on transmit, we need to add forward offloading
> to the bridge so that we can, for example, send one FORWARD from the CPU
> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.

Wouldn't this be solved if the CPU master interface was a bonding interface?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 18:25     ` Tobias Waldekranz
  2020-10-27 18:33       ` Marek Behun
@ 2020-10-27 19:00       ` Vladimir Oltean
  2020-10-27 19:37         ` Tobias Waldekranz
  2020-10-28 22:35       ` Marek Behun
  2 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-27 19:00 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Andrew Lunn, Marek Behun, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 07:25:16PM +0100, Tobias Waldekranz wrote:
> > 1) trunk user ports, with team/bonding controlling it
> > 2) trunk DSA ports, i.e. the ports between switches in a D in DSA setup
> > 3) trunk CPU ports.
[...]
> I think that (2) and (3) are essentially the same problem, i.e. creating
> LAGs out of DSA links, be they switch-to-switch or switch-to-cpu
> connections. I think you are correct that the CPU port can not be a
> LAG/trunk, but I believe that limitation only applies to TO_CPU packets.

Which would still be ok? They are called "slow protocol PDUs" for a reason.

> In order for this to work on transmit, we need to add forward offloading
> to the bridge so that we can, for example, send one FORWARD from the CPU
> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.

That surely sounds like an interesting (and tough to implement)
optimization to increase the throughput, but why would it be _needed_
for things to work? What's wrong with 4 FROM_CPU packets?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 18:33       ` Marek Behun
@ 2020-10-27 19:04         ` Vladimir Oltean
  2020-10-27 19:21           ` Tobias Waldekranz
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-27 19:04 UTC (permalink / raw)
  To: Marek Behun
  Cc: Tobias Waldekranz, Andrew Lunn, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 07:33:37PM +0100, Marek Behun wrote:
> > In order for this to work on transmit, we need to add forward offloading
> > to the bridge so that we can, for example, send one FORWARD from the CPU
> > to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
> 
> Wouldn't this be solved if the CPU master interface was a bonding interface?

I don't see how you would do that. Would DSA keep returning -EPROBE_DEFER
until user space decides to set up a bond over the master interfaces?
How would you even describe that in device tree?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 19:04         ` Vladimir Oltean
@ 2020-10-27 19:21           ` Tobias Waldekranz
  0 siblings, 0 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 19:21 UTC (permalink / raw)
  To: Vladimir Oltean, Marek Behun
  Cc: Andrew Lunn, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 21:04, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Oct 27, 2020 at 07:33:37PM +0100, Marek Behun wrote:
>> > In order for this to work on transmit, we need to add forward offloading
>> > to the bridge so that we can, for example, send one FORWARD from the CPU
>> > to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>> 
>> Wouldn't this be solved if the CPU master interface was a bonding interface?
>
> I don't see how you would do that. Would DSA keep returning -EPROBE_DEFER
> until user space decides to set up a bond over the master interfaces?
> How would you even describe that in device tree?

Yeah that would be very hard indeed. Since this is going to be
completely transparent to the user I think the best way is to just setup
the hardware to see the two CPU ports as a LAG whenever you find
e.g. "cpu0" and "cpu1", but have no representation of it as a separate
netdev.

DSA will have an rx_handler attached to both ports anyway, so it can
just run the same handler for both. On Tx it can just load-balance in
software like team does.


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 19:00       ` Vladimir Oltean
@ 2020-10-27 19:37         ` Tobias Waldekranz
  2020-10-27 20:02           ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 19:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Marek Behun, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 21:00, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Oct 27, 2020 at 07:25:16PM +0100, Tobias Waldekranz wrote:
>> > 1) trunk user ports, with team/bonding controlling it
>> > 2) trunk DSA ports, i.e. the ports between switches in a D in DSA setup
>> > 3) trunk CPU ports.
> [...]
>> I think that (2) and (3) are essentially the same problem, i.e. creating
>> LAGs out of DSA links, be they switch-to-switch or switch-to-cpu
>> connections. I think you are correct that the CPU port can not be a
>> LAG/trunk, but I believe that limitation only applies to TO_CPU packets.
>
> Which would still be ok? They are called "slow protocol PDUs" for a reason.

Oh yes, completely agree. That was the point I was trying to make :)

>> In order for this to work on transmit, we need to add forward offloading
>> to the bridge so that we can, for example, send one FORWARD from the CPU
>> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>
> That surely sounds like an interesting (and tough to implement)
> optimization to increase the throughput, but why would it be _needed_
> for things to work? What's wrong with 4 FROM_CPU packets?

We have internal patches that do this, and I can confirm that it is
tough :) I really would like to figure out a way to solve this, that
would also be acceptable upstream. I have some ideas, it is on my TODO.

In a single-chip system I agree that it is not needed, the CPU can do
the load-balancing in software. But in order to have the hardware do
load-balancing on a switch-to-switch LAG, you need to send a FORWARD.

FROM_CPUs would just follow whatever is in the device mapping table. You
essentially have the inverse of the TO_CPU problem, but on Tx FROM_CPU
would make up 100% of traffic.

Other than that there are some things that, while strictly speaking
possible to do without FORWARDs, become much easier to deal with:

- Multicast routing. This is one case where performance _really_ suffers
  from having to skb_clone() to each recipient.

- Bridging between virtual interfaces and DSA ports. Typical example is
  an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the switch
  can not perform SA learning, which means that once you bridge traffic
  from the VPN out to a DSA port, the return traffic will be classified
  as unknown unicast by the switch and be flooded everywhere.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 19:37         ` Tobias Waldekranz
@ 2020-10-27 20:02           ` Vladimir Oltean
  2020-10-27 20:53             ` Tobias Waldekranz
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-27 20:02 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Andrew Lunn, Marek Behun, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 08:37:58PM +0100, Tobias Waldekranz wrote:
> >> In order for this to work on transmit, we need to add forward offloading
> >> to the bridge so that we can, for example, send one FORWARD from the CPU
> >> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.

[...]

> In a single-chip system I agree that it is not needed, the CPU can do
> the load-balancing in software. But in order to have the hardware do
> load-balancing on a switch-to-switch LAG, you need to send a FORWARD.
> 
> FROM_CPUs would just follow whatever is in the device mapping table. You
> essentially have the inverse of the TO_CPU problem, but on Tx FROM_CPU
> would make up 100% of traffic.

Woah, hold on, could you explain in more detail for non-expert people
like myself to understand.

So FROM_CPU frames (what tag_edsa.c uses now in xmit) can encode a
_single_ destination port in the frame header.

Whereas the FORWARD frames encode a _source_ port in the frame header.
You inject FORWARD frames from the CPU port, and you just let the L2
forwarding process select the adequate destination ports (or LAG, if
any ports are under one) _automatically_. The reason why you do this, is
because you want to take advantage of the switch's flooding abilities in
order to replicate the packet into 4 packets. So you will avoid cloning
that packet in the bridge in the first place.

But correct me if I'm wrong, sending a FORWARD frame from the CPU is a
slippery slope, since you're never sure that the switch will perform the
replication exactly as you intended to. The switch will replicate a
FORWARD frame by looking up the FDB, and we don't even attempt in DSA to
keep the FDB in sync between software and hardware. And that's why we
send FROM_CPU frames in tag_edsa.c and not FORWARD frames.

What you are really looking for is hardware where the destination field
for FROM_CPU packets is not a single port index, but a port mask.

Right?

Also, this problem is completely orthogonal to LAG? Where does LAG even
come into play here?

> Other than that there are some things that, while strictly speaking
> possible to do without FORWARDs, become much easier to deal with:
> 
> - Multicast routing. This is one case where performance _really_ suffers
>   from having to skb_clone() to each recipient.
> 
> - Bridging between virtual interfaces and DSA ports. Typical example is
>   an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the switch
>   can not perform SA learning, which means that once you bridge traffic
>   from the VPN out to a DSA port, the return traffic will be classified
>   as unknown unicast by the switch and be flooded everywhere.

And how is this going to solve that problem? You mean that the switch
learns only from FORWARD, but not from FROM_CPU?

Why don't you attempt to solve this more generically somehow? Your
switch is not the only one that can't perform source address learning
for injected traffic, there are tons more, some are not even DSA. We
can't have everybody roll their own solution.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 20:02           ` Vladimir Oltean
@ 2020-10-27 20:53             ` Tobias Waldekranz
  2020-10-27 22:32               ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-27 20:53 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Marek Behun, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 22:02, Vladimir Oltean <olteanv@gmail.com> wrote:
> On Tue, Oct 27, 2020 at 08:37:58PM +0100, Tobias Waldekranz wrote:
>> >> In order for this to work on transmit, we need to add forward offloading
>> >> to the bridge so that we can, for example, send one FORWARD from the CPU
>> >> to send an ARP broadcast to swp1..4 instead of four FROM_CPUs.
>
> [...]
>
>> In a single-chip system I agree that it is not needed, the CPU can do
>> the load-balancing in software. But in order to have the hardware do
>> load-balancing on a switch-to-switch LAG, you need to send a FORWARD.
>> 
>> FROM_CPUs would just follow whatever is in the device mapping table. You
>> essentially have the inverse of the TO_CPU problem, but on Tx FROM_CPU
>> would make up 100% of traffic.
>
> Woah, hold on, could you explain in more detail for non-expert people
> like myself to understand.
>
> So FROM_CPU frames (what tag_edsa.c uses now in xmit) can encode a
> _single_ destination port in the frame header.

Correct.

> Whereas the FORWARD frames encode a _source_ port in the frame header.
> You inject FORWARD frames from the CPU port, and you just let the L2
> forwarding process select the adequate destination ports (or LAG, if
> any ports are under one) _automatically_. The reason why you do this, is
> because you want to take advantage of the switch's flooding abilities in
> order to replicate the packet into 4 packets. So you will avoid cloning
> that packet in the bridge in the first place.

Exactly so.

> But correct me if I'm wrong, sending a FORWARD frame from the CPU is a
> slippery slope, since you're never sure that the switch will perform the
> replication exactly as you intended to. The switch will replicate a
> FORWARD frame by looking up the FDB, and we don't even attempt in DSA to
> keep the FDB in sync between software and hardware. And that's why we
> send FROM_CPU frames in tag_edsa.c and not FORWARD frames.

I'm not sure if I agree that it's a slippery slope. The whole point of
the switchdev effort is to sync the switch with the bridge. We trust the
fabric to do all the steps you describe for _all_ other ports.

> What you are really looking for is hardware where the destination field
> for FROM_CPU packets is not a single port index, but a port mask.
>
> Right?

Sure, if that's available it's great. Chips from Marvell's Prestera line
can do this, and many others I'm sure. Alas, LinkStreet devices can not,
and I still want the best performance I can get i that case.

> Also, this problem is completely orthogonal to LAG? Where does LAG even
> come into play here?

It matters if you setup switch-to-switch LAGs. FROM_CPU packets encode
the final device/port, and switches will forward those packet according
to their device mapping tables, which selects a _single_ local port to
use to reach a remote device/port. So all FROM_CPU packets to a given
device/port will always travel through the same set of ports.

In the FORWARD case, you look up the destination in the FDB of each
device, find that it is located on the other side of a LAG, and the
hardware will perform load-balancing.

>> Other than that there are some things that, while strictly speaking
>> possible to do without FORWARDs, become much easier to deal with:
>> 
>> - Multicast routing. This is one case where performance _really_ suffers
>>   from having to skb_clone() to each recipient.
>> 
>> - Bridging between virtual interfaces and DSA ports. Typical example is
>>   an L2 VPN tunnel or one end of a veth pair. On FROM_CPUs, the switch
>>   can not perform SA learning, which means that once you bridge traffic
>>   from the VPN out to a DSA port, the return traffic will be classified
>>   as unknown unicast by the switch and be flooded everywhere.
>
> And how is this going to solve that problem? You mean that the switch
> learns only from FORWARD, but not from FROM_CPU?

Yes, so when you send the FORWARD the switch knows that the station is
located somewhere behind the CPU port. It does not know exactly where,
i.e. it has no knowledge of the VPN tunnel or anything. It just directs
it towards the CPU and the bridge's FDB will take care of the rest.

> Why don't you attempt to solve this more generically somehow? Your
> switch is not the only one that can't perform source address learning
> for injected traffic, there are tons more, some are not even DSA. We
> can't have everybody roll their own solution.

Who said anything about rolling my solution? I'm going for a generic
solution where a netdev can announce to the bridge it is being added to
that it can offload forwarding of packets for all ports belonging to the
same switchdev device. Most probably modeled after how the macvlan
offloading stuff is done.

In the case of mv88e6xxx that would kill two birds with one stone -
great! In other cases you might have to have the DSA subsystem listen to
new neighbors appearing on the bridge and sync those to hardware or
something. Hopefully someone working with that kind of hardware can
solve that problem.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 20:53             ` Tobias Waldekranz
@ 2020-10-27 22:32               ` Vladimir Oltean
  2020-10-28  0:27                 ` Tobias Waldekranz
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-27 22:32 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Andrew Lunn, Marek Behun, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 09:53:45PM +0100, Tobias Waldekranz wrote:
> So all FROM_CPU packets to a given device/port will always travel
> through the same set of ports.

Ah, this is the part that didn't click for me.
For the simple case where you have a multi-CPU port system:

                     Switch 0
                   +--------------------------------+
DSA master #1      |CPU port #1                     |
   +------+        +------+                 +-------+
   | eth0 | ---->  |      | -----\    ----- | sw0p0 | ------>
   +------+        +------+       \  /      +-------+
                   |               \/               |
 DSA master #2     |CPU port #2    /\               |
   +------+        +------+       /  \      +-------|
   | eth1 | ---->  |      | -----/    \-----| sw0p1 | ------>
   +------+        +------+                 +-------+
                   |                                |
                   +--------------------------------+

you can have Linux do load balancing of CPU ports on TX for many streams
being delivered to the same egress port (sw0p0).

But if you have a cascade:

                     Switch 0                                 Switch 1
                   +--------------------------------+       +--------------------------------+
DSA master #1      |CPU port #1         DSA link #1 |       |DSA link #1                     |
   +------+        +------+                 +-------+       +------+                 +-------+
   | eth0 | ---->  |      | -----\    ----- |       | ----> |      | -----\    ----- | sw1p0 | ---->
   +------+        +------+       \  /      +-------+       +------+       \  /      +-------+
                   |               \/               |       |               \/               |
 DSA master #2     |CPU port #2    /\   DSA link #2 |       |DSA link #2    /\               |
   +------+        +------+       /  \      +-------|       +------+       /  \      +-------|
   | eth1 | ---->  |      | -----/    \-----|       | ----> |      | -----/    \-----| sw1p1 | ---->
   +------+        +------+                 +-------+       +------+                 +-------+
                   |                                |       |                                |
                   +--------------------------------+       +--------------------------------+

then you have no good way to spread the same load (many streams all
delivered to the same egress port, sw1p0) between DSA link #1 and DSA
link #2. DSA link #1 will get congested, while DSA link #2 will remain
unused.

And this all happens because for FROM_CPU packets, the hardware is
configured in mv88e6xxx_devmap_setup to deliver all packets with a
non-local switch ID towards the same "routing" port, right?

Whereas for FORWARD frames, the destination port for non-local switch ID
will not be established based on mv88e6xxx_devmap_setup, but based on
FDB lookup of {DMAC, VID}. In the second case above, this is the only
way for your hardware that the FDB could select the LAG as the
destination based on the FDB. Then, the hash code would be determined
from the packet, and the appropriate egress port within the LAG would be
selected.

So, to answer my own question:
Q: What does using FORWARD frames to offload TX flooding from the bridge
   have to do with a LAG between 2 switches?
A: Nothing, they would just both need FORWARD frames to be used.

> > Why don't you attempt to solve this more generically somehow? Your
> > switch is not the only one that can't perform source address learning
> > for injected traffic, there are tons more, some are not even DSA. We
> > can't have everybody roll their own solution.
> 
> Who said anything about rolling my solution? I'm going for a generic
> solution where a netdev can announce to the bridge it is being added to
> that it can offload forwarding of packets for all ports belonging to the
> same switchdev device. Most probably modeled after how the macvlan
> offloading stuff is done.

The fact that I have no idea how the macvlan offloading is done does not
really help me, but here, the fact that I understood nothing doesn't
appear to stem from that.

"a netdev can announce to the bridge it is being added to that it can
offload forwarding of packets for all ports belonging to the same
switchdev device"

What do you mean? skb->offload_fwd_mark? Or are you still talking about
its TX-side equivalent here, which is what we've been talking about in
these past few mails? If so, I'm confused by you calling it "offload
forwarding of packets", I was expecting a description more in the lines
of "offload flooding of packets coming from host" or something like
that.

> In the case of mv88e6xxx that would kill two birds with one stone -
> great! In other cases you might have to have the DSA subsystem listen to
> new neighbors appearing on the bridge and sync those to hardware or
> something. Hopefully someone working with that kind of hardware can
> solve that problem.

If by "neighbors" you mean that you bridge a DSA swp0 with an e1000
eth0, then that is not going to be enough. The CPU port of swp0 will
need to learn not eth0's MAC address, but in fact the MAC address of all
stations that might be connected to eth0. There might even be a network
switch connected to eth0, not just a directly connected link partner.
So there are potentially many MAC addresses to be learnt, and all are
unknown off-hand.
I admit I haven't actually looked at implementing this, but I would
expect that what needs to be done is that the local (master) FDB of the
bridge (which would get populated on the RX side of the "foreign
interface" through software learning) would need to get offloaded in its
entirety towards all switchdev ports, via a new switchdev "host FDB"
object or something of that kind (where a "host FDB" entry offloaded on
a port would mean "see this {DMAC, VID} pair? send it to the CPU").

With your FORWARD frames life-hack you can eschew all of that, good for
you. I was just speculatively hoping you might be interested in tackling
the hard way.

Anyway, this discussion has started mixing up basic stuff (like
resolving your source address learning issue on the CPU port, when
bridged with a foreign interface) with advanced / optimization stuff
(LAG, offload flooding from host), the only commonality appearing to be
a need for FORWARD frames. Can you even believe we are still commenting
on a series about something as mundane as link aggregation on DSA user
ports? At least I can't. I'll go off and start reviewing your patches,
before we manage to lose everybody along the way.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (6 preceding siblings ...)
  2020-10-27 15:05 ` Marek Behun
@ 2020-10-27 22:36 ` Andrew Lunn
  2020-10-28  0:45   ` Tobias Waldekranz
  2020-11-11  4:28 ` Florian Fainelli
  2020-11-19 10:51 ` Vladimir Oltean
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Lunn @ 2020-10-27 22:36 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: vivien.didelot, f.fainelli, olteanv, netdev

Hi Tobias

> All LAG configuration is cached in `struct dsa_lag`s. I realize that
> the standard M.O. of DSA is to read back information from hardware
> when required. With LAGs this becomes very tricky though. For example,
> the change of a link state on one switch will require re-balancing of
> LAG hash buckets on another one, which in turn depends on the total
> number of active links in the LAG. Do you agree that this is
> motivated?

As you say, DSA tries to be stateless and not allocate any
memory. That keeps things simple. If you cannot allocate the needed
memory, you need to ensure you leave the system untouched. And that
needs to happen all the way up the stack when you have nested devices
etc. That is why many APIs have a prepare phase and then a commit
phase. The prepare phase allocates all the needed memory, can fail,
but does not otherwise touch the running system. The commit phase
cannot fail, since it has everything it needs.

If you are dynamically allocating dsa_lag structures, at run time, you
need to think about this. But the number of LAGs is limited by the
number of ports. So i would consider just allocating the worst case
number at probe, and KISS for runtime.

> At least on mv88e6xxx, the exact source port is not available when
> packets are received on the CPU. The way I see it, there are two ways
> around that problem:

Does that break team/bonding? Do any of the algorithms send packets on
specific ports to make sure they are alive? I've not studied how
team/bonding works, but it must have a way to determine if a link has
failed and it needs to fallover.

> (mv88e6xxx) The cross-chip PVT changes required to allow a LAG to
> communicate with the other ports do not feel quite right, but I'm
> unsure about what the proper way of doing it would be. Any ideas?

Vivien implemented all that. I hope he can help you, i've no real idea
how that all works.

> (mv88e6xxx) Marvell has historically used the idiosyncratic term
> "trunk" to refer to link aggregates. Somewhere around the Peridot they
> have switched and are now referring to the same registers/tables using
> the term "LAG". In this series I've stuck to using LAG for all generic
> stuff, and only used trunk for driver-internal functions. Do we want
> to rename everything to use the LAG nomenclature?

Where possible, i would keep to the datasheet terminology. So any 6352
specific function should use 6352 terminology. Any 6390 specific
function should use 6390 terminology. For code which supports a range
of generations, we have used the terminology from the first device
which had the feature. In practice, this probably means trunk is going
to be used most of the time, and LAG in just 6390 code. Often, the
glue code in chip.c uses linux stack terminology.

   Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 22:32               ` Vladimir Oltean
@ 2020-10-28  0:27                 ` Tobias Waldekranz
  0 siblings, 0 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-28  0:27 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Marek Behun, vivien.didelot, f.fainelli, netdev

On Wed, Oct 28, 2020 at 00:32, Vladimir Oltean <olteanv@gmail.com> wrote:
> And this all happens because for FROM_CPU packets, the hardware is
> configured in mv88e6xxx_devmap_setup to deliver all packets with a
> non-local switch ID towards the same "routing" port, right?

Precisely.

> Whereas for FORWARD frames, the destination port for non-local switch ID
> will not be established based on mv88e6xxx_devmap_setup, but based on
> FDB lookup of {DMAC, VID}. In the second case above, this is the only
> way for your hardware that the FDB could select the LAG as the
> destination based on the FDB. Then, the hash code would be determined
> from the packet, and the appropriate egress port within the LAG would be
> selected.

That's it!

> What do you mean? skb->offload_fwd_mark? Or are you still talking about
> its TX-side equivalent here, which is what we've been talking about in
> these past few mails? If so, I'm confused by you calling it "offload
> forwarding of packets", I was expecting a description more in the lines
> of "offload flooding of packets coming from host" or something like
> that.

I'm still talking about the TX-equivalent. I chose my words carefully
because it is not _only_ for flooding, although that is the main
benefit.

If I've understood the basics of macvlan offloading correctly, it uses
the ndo_dfwd_add/del_station ops to ask the lower device if it can
offload transmissions on behalf of the macvlan device. If the lower is
capable, the macvlan code will use dev_queue_xmit_accel to specify that
the skb is being forwarded from a "subordinate" device. For a bridge,
that would mean "forward this packet to the relevant ports, given the
current configuration".

This is just one possible approach though.

>> In the case of mv88e6xxx that would kill two birds with one stone -
>> great! In other cases you might have to have the DSA subsystem listen to
>> new neighbors appearing on the bridge and sync those to hardware or
>> something. Hopefully someone working with that kind of hardware can
>> solve that problem.
>
> If by "neighbors" you mean that you bridge a DSA swp0 with an e1000
> eth0, then that is not going to be enough. The CPU port of swp0 will
> need to learn not eth0's MAC address, but in fact the MAC address of all
> stations that might be connected to eth0. There might even be a network
> switch connected to eth0, not just a directly connected link partner.
> So there are potentially many MAC addresses to be learnt, and all are
> unknown off-hand.

Yep, hence the "technically possible, but hard" remark I made earlier :)

> I admit I haven't actually looked at implementing this, but I would
> expect that what needs to be done is that the local (master) FDB of the
> bridge (which would get populated on the RX side of the "foreign
> interface" through software learning) would need to get offloaded in its
> entirety towards all switchdev ports, via a new switchdev "host FDB"
> object or something of that kind (where a "host FDB" entry offloaded on
> a port would mean "see this {DMAC, VID} pair? send it to the CPU").
>
> With your FORWARD frames life-hack you can eschew all of that, good for
> you. I was just speculatively hoping you might be interested in tackling
> the hard way.

Being able to set host FDB entries like we can for host MDB is useful
for other things as well, so I might very well be willing to do it.

> Anyway, this discussion has started mixing up basic stuff (like
> resolving your source address learning issue on the CPU port, when
> bridged with a foreign interface) with advanced / optimization stuff
> (LAG, offload flooding from host), the only commonality appearing to be
> a need for FORWARD frames. Can you even believe we are still commenting
> on a series about something as mundane as link aggregation on DSA user
> ports? At least I can't. I'll go off and start reviewing your patches,
> before we manage to lose everybody along the way.

Agreed, we went deep down the rabbit hole! This might not have been the
most natural place for these discussions, but it was fun nonetheless :)

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 22:36 ` Andrew Lunn
@ 2020-10-28  0:45   ` Tobias Waldekranz
  2020-10-28  1:03     ` Andrew Lunn
  0 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-28  0:45 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: vivien.didelot, f.fainelli, olteanv, netdev

On Tue, Oct 27, 2020 at 23:36, Andrew Lunn <andrew@lunn.ch> wrote:
> If you are dynamically allocating dsa_lag structures, at run time, you
> need to think about this. But the number of LAGs is limited by the
> number of ports. So i would consider just allocating the worst case
> number at probe, and KISS for runtime.

Oh OK, yeah that just makes stuff easier so that's absolutely fine. I
got the sense that the overall movement within DSA was in the opposite
direction. E.g. didn't the dst use to have an array of ds pointers?
Whereas now you iterate through dst->ports to find them?

>> At least on mv88e6xxx, the exact source port is not available when
>> packets are received on the CPU. The way I see it, there are two ways
>> around that problem:
>
> Does that break team/bonding? Do any of the algorithms send packets on
> specific ports to make sure they are alive? I've not studied how
> team/bonding works, but it must have a way to determine if a link has
> failed and it needs to fallover.

This limitation only applies to FORWARD packets. TO_CPU packets will
still contain device/port. So you have to make sure that the control
packets are trapped and not forwarded to the CPU (e.g. by setting the
Resvd2CPU bits in Global2).

> Where possible, i would keep to the datasheet terminology. So any 6352
> specific function should use 6352 terminology. Any 6390 specific
> function should use 6390 terminology. For code which supports a range
> of generations, we have used the terminology from the first device
> which had the feature. In practice, this probably means trunk is going
> to be used most of the time, and LAG in just 6390 code. Often, the
> glue code in chip.c uses linux stack terminology.

Fair enough, trunking it is then. I don't expect we'll have anything
mv88e6xxx specific using the LAG term in that case. From what I can
tell, the trunk settings have not changed since at least 6095.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 2/4] net: dsa: link aggregation support
  2020-10-27 10:51 ` [RFC PATCH 2/4] net: dsa: link aggregation support Tobias Waldekranz
@ 2020-10-28  0:58   ` Vladimir Oltean
  2020-10-28 14:03     ` Tobias Waldekranz
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-28  0:58 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 11:51:15AM +0100, Tobias Waldekranz wrote:
> Monitor the following events and notify the driver when:
> 
> - A DSA port joins/leaves a LAG.
> - A LAG, made up of DSA ports, joins/leaves a bridge.
> - A DSA port in a LAG is enabled/disabled (enabled meaning
>   "distributing" in 802.3ad LACP terms).
> 
> Each LAG interface to which a DSA port is attached is represented by a
> `struct dsa_lag` which is globally reachable from the switch tree and

When you use dsa_broadcast, it is reachable from _all_ switch trees, not
from "the" switch tree. This was added to support "islands" of
inter-compatible DSA switches separated by other DSA switches with
incompatible taggers. Not sure if it was a voluntary decision to use
that as opposed to plain dsa_port_notify. Not a problem either way.

> from each associated port.
> 
> When a LAG joins a bridge, the DSA subsystem will treat that as each
> individual port joining the bridge. The driver may look at the port's
> LAG pointer to see if it is associated with any LAG, if that is
> required. This is analogue to how switchdev events are replicated out
> to all lower devices when reaching e.g. a LAG.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  include/net/dsa.h  |  68 +++++++++++++++++++++
>  net/dsa/dsa2.c     |   3 +
>  net/dsa/dsa_priv.h |  16 +++++
>  net/dsa/port.c     | 146 +++++++++++++++++++++++++++++++++++++++++++++
>  net/dsa/slave.c    |  53 ++++++++++++++--
>  net/dsa/switch.c   |  64 ++++++++++++++++++++
>  6 files changed, 346 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index 35429a140dfa..58d73eafe891 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -145,6 +145,9 @@ struct dsa_switch_tree {
>  	/* List of switch ports */
>  	struct list_head ports;
>  
> +	/* List of configured LAGs */
> +	struct list_head lags;
> +
>  	/* List of DSA links composing the routing table */
>  	struct list_head rtable;
>  };
> @@ -178,6 +181,48 @@ struct dsa_mall_tc_entry {
>  	};
>  };
>  
> +struct dsa_lag {
> +	struct net_device *dev;
> +	int id;
> +
> +	struct list_head ports;
> +
> +	/* For multichip systems, we must ensure that each hash bucket
> +	 * is only enabled on a single egress port throughout the
> +	 * whole tree.

Or else?
I don't really understand this statement.

> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -193,6 +193,152 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
>  	dsa_port_set_state_now(dp, BR_STATE_FORWARDING);
>  }
>  
> +static struct dsa_lag *dsa_lag_get(struct dsa_switch_tree *dst,
> +				   struct net_device *dev)
> +{
> +	struct dsa_lag *lag;
> +	unsigned long busy = 0;

Reverse Christmas notation please?

> +	int id;
> +
> +	list_for_each_entry(lag, &dst->lags, list) {
> +		set_bit(lag->id, &busy);
> +
> +		if (lag->dev == dev) {
> +			kref_get(&lag->refcount);
> +			return lag;
> +		}
> +	}
> +
> +	id = find_first_zero_bit(&busy, BITS_PER_LONG);
> +	if (id >= BITS_PER_LONG)
> +		return ERR_PTR(-ENOSPC);
> +
> +	lag = kzalloc(sizeof(*lag), GFP_KERNEL);
> +	if (!lag)
> +		return ERR_PTR(-ENOMEM);
> +
> +	kref_init(&lag->refcount);
> +	lag->id = id;
> +	lag->dev = dev;
> +	INIT_LIST_HEAD(&lag->ports);
> +	INIT_LIST_HEAD(&lag->tx_ports);
> +
> +	INIT_LIST_HEAD(&lag->list);
> +	list_add_tail_rcu(&lag->list, &dst->lags);
> +	return lag;
> +}
> +
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 3bc5ca40c9fb..e5e4f3d096c0 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -334,7 +334,8 @@ static int dsa_slave_vlan_add(struct net_device *dev,
>  	struct switchdev_obj_port_vlan vlan;
>  	int vid, err;
>  
> -	if (obj->orig_dev != dev)
> +	if (!(obj->orig_dev == dev ||
> +	      (dp->lag && obj->orig_dev == dp->lag->dev)))

A small comment here maybe?

>  		return -EOPNOTSUPP;
>  
>  	if (dsa_port_skip_vlan_configuration(dp))
> @@ -421,7 +422,8 @@ static int dsa_slave_vlan_del(struct net_device *dev,
>  	struct switchdev_obj_port_vlan *vlan;
>  	int vid, err;
>  
> -	if (obj->orig_dev != dev)
> +	if (!(obj->orig_dev == dev ||
> +	      (dp->lag && obj->orig_dev == dp->lag->dev)))
>  		return -EOPNOTSUPP;
>  
>  	if (dsa_port_skip_vlan_configuration(dp))

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-28  0:45   ` Tobias Waldekranz
@ 2020-10-28  1:03     ` Andrew Lunn
  0 siblings, 0 replies; 43+ messages in thread
From: Andrew Lunn @ 2020-10-28  1:03 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: vivien.didelot, f.fainelli, olteanv, netdev

On Wed, Oct 28, 2020 at 01:45:11AM +0100, Tobias Waldekranz wrote:
> On Tue, Oct 27, 2020 at 23:36, Andrew Lunn <andrew@lunn.ch> wrote:
> > If you are dynamically allocating dsa_lag structures, at run time, you
> > need to think about this. But the number of LAGs is limited by the
> > number of ports. So i would consider just allocating the worst case
> > number at probe, and KISS for runtime.
> 
> Oh OK, yeah that just makes stuff easier so that's absolutely fine. I
> got the sense that the overall movement within DSA was in the opposite
> direction. E.g. didn't the dst use to have an array of ds pointers?
> Whereas now you iterate through dst->ports to find them?

Yes, but they are all allocated at probe time. It saved a bit of heap
for adding some code.

   Andrew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-27 10:51 ` [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices Tobias Waldekranz
@ 2020-10-28 12:05   ` Vladimir Oltean
  2020-10-28 15:28     ` Tobias Waldekranz
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-28 12:05 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 11:51:17AM +0100, Tobias Waldekranz wrote:
> Packets ingressing on a LAG that egress on the CPU port, which are not
> classified as management, will have a FORWARD tag that does not
> contain the normal source device/port tuple. Instead the trunk bit
> will be set, and the port field holds the LAG id.
> 
> Since the exact source port information is not available in the tag,
> frames are injected directly on the LAG interface and thus do never
> pass through any DSA port interface on ingress.
> 
> Management frames (TO_CPU) are not affected and will pass through the
> DSA port interface as usual.
> 
> Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com>
> ---
>  net/dsa/dsa.c      | 23 +++++++++++++----------
>  net/dsa/tag_edsa.c | 12 +++++++++++-
>  2 files changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 2131bf2b3a67..b84e5f0be049 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -220,7 +220,6 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  	}
>  
>  	skb = nskb;
> -	p = netdev_priv(skb->dev);
>  	skb_push(skb, ETH_HLEN);
>  	skb->pkt_type = PACKET_HOST;
>  	skb->protocol = eth_type_trans(skb, skb->dev);
> @@ -234,17 +233,21 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>  		skb = nskb;
>  	}
>  
> -	s = this_cpu_ptr(p->stats64);
> -	u64_stats_update_begin(&s->syncp);
> -	s->rx_packets++;
> -	s->rx_bytes += skb->len;
> -	u64_stats_update_end(&s->syncp);
> +	if (dsa_slave_dev_check(skb->dev)) {
> +		p = netdev_priv(skb->dev);
> +		s = this_cpu_ptr(p->stats64);
> +		u64_stats_update_begin(&s->syncp);
> +		s->rx_packets++;
> +		s->rx_bytes += skb->len;
> +		u64_stats_update_end(&s->syncp);
>  
> -	if (dsa_skb_defer_rx_timestamp(p, skb))
> -		return 0;
> -
> -	gro_cells_receive(&p->gcells, skb);
> +		if (dsa_skb_defer_rx_timestamp(p, skb))
> +			return 0;
>  
> +		gro_cells_receive(&p->gcells, skb);
> +	} else {
> +		netif_rx(skb);
> +	}
>  	return 0;
>  }
>  

On one hand, I feel pretty yucky about this change.
On the other hand, I wonder if it might be useful under some conditions
for drivers with DSA_TAG_PROTO_NONE? For example, once the user bridges
all slave interfaces, then that bridge will start being able to send and
receive traffic, despite none of the individual switch ports being able
to do that. Then, you could even go off and bridge a "foreign" interface,
and that would again work properly. That use case is not supported today,
but is very useful.

Thoughts?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 2/4] net: dsa: link aggregation support
  2020-10-28  0:58   ` Vladimir Oltean
@ 2020-10-28 14:03     ` Tobias Waldekranz
  0 siblings, 0 replies; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-28 14:03 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Wed Oct 28, 2020 at 3:58 AM CET, Vladimir Oltean wrote:
> When you use dsa_broadcast, it is reachable from _all_ switch trees, not
> from "the" switch tree. This was added to support "islands" of
> inter-compatible DSA switches separated by other DSA switches with
> incompatible taggers. Not sure if it was a voluntary decision to use
> that as opposed to plain dsa_port_notify. Not a problem either way.

You're right, I want dsa_port_notify. I will change it and also remove
the tree_index from the notifier info struct.

> > +	/* For multichip systems, we must ensure that each hash bucket
> > +	 * is only enabled on a single egress port throughout the
> > +	 * whole tree.
>
> Or else?
> I don't really understand this statement.

Or else we will send the same packet through multiple ports. I.e. if
we have swp0..2 in a LAG with bucket config like this:

Bucket#  swp0  swp1  swp2
      0     Y     n     n
      1     Y     n     n
      2     Y     n     n
      3     Y     Y     n
      4     n     Y     n
      5     n     Y     n
      6     n     n     Y
      7     n     n     Y

Packets that hash to bucket 3 would be sent out through both swp0 and
swp1, which the receiver would interpret as two distinct packets with
the same contents.

I will reword it to make it more clear.

> > +	struct dsa_lag *lag;
> > +	unsigned long busy = 0;
>
> Reverse Christmas notation please?

I have no excuses. :)

> > -	if (obj->orig_dev != dev)
> > +	if (!(obj->orig_dev == dev ||
> > +	      (dp->lag && obj->orig_dev == dp->lag->dev)))
>
> A small comment here maybe?

Yep, will do.

Thanks,
Tobias

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-28 12:05   ` Vladimir Oltean
@ 2020-10-28 15:28     ` Tobias Waldekranz
  2020-10-28 18:18       ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-28 15:28 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Wed Oct 28, 2020 at 3:05 PM CET, Vladimir Oltean wrote:
> On one hand, I feel pretty yucky about this change.
> On the other hand, I wonder if it might be useful under some conditions
> for drivers with DSA_TAG_PROTO_NONE? For example, once the user bridges
> all slave interfaces, then that bridge will start being able to send and
> receive traffic, despite none of the individual switch ports being able
> to do that. Then, you could even go off and bridge a "foreign"
> interface,
> and that would again work properly. That use case is not supported
> today,
> but is very useful.
>
> Thoughts?

In a scenario like the one you describe, are you saying that you would
set skb->dev to the bridge's netdev in the rcv op?

On ingress I think that would work. On egress I guess you would be
getting duplicates for all multi-destination packets as there is no
way for the none-tagger to limit it, right? (Unless you have the
awesome tx-offloading we talked about yesterday of course :))

I think bridging to "foreign" interfaces still won't work, because on
ingress the packet would never be caught by the bridge's rx
handler. In order for something to be received by br_input.c, it has
to pass through an interface that is attached to it, no?  Everything
above the bridge (like VLAN interfaces) should still work though.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-28 15:28     ` Tobias Waldekranz
@ 2020-10-28 18:18       ` Vladimir Oltean
  2020-10-28 22:31         ` Tobias Waldekranz
  2020-11-01 11:31         ` Ido Schimmel
  0 siblings, 2 replies; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-28 18:18 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: andrew, vivien.didelot, f.fainelli, netdev, Ido Schimmel

Adding Ido here, he has some more experience with the do's and dont's
here, and therefore might have one or two ideas to share.

On Wed, Oct 28, 2020 at 04:28:23PM +0100, Tobias Waldekranz wrote:
> On Wed Oct 28, 2020 at 3:05 PM CET, Vladimir Oltean wrote:
> > On one hand, I feel pretty yucky about this change.
> > On the other hand, I wonder if it might be useful under some conditions
> > for drivers with DSA_TAG_PROTO_NONE? For example, once the user bridges
> > all slave interfaces, then that bridge will start being able to send and
> > receive traffic, despite none of the individual switch ports being able
> > to do that. Then, you could even go off and bridge a "foreign"
> > interface,
> > and that would again work properly. That use case is not supported
> > today,
> > but is very useful.
> >
> > Thoughts?
> 
> In a scenario like the one you describe, are you saying that you would
> set skb->dev to the bridge's netdev in the rcv op?
> 
> On ingress I think that would work. On egress I guess you would be
> getting duplicates for all multi-destination packets as there is no
> way for the none-tagger to limit it, right? (Unless you have the
> awesome tx-offloading we talked about yesterday of course :))
> 
> I think bridging to "foreign" interfaces still won't work, because on
> ingress the packet would never be caught by the bridge's rx
> handler. In order for something to be received by br_input.c, it has
> to pass through an interface that is attached to it, no?  Everything
> above the bridge (like VLAN interfaces) should still work though.

Yes, I expect that the bridge input would need to have one more entry
path into it than just br_handle_frame.

I'm a bit confused and undecided right now, so let's look at it from a
different perspective. Let's imagine a switchdev driver (DSA or not)
which is able to offload IP forwarding. There are some interfaces that
are bridged and one that is standalone. The setup looks as below.

 IP interfaces
                +---------------------------------------------------------+
                |                           br0                           |
                +---------------------------------------------------------+

 +------------+ +------------+ +------------+ +------------+ +------------+
 |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    eth0    |
 +------------+ +------------+ +------------+ +------------+ +------------+

 Hardware interfaces

 +------------+ +------------+ +------------+ +------------+ +------------+
 | DSA port 0 | | DSA port 1 | | DSA port 2 | | DSA port 3 | |   e1000    |
 +------------+ +------------+ +------------+ +------------+ +------------+

Let's say you receive a packet on the standalone swp0, and you need to
perform IP routing towards the bridged domain br0. Some switchdev/DSA
ports are bridged and some aren't.

The switchdev/DSA switch will attempt to do the IP routing step first,
and it _can_ do that because it is aware of the br0 interface, so it
will decrement the TTL and replace the L2 header.

At this stage we have a modified IP packet, which corresponds with what
should be injected into the hardware's view of the br0 interface. The
packet is still in the switchdev/DSA hardware data path.

But then, the switchdev/DSA hardware will look up the FDB in the name of
br0, in an attempt of finding the destination port for the packet. But
the packet should be delivered to a station connected to eth0 (e1000,
foreign interface). So that's part of the exception path, the packet
should be delivered to the CPU.

But the packet was already modified by the hardware data path (IP
forwarding has already taken place)! So how should the DSA/switchdev
hardware deliver the packet to the CPU? It has 2 options:

(a) unwind the entire packet modification, cancel the IP forwarding and
    deliver the unmodified packet to the CPU on behalf of swp0, the
    ingress port. Then let software IP forwarding plus software bridging
    deal with it, so that it can reach the e1000.
(b) deliver the packet to the CPU in the middle of the hardware
    forwarding data path, where the exception/miss occurred, aka deliver
    it on behalf of br0. Modified by IP forwarding. This is where we'd
    have to manually inject skb->dev into br0 somehow.

Maybe this sounds a bit crazy, considering that we don't have IP
forwarding hardware with DSA today, and I am not exactly sure how other
switchdev drivers deal with this exception path today. But nonetheless,
it's almost impossible for DSA switches with IP forwarding abilities to
never come up some day, so we ought to have our mind set about how the
RX data path should like, and whether injecting directly into an upper
is icky or a fact of life.

Things get even more interesting when this is a cascaded DSA setup, and
the bridging and routing are cross-chip. There, the FIB/FDB of 2 there
isn't really any working around the problem that the packet might need
to be delivered to the CPU somewhere in the middle of the data path, and
it would need to be injected into the RX path of an upper interface in
that case.

What do you think?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-28 18:18       ` Vladimir Oltean
@ 2020-10-28 22:31         ` Tobias Waldekranz
  2020-10-28 23:08           ` Vladimir Oltean
  2020-11-01 11:31         ` Ido Schimmel
  1 sibling, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-28 22:31 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: andrew, vivien.didelot, f.fainelli, netdev, Ido Schimmel

On Wed Oct 28, 2020 at 9:18 PM CET, Vladimir Oltean wrote:
> Let's say you receive a packet on the standalone swp0, and you need to
> perform IP routing towards the bridged domain br0. Some switchdev/DSA
> ports are bridged and some aren't.
>
> The switchdev/DSA switch will attempt to do the IP routing step first,
> and it _can_ do that because it is aware of the br0 interface, so it
> will decrement the TTL and replace the L2 header.
>
> At this stage we have a modified IP packet, which corresponds with what
> should be injected into the hardware's view of the br0 interface. The
> packet is still in the switchdev/DSA hardware data path.
>
> But then, the switchdev/DSA hardware will look up the FDB in the name of
> br0, in an attempt of finding the destination port for the packet. But
> the packet should be delivered to a station connected to eth0 (e1000,
> foreign interface). So that's part of the exception path, the packet
> should be delivered to the CPU.
>
> But the packet was already modified by the hardware data path (IP
> forwarding has already taken place)! So how should the DSA/switchdev
> hardware deliver the packet to the CPU? It has 2 options:
>
> (a) unwind the entire packet modification, cancel the IP forwarding and
> deliver the unmodified packet to the CPU on behalf of swp0, the
> ingress port. Then let software IP forwarding plus software bridging
> deal with it, so that it can reach the e1000.
> (b) deliver the packet to the CPU in the middle of the hardware
> forwarding data path, where the exception/miss occurred, aka deliver
> it on behalf of br0. Modified by IP forwarding. This is where we'd
> have to manually inject skb->dev into br0 somehow.

The thing is, unlike L2 where the hardware will add new neighbors to
its FDB autonomously, every entry in the hardware FIB is under the
strict control of the CPU. So I think you can avoid much of this
headache simply by determining if a given L3 nexthop/neighbor is
"foreign" to the switch or not, and then just skip offloading for
those entries.

You miss out on the hardware acceleration of replacing the L2 header
of course. But my guess would be that once you have payed the tax of
receiving the buffer via the NIC driver, allocated an skb, and called
netif_rx() etc. the routing operation will be a rounding error. At
least on smaller devices where the FIB is typically quite small.

> Maybe this sounds a bit crazy, considering that we don't have IP
> forwarding hardware with DSA today, and I am not exactly sure how other
> switchdev drivers deal with this exception path today. But nonetheless,
> it's almost impossible for DSA switches with IP forwarding abilities to
> never come up some day, so we ought to have our mind set about how the
> RX data path should like, and whether injecting directly into an upper
> is icky or a fact of life.

Not crazy at all. In fact the Amethyst (6393X), for which there is a
patchset available on netdev, is capable of doing this (the hardware
is - the posted patches do not implement it).

> Things get even more interesting when this is a cascaded DSA setup, and
> the bridging and routing are cross-chip. There, the FIB/FDB of 2 there
> isn't really any working around the problem that the packet might need
> to be delivered to the CPU somewhere in the middle of the data path, and
> it would need to be injected into the RX path of an upper interface in
> that case.
>
> What do you think?


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 18:25     ` Tobias Waldekranz
  2020-10-27 18:33       ` Marek Behun
  2020-10-27 19:00       ` Vladimir Oltean
@ 2020-10-28 22:35       ` Marek Behun
  2 siblings, 0 replies; 43+ messages in thread
From: Marek Behun @ 2020-10-28 22:35 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: Andrew Lunn, vivien.didelot, f.fainelli, olteanv, netdev

On Tue, 27 Oct 2020 19:25:16 +0100
Tobias Waldekranz <tobias@waldekranz.com> wrote:

> .-----. TO_CPU, FORWARD .-----. TO_CPU, FORWARD .-----.
> |     +-----------------+     +-----------------+     |
> | CPU |                 | sw0 |                 | sw1 |
> |     +-----------------+     +-----------------+     |
> '-----'    FORWARD      '-+-+-'    FORWARD      '-+-+-'
>                           | |                     | |
>                        swp1 swp2               swp3 swp4
> 
> So the links selected as the CPU ports will see a marginally higher load
> due to all TO_CPU being sent over it. But the hashing is not that great
> on this hardware anyway (DA/SA only) so some imbalance is unavoidable.

The hashing is horrible :( On Turris Omnia we have 5 user ports and 2
CPU ports, and I suspect that for most of our users there is at most
one peer MAC address on the other side of an user port. So if such a
user has 5 devices connected to each switch port, there are 5 pairs of
(DA,SA), so 2^5 = 32 different assignments of (DA,SA) pairs to CPU
ports.

With probability 2/32 = 6.25% traffic from all 5 user ports would go via
one port,
with probability 10/32 = 31.25% traffic from 4 user ports would go via
one port.

That is not good balancing :)

Marek

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-28 22:31         ` Tobias Waldekranz
@ 2020-10-28 23:08           ` Vladimir Oltean
  2020-10-29  7:47             ` Tobias Waldekranz
  0 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-28 23:08 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: andrew, vivien.didelot, f.fainelli, netdev, Ido Schimmel

On Wed, Oct 28, 2020 at 11:31:58PM +0100, Tobias Waldekranz wrote:
> The thing is, unlike L2 where the hardware will add new neighbors to
> its FDB autonomously, every entry in the hardware FIB is under the
> strict control of the CPU. So I think you can avoid much of this
> headache simply by determining if a given L3 nexthop/neighbor is
> "foreign" to the switch or not, and then just skip offloading for
> those entries.
> 
> You miss out on the hardware acceleration of replacing the L2 header
> of course. But my guess would be that once you have payed the tax of
> receiving the buffer via the NIC driver, allocated an skb, and called
> netif_rx() etc. the routing operation will be a rounding error. At
> least on smaller devices where the FIB is typically quite small.

Right, but in that case, there is less of an argument to have something
like DSA injecting directly into an upper device's RX path, if only
mv88e6xxx with bonding is ever going to use that.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-28 23:08           ` Vladimir Oltean
@ 2020-10-29  7:47             ` Tobias Waldekranz
  2020-10-30  9:21               ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-10-29  7:47 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: andrew, vivien.didelot, f.fainelli, netdev, Ido Schimmel

On Thu Oct 29, 2020 at 2:08 AM CET, Vladimir Oltean wrote:
> On Wed, Oct 28, 2020 at 11:31:58PM +0100, Tobias Waldekranz wrote:
> > The thing is, unlike L2 where the hardware will add new neighbors to
> > its FDB autonomously, every entry in the hardware FIB is under the
> > strict control of the CPU. So I think you can avoid much of this
> > headache simply by determining if a given L3 nexthop/neighbor is
> > "foreign" to the switch or not, and then just skip offloading for
> > those entries.
> > 
> > You miss out on the hardware acceleration of replacing the L2 header
> > of course. But my guess would be that once you have payed the tax of
> > receiving the buffer via the NIC driver, allocated an skb, and called
> > netif_rx() etc. the routing operation will be a rounding error. At
> > least on smaller devices where the FIB is typically quite small.
>
> Right, but in that case, there is less of an argument to have something
> like DSA injecting directly into an upper device's RX path, if only
> mv88e6xxx with bonding is ever going to use that.

Doesn't that basically boil down to the argument that "we can't merge
this change because it's never going to be used, except for when it is
used"? I don't know if I buy that.

How about the inverse question: If this change is not acceptable, do
you have any other suggestion on to solve it? The hardware is what it
is, I can not will the source port information into existence, and
injecting packets on the wrong DSA port feels even more dirty to me.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-29  7:47             ` Tobias Waldekranz
@ 2020-10-30  9:21               ` Vladimir Oltean
  0 siblings, 0 replies; 43+ messages in thread
From: Vladimir Oltean @ 2020-10-30  9:21 UTC (permalink / raw)
  To: Tobias Waldekranz
  Cc: andrew, vivien.didelot, f.fainelli, netdev, Ido Schimmel

On Thu, Oct 29, 2020 at 08:47:17AM +0100, Tobias Waldekranz wrote:
> Doesn't that basically boil down to the argument that "we can't merge
> this change because it's never going to be used, except for when it is
> used"? I don't know if I buy that.
> 
> How about the inverse question: If this change is not acceptable, do
> you have any other suggestion on to solve it? The hardware is what it
> is, I can not will the source port information into existence, and
> injecting packets on the wrong DSA port feels even more dirty to me.

I suppose you're right, I don't have any better suggestion.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices
  2020-10-28 18:18       ` Vladimir Oltean
  2020-10-28 22:31         ` Tobias Waldekranz
@ 2020-11-01 11:31         ` Ido Schimmel
  1 sibling, 0 replies; 43+ messages in thread
From: Ido Schimmel @ 2020-11-01 11:31 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Tobias Waldekranz, andrew, vivien.didelot, f.fainelli, netdev

On Wed, Oct 28, 2020 at 08:18:24PM +0200, Vladimir Oltean wrote:
> Yes, I expect that the bridge input would need to have one more entry
> path into it than just br_handle_frame.
> 
> I'm a bit confused and undecided right now, so let's look at it from a
> different perspective. Let's imagine a switchdev driver (DSA or not)
> which is able to offload IP forwarding. There are some interfaces that
> are bridged and one that is standalone. The setup looks as below.
> 
>  IP interfaces
>                 +---------------------------------------------------------+
>                 |                           br0                           |
>                 +---------------------------------------------------------+
> 
>  +------------+ +------------+ +------------+ +------------+ +------------+
>  |    swp0    | |    swp1    | |    swp2    | |    swp3    | |    eth0    |
>  +------------+ +------------+ +------------+ +------------+ +------------+
> 
>  Hardware interfaces
> 
>  +------------+ +------------+ +------------+ +------------+ +------------+
>  | DSA port 0 | | DSA port 1 | | DSA port 2 | | DSA port 3 | |   e1000    |
>  +------------+ +------------+ +------------+ +------------+ +------------+
> 
> Let's say you receive a packet on the standalone swp0, and you need to
> perform IP routing towards the bridged domain br0. Some switchdev/DSA
> ports are bridged and some aren't.
> 
> The switchdev/DSA switch will attempt to do the IP routing step first,
> and it _can_ do that because it is aware of the br0 interface, so it
> will decrement the TTL and replace the L2 header.
> 
> At this stage we have a modified IP packet, which corresponds with what
> should be injected into the hardware's view of the br0 interface. The
> packet is still in the switchdev/DSA hardware data path.
> 
> But then, the switchdev/DSA hardware will look up the FDB in the name of
> br0, in an attempt of finding the destination port for the packet. But
> the packet should be delivered to a station connected to eth0 (e1000,
> foreign interface). So that's part of the exception path, the packet
> should be delivered to the CPU.
> 
> But the packet was already modified by the hardware data path (IP
> forwarding has already taken place)! So how should the DSA/switchdev
> hardware deliver the packet to the CPU? It has 2 options:
> 
> (a) unwind the entire packet modification, cancel the IP forwarding and
>     deliver the unmodified packet to the CPU on behalf of swp0, the
>     ingress port. Then let software IP forwarding plus software bridging
>     deal with it, so that it can reach the e1000.

This is what happens in the Spectrum ASICs. If a packet hits some
exception in the data path, it is trapped from the Rx port unmodified.

> (b) deliver the packet to the CPU in the middle of the hardware
>     forwarding data path, where the exception/miss occurred, aka deliver
>     it on behalf of br0. Modified by IP forwarding. This is where we'd
>     have to manually inject skb->dev into br0 somehow.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (7 preceding siblings ...)
  2020-10-27 22:36 ` Andrew Lunn
@ 2020-11-11  4:28 ` Florian Fainelli
  2020-11-19 10:51 ` Vladimir Oltean
  9 siblings, 0 replies; 43+ messages in thread
From: Florian Fainelli @ 2020-11-11  4:28 UTC (permalink / raw)
  To: Tobias Waldekranz, andrew, vivien.didelot, olteanv; +Cc: netdev



On 10/27/2020 3:51 AM, Tobias Waldekranz wrote:
> This series starts by adding the generic support required to offload
> link aggregates to drivers built on top of the DSA subsystem. It then
> implements offloading for the mv88e6xxx driver, i.e. Marvell's
> LinkStreet family.
> 
> Posting this as an RFC as there are some topics that I would like
> feedback on before going further with testing. Thus far I've done some
> basic tests to verify that:
> 
> - A LAG can be used as a regular interface.
> - Bridging a LAG with other DSA ports works as expected.
> - Load balancing is done by the hardware, both in single- and
>   multi-chip systems.
> - Load balancing is dynamically reconfigured when the state of
>   individual links change.
> 
> Testing as been done on two systems:
> 
> 1. Single-chip system with one Peridot (6390X).
> 2. Multi-chip system with one Agate (6352) daisy-chained with an Opal
>    (6097F).
> 
> I would really appreciate feedback on the following:
> 
> All LAG configuration is cached in `struct dsa_lag`s. I realize that
> the standard M.O. of DSA is to read back information from hardware
> when required. With LAGs this becomes very tricky though. For example,
> the change of a link state on one switch will require re-balancing of
> LAG hash buckets on another one, which in turn depends on the total
> number of active links in the LAG. Do you agree that this is
> motivated?

Yes, this makes sense, I did something quite similar in this branch
nearly 3 years ago, it was tested to the point where the switch was
programmed correctly but I had not configured the CPU port to support
2Gbits/sec (doh) to verify that we got 2x the desired throughput:

https://github.com/ffainelli/linux/commits/b53-bond

Your patch looks definitively more complete.

> 
> The LAG driver ops all receive the LAG netdev as an argument when this
> information is already available through the port's lag pointer. This
> was done to match the way that the bridge netdev is passed to all VLAN
> ops even though it is in the port's bridge_dev. Is there a reason for
> this or should I just remove it from the LAG ops?
> 
> At least on mv88e6xxx, the exact source port is not available when
> packets are received on the CPU. The way I see it, there are two ways
> around that problem:
> 
> - Inject the packet directly on the LAG device (what this series
>   does). Feels right because it matches all that we actually know; the
>   packet came in on the LAG. It does complicate dsa_switch_rcv
>   somewhat as we can no longer assume that skb->dev is a DSA port.
> 
> - Inject the packet on "the designated port", i.e. some port in the
>   LAG. This lets us keep the current Rx path untouched. The problem is
>   that (a) the port would have to be dynamically updated to match the
>   expectations of the LAG driver (team/bond) as links are
>   enabled/disabled and (b) we would be presenting a lie because
>   packets would appear to ingress on netdevs that they might not in
>   fact have been physically received on.
> 
> (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
> seems like all chips capable of doing EDSA are using that, except for
> the Peridot.
> 
> (mv88e6xxx) The cross-chip PVT changes required to allow a LAG to
> communicate with the other ports do not feel quite right, but I'm
> unsure about what the proper way of doing it would be. Any ideas?
> 
> (mv88e6xxx) Marvell has historically used the idiosyncratic term
> "trunk" to refer to link aggregates. Somewhere around the Peridot they
> have switched and are now referring to the same registers/tables using
> the term "LAG". In this series I've stuck to using LAG for all generic
> stuff, and only used trunk for driver-internal functions. Do we want
> to rename everything to use the LAG nomenclature?

Yes please!
-- 
Florian

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
                   ` (8 preceding siblings ...)
  2020-11-11  4:28 ` Florian Fainelli
@ 2020-11-19 10:51 ` Vladimir Oltean
  2020-11-19 11:52   ` Tobias Waldekranz
  9 siblings, 1 reply; 43+ messages in thread
From: Vladimir Oltean @ 2020-11-19 10:51 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Tue, Oct 27, 2020 at 11:51:13AM +0100, Tobias Waldekranz wrote:
> This series starts by adding the generic support required to offload
> link aggregates to drivers built on top of the DSA subsystem. It then
> implements offloading for the mv88e6xxx driver, i.e. Marvell's
> LinkStreet family.
>
> Posting this as an RFC as there are some topics that I would like
> feedback on before going further with testing. Thus far I've done some
> basic tests to verify that:
>
> - A LAG can be used as a regular interface.
> - Bridging a LAG with other DSA ports works as expected.
> - Load balancing is done by the hardware, both in single- and
>   multi-chip systems.
> - Load balancing is dynamically reconfigured when the state of
>   individual links change.
>
> Testing as been done on two systems:
>
> 1. Single-chip system with one Peridot (6390X).
> 2. Multi-chip system with one Agate (6352) daisy-chained with an Opal
>    (6097F).
>
> I would really appreciate feedback on the following:
>
> All LAG configuration is cached in `struct dsa_lag`s. I realize that
> the standard M.O. of DSA is to read back information from hardware
> when required. With LAGs this becomes very tricky though. For example,
> the change of a link state on one switch will require re-balancing of
> LAG hash buckets on another one, which in turn depends on the total
> number of active links in the LAG. Do you agree that this is
> motivated?
>
> The LAG driver ops all receive the LAG netdev as an argument when this
> information is already available through the port's lag pointer. This
> was done to match the way that the bridge netdev is passed to all VLAN
> ops even though it is in the port's bridge_dev. Is there a reason for
> this or should I just remove it from the LAG ops?
>
> At least on mv88e6xxx, the exact source port is not available when
> packets are received on the CPU. The way I see it, there are two ways
> around that problem:
>
> - Inject the packet directly on the LAG device (what this series
>   does). Feels right because it matches all that we actually know; the
>   packet came in on the LAG. It does complicate dsa_switch_rcv
>   somewhat as we can no longer assume that skb->dev is a DSA port.
>
> - Inject the packet on "the designated port", i.e. some port in the
>   LAG. This lets us keep the current Rx path untouched. The problem is
>   that (a) the port would have to be dynamically updated to match the
>   expectations of the LAG driver (team/bond) as links are
>   enabled/disabled and (b) we would be presenting a lie because
>   packets would appear to ingress on netdevs that they might not in
>   fact have been physically received on.
>
> (mv88e6xxx) What is the policy regarding the use of DSA vs. EDSA?  It
> seems like all chips capable of doing EDSA are using that, except for
> the Peridot.
>
> (mv88e6xxx) The cross-chip PVT changes required to allow a LAG to
> communicate with the other ports do not feel quite right, but I'm
> unsure about what the proper way of doing it would be. Any ideas?
>
> (mv88e6xxx) Marvell has historically used the idiosyncratic term
> "trunk" to refer to link aggregates. Somewhere around the Peridot they
> have switched and are now referring to the same registers/tables using
> the term "LAG". In this series I've stuck to using LAG for all generic
> stuff, and only used trunk for driver-internal functions. Do we want
> to rename everything to use the LAG nomenclature?

I have tested these patches on ocelot/felix and all is OK there, I would
appreciate if you could resend as non-RFC. In the case of my hardware,
it appears that I don't need the .port_lag_change callback, and that the
source port that is being put in the DSA header is still the physical
port ID and not the logical port ID (the LAG number). That being said,
the framework you've built is clearly nice and works well even with
bridging a bond.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-11-19 10:51 ` Vladimir Oltean
@ 2020-11-19 11:52   ` Tobias Waldekranz
  2020-11-19 18:12     ` Vladimir Oltean
  0 siblings, 1 reply; 43+ messages in thread
From: Tobias Waldekranz @ 2020-11-19 11:52 UTC (permalink / raw)
  To: Vladimir Oltean; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Thu Nov 19, 2020 at 1:51 PM CET, Vladimir Oltean wrote:
> I have tested these patches on ocelot/felix and all is OK there, I would
> appreciate if you could resend as non-RFC. In the case of my hardware,

For sure, I am working on it as we speak. I was mostly waiting for the
dsa-tag-unification series to make its way to net-next first as v1
depends on that. But then I remembered that I had to test against the
bonding driver (I have used team up to this point), and there is some
bug there that I need to squash first.

> it appears that I don't need the .port_lag_change callback, and that the

Ok, does ocelot automatically rebalance the LAG based on link state? I
took a quick look through the datasheet for another switch from
Vitesse, and it explicitly states that you need to update a table on
link changes.

I.e. in this situation:

    br0
   /  |
 lag  |
 /|\  |
1 2 3 4
| | |  \
| | |   B
| | |
1 2 3
  A

If you unplug cable 1, does the hardware rebalance all flows between
A<->B to only use 2 and 3 without software assistance? If not, you
will loose 1/3 of your flows.

> source port that is being put in the DSA header is still the physical
> port ID and not the logical port ID (the LAG number). That being said,

Ok, yeah I really wish this was true for mv88e6xxx as well.

> the framework you've built is clearly nice and works well even with
> bridging a bond.

Thank you!

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [RFC PATCH 0/4] net: dsa: link aggregation support
  2020-11-19 11:52   ` Tobias Waldekranz
@ 2020-11-19 18:12     ` Vladimir Oltean
  0 siblings, 0 replies; 43+ messages in thread
From: Vladimir Oltean @ 2020-11-19 18:12 UTC (permalink / raw)
  To: Tobias Waldekranz; +Cc: andrew, vivien.didelot, f.fainelli, netdev

On Thu, Nov 19, 2020 at 12:52:14PM +0100, Tobias Waldekranz wrote:
> > it appears that I don't need the .port_lag_change callback, and that the
>
> Ok, does ocelot automatically rebalance the LAG based on link state? I
> took a quick look through the datasheet for another switch from
> Vitesse, and it explicitly states that you need to update a table on
> link changes.
>
> I.e. in this situation:
>
>     br0
>    /  |
>  lag  |
>  /|\  |
> 1 2 3 4
> | | |  \
> | | |   B
> | | |
> 1 2 3
>   A
>
> If you unplug cable 1, does the hardware rebalance all flows between
> A<->B to only use 2 and 3 without software assistance? If not, you
> will loose 1/3 of your flows.

Yes, you're right, the switch doesn't rebalance the aggregation codes
across the remaining ports automatically. In my mind I was subconsiously
hoping that would be the case, because I need to make use of the
information in struct dsa_lag in non-DSA code (long story, but the
drivers/net/dsa/ocelot code shares the implementation with
drivers/net/ethernet/mscc/ocelot* which is a switchdev-only driver).
It doesn't mean that keeping state in dp->lag is the wrong thing to do,
it's just that this is an extra challenge for an already odd driver,
that I will have to see how I deal with :D

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2020-11-19 18:13 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-27 10:51 [RFC PATCH 0/4] net: dsa: link aggregation support Tobias Waldekranz
2020-10-27 10:51 ` [RFC PATCH 1/4] net: dsa: mv88e6xxx: use ethertyped dsa for 6390/6390X Tobias Waldekranz
2020-10-27 14:52   ` Marek Behun
2020-10-27 14:54     ` Marek Behun
2020-10-27 14:58       ` Marek Behun
2020-10-27 10:51 ` [RFC PATCH 2/4] net: dsa: link aggregation support Tobias Waldekranz
2020-10-28  0:58   ` Vladimir Oltean
2020-10-28 14:03     ` Tobias Waldekranz
2020-10-27 10:51 ` [RFC PATCH 3/4] net: dsa: mv88e6xxx: " Tobias Waldekranz
2020-10-27 10:51 ` [RFC PATCH 4/4] net: dsa: tag_edsa: support reception of packets from lag devices Tobias Waldekranz
2020-10-28 12:05   ` Vladimir Oltean
2020-10-28 15:28     ` Tobias Waldekranz
2020-10-28 18:18       ` Vladimir Oltean
2020-10-28 22:31         ` Tobias Waldekranz
2020-10-28 23:08           ` Vladimir Oltean
2020-10-29  7:47             ` Tobias Waldekranz
2020-10-30  9:21               ` Vladimir Oltean
2020-11-01 11:31         ` Ido Schimmel
2020-10-27 12:27 ` [RFC PATCH 0/4] net: dsa: link aggregation support Vladimir Oltean
2020-10-27 14:29   ` Andrew Lunn
2020-10-27 14:59   ` Tobias Waldekranz
2020-10-27 14:00 ` Andrew Lunn
2020-10-27 15:09   ` Tobias Waldekranz
2020-10-27 15:05 ` Marek Behun
2020-10-27 15:23   ` Andrew Lunn
2020-10-27 18:25     ` Tobias Waldekranz
2020-10-27 18:33       ` Marek Behun
2020-10-27 19:04         ` Vladimir Oltean
2020-10-27 19:21           ` Tobias Waldekranz
2020-10-27 19:00       ` Vladimir Oltean
2020-10-27 19:37         ` Tobias Waldekranz
2020-10-27 20:02           ` Vladimir Oltean
2020-10-27 20:53             ` Tobias Waldekranz
2020-10-27 22:32               ` Vladimir Oltean
2020-10-28  0:27                 ` Tobias Waldekranz
2020-10-28 22:35       ` Marek Behun
2020-10-27 22:36 ` Andrew Lunn
2020-10-28  0:45   ` Tobias Waldekranz
2020-10-28  1:03     ` Andrew Lunn
2020-11-11  4:28 ` Florian Fainelli
2020-11-19 10:51 ` Vladimir Oltean
2020-11-19 11:52   ` Tobias Waldekranz
2020-11-19 18:12     ` Vladimir Oltean

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).