Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH v2 net-next 00/10] Configure the MTU on DSA switches
@ 2020-03-25 15:21 Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 01/10] net: dsa: configure the MTU for switch ports Vladimir Oltean
                   ` (9 more replies)
  0 siblings, 10 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:21 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Vladimir Oltean <vladimir.oltean@nxp.com>

This series adds support for configuring the MTU on front-panel switch
ports, while seamlessly adapting the CPU port and the DSA master to the
largest value plus the tagger overhead.

It also implements bridge MTU auto-normalization, as discussed with
Florian in the comments of v1.

Support was added for quite a number of switches, in the hope that this
series would gain some traction:
 - sja1105
 - felix
 - vsc73xx
 - b53 and rest of the platform

V1 of this series was submitted here:
https://patchwork.ozlabs.org/cover/1199868/

Murali Krishna Policharla (5):
  net: phy: bcm7xx: Add jumbo frame configuration to PHY
  bgmac: Add support for Jumbo frames
  bgmac: Add MTU configuration support to the driver
  bgmac: Add DMA support to handle frames beyond 8192 bytes
  net: dsa: b53: Add MTU configuration support

Vladimir Oltean (5):
  net: dsa: configure the MTU for switch ports
  net: dsa: sja1105: Implement the port MTU callbacks
  net: dsa: vsc73xx: Make the MTU configurable
  net: dsa: felix: support changing the MTU
  net: bridge: implement auto-normalization of MTU for hardware datapath

 drivers/net/dsa/b53/b53_common.c       |  35 +++++++++
 drivers/net/dsa/ocelot/felix.c         |  18 +++++
 drivers/net/dsa/sja1105/sja1105.h      |   1 +
 drivers/net/dsa/sja1105/sja1105_main.c |  48 +++++++++++-
 drivers/net/dsa/vitesse-vsc73xx-core.c |  30 ++++---
 drivers/net/ethernet/broadcom/bgmac.c  |  12 +++
 drivers/net/ethernet/broadcom/bgmac.h  |   5 +-
 drivers/net/ethernet/mscc/ocelot.c     |  45 ++++++++---
 drivers/net/phy/bcm-phy-lib.c          |  28 +++++++
 drivers/net/phy/bcm-phy-lib.h          |   1 +
 drivers/net/phy/bcm7xxx.c              |   4 +
 include/linux/brcmphy.h                |   1 +
 include/net/dsa.h                      |  10 +++
 include/soc/mscc/ocelot.h              |   7 ++
 net/bridge/br.c                        |   1 +
 net/bridge/br_if.c                     |  93 ++++++++++++++++++++++
 net/bridge/br_private.h                |   1 +
 net/dsa/dsa_priv.h                     |  10 +++
 net/dsa/master.c                       |  14 ++--
 net/dsa/port.c                         |  11 +++
 net/dsa/slave.c                        | 104 ++++++++++++++++++++++++-
 net/dsa/switch.c                       |  34 ++++++++
 22 files changed, 478 insertions(+), 35 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 01/10] net: dsa: configure the MTU for switch ports
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY Vladimir Oltean
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Vladimir Oltean <vladimir.oltean@nxp.com>

It is useful be able to configure port policers on a switch to accept
frames of various sizes:

- Increase the MTU for better throughput from the default of 1500 if it
  is known that there is no 10/100 Mbps device in the network.
- Decrease the MTU to limit the latency of high-priority frames under
  congestion.

For DSA slave ports, this is mostly a pass-through callback, called
through the regular ndo ops and at probe time (to ensure consistency
across all supported switches).

The CPU port is called with an MTU equal to the largest configured MTU
of the slave ports. The assumption is that the user might want to
sustain a bidirectional conversation with a partner over any switch
port.

The DSA master is configured the same as the CPU port, plus the tagger
overhead. Since the MTU is by definition L2 payload (sans Ethernet
header), it is up to each individual driver to figure out if it needs to
do anything special for its frame tags on the CPU port (it shouldn't
except in special cases). So the MTU does not contain the tagger
overhead on the CPU port.
However the MTU of the DSA master, minus the tagger overhead, is used as
a proxy for the MTU of the CPU port, which does not have a net device.
This is to avoid uselessly calling the .change_mtu function on the CPU
port when nothing should change.

So it is safe to assume that the DSA master and the CPU port MTUs are
apart by exactly the tagger's overhead in bytes.

This patch also makes MTU configuration on the DSA master interface
mandatory (errors from dsa_master_set_mtu now really are propagated).
The reasoning is that it's better to be safe than sorry.  With cascaded
DSA setups, things can easily spiral out of control if they're ignored,
since the user is not notified of what went wrong, and large packets
just get lost.

Some inspiration (mainly in the MTU DSA notifier) was taken from a
vaguely similar patch from Murali and Florian, who are credited as
co-developers down below.

Co-developed-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Co-developed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 include/net/dsa.h  |  10 +++++
 net/dsa/dsa_priv.h |  10 +++++
 net/dsa/master.c   |  14 +++---
 net/dsa/port.c     |  11 +++++
 net/dsa/slave.c    | 104 ++++++++++++++++++++++++++++++++++++++++++++-
 net/dsa/switch.c   |  34 +++++++++++++++
 6 files changed, 174 insertions(+), 9 deletions(-)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index beeb81a532e3..1bb1e0852e31 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -579,6 +579,16 @@ struct dsa_switch_ops {
 				     struct devlink_param_gset_ctx *ctx);
 	int	(*devlink_param_set)(struct dsa_switch *ds, u32 id,
 				     struct devlink_param_gset_ctx *ctx);
+
+	/*
+	 * MTU change functionality. Switches can also adjust their MRU through
+	 * this method. By MTU, one understands the SDU, aka L2 payload length.
+	 * If the switch needs to account for the DSA tag on the CPU port, this
+	 * method needs to to do so privately.
+	 */
+	int	(*port_change_mtu)(struct dsa_switch *ds, int port,
+				   int new_mtu);
+	int	(*port_max_mtu)(struct dsa_switch *ds, int port);
 };
 
 #define DSA_DEVLINK_PARAM_DRIVER(_id, _name, _type, _cmodes)		\
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 760e6ea3178a..b43d5713ff90 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -22,6 +22,7 @@ enum {
 	DSA_NOTIFIER_MDB_DEL,
 	DSA_NOTIFIER_VLAN_ADD,
 	DSA_NOTIFIER_VLAN_DEL,
+	DSA_NOTIFIER_MTU,
 };
 
 /* DSA_NOTIFIER_AGEING_TIME */
@@ -61,6 +62,13 @@ struct dsa_notifier_vlan_info {
 	int port;
 };
 
+/* DSA_NOTIFIER_MTU */
+struct dsa_notifier_mtu_info {
+	int sw_index;
+	int port;
+	int mtu;
+};
+
 struct dsa_slave_priv {
 	/* Copy of CPU port xmit for faster access in slave transmit hot path */
 	struct sk_buff *	(*xmit)(struct sk_buff *skb,
@@ -98,6 +106,7 @@ int dsa_legacy_fdb_del(struct ndmsg *ndm, struct nlattr *tb[],
 /* master.c */
 int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp);
 void dsa_master_teardown(struct net_device *dev);
+int dsa_master_set_mtu(struct net_device *dev, int mtu);
 
 static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
 						       int device, int port)
@@ -127,6 +136,7 @@ int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
 			    struct switchdev_trans *trans);
 int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
 			 struct switchdev_trans *trans);
+int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu);
 int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
 		     u16 vid);
 int dsa_port_fdb_del(struct dsa_port *dp, const unsigned char *addr,
diff --git a/net/dsa/master.c b/net/dsa/master.c
index bd44bde272f4..e37f14d6a8a6 100644
--- a/net/dsa/master.c
+++ b/net/dsa/master.c
@@ -314,18 +314,18 @@ static const struct attribute_group dsa_group = {
 	.attrs	= dsa_slave_attrs,
 };
 
-static void dsa_master_set_mtu(struct net_device *dev, struct dsa_port *cpu_dp)
+/* Needs to be called under rtnl_lock */
+int dsa_master_set_mtu(struct net_device *dev, int mtu)
 {
-	unsigned int mtu = ETH_DATA_LEN + cpu_dp->tag_ops->overhead;
-	int err;
+	int err = -ERANGE;
 
-	rtnl_lock();
 	if (mtu <= dev->max_mtu) {
 		err = dev_set_mtu(dev, mtu);
 		if (err)
-			netdev_dbg(dev, "Unable to set MTU to include for DSA overheads\n");
+			netdev_err(dev, "Unable to set MTU to include for DSA overheads\n");
 	}
-	rtnl_unlock();
+
+	return err;
 }
 
 static void dsa_master_reset_mtu(struct net_device *dev)
@@ -344,8 +344,6 @@ int dsa_master_setup(struct net_device *dev, struct dsa_port *cpu_dp)
 {
 	int ret;
 
-	dsa_master_set_mtu(dev,  cpu_dp);
-
 	/* If we use a tagging format that doesn't have an ethertype
 	 * field, make sure that all packets from this point on get
 	 * sent to the tag format's receive function.
diff --git a/net/dsa/port.c b/net/dsa/port.c
index a18e65a474a5..428d02d3c2e7 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -297,6 +297,17 @@ int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
 	return ds->ops->port_egress_floods(ds, port, true, mrouter);
 }
 
+int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu)
+{
+	struct dsa_notifier_mtu_info info = {
+		.sw_index = dp->ds->index,
+		.port = dp->index,
+		.mtu = new_mtu,
+	};
+
+	return dsa_port_notify(dp, DSA_NOTIFIER_MTU, &info);
+}
+
 int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
 		     u16 vid)
 {
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 5f782fa3029f..8d8fc20ce4c6 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1218,6 +1218,100 @@ static int dsa_slave_vlan_rx_kill_vid(struct net_device *dev, __be16 proto,
 	return dsa_port_vid_del(dp, vid);
 }
 
+static int dsa_slave_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct net_device *master = dsa_slave_to_master(dev);
+	struct dsa_port *dp = dsa_slave_to_port(dev);
+	struct dsa_slave_priv *p = netdev_priv(dev);
+	struct dsa_switch *ds = p->dp->ds;
+	struct dsa_port *cpu_dp;
+	int port = p->dp->index;
+	int largest_mtu = 0;
+	int new_master_mtu;
+	int old_master_mtu;
+	int mtu_limit;
+	int cpu_mtu;
+	int err, i;
+
+	if (!ds->ops->port_change_mtu)
+		return -EOPNOTSUPP;
+
+	for (i = 0; i < ds->num_ports; i++) {
+		int slave_mtu;
+
+		if (!dsa_is_user_port(ds, i))
+			continue;
+
+		/* During probe, this function will be called for each slave
+		 * device, while not all of them have been allocated. That's
+		 * ok, it doesn't change what the maximum is, so ignore it.
+		 */
+		if (!dsa_to_port(ds, i)->slave)
+			continue;
+
+		/* Pretend that we already applied the setting, which we
+		 * actually haven't (still haven't done all integrity checks)
+		 */
+		if (i == port)
+			slave_mtu = new_mtu;
+		else
+			slave_mtu = dsa_to_port(ds, i)->slave->mtu;
+
+		if (largest_mtu < slave_mtu)
+			largest_mtu = slave_mtu;
+	}
+
+	cpu_dp = dsa_to_port(ds, port)->cpu_dp;
+
+	mtu_limit = min_t(int, master->max_mtu, dev->max_mtu);
+	old_master_mtu = master->mtu;
+	new_master_mtu = largest_mtu + cpu_dp->tag_ops->overhead;
+	if (new_master_mtu > mtu_limit)
+		return -ERANGE;
+
+	/* If the master MTU isn't over limit, there's no need to check the CPU
+	 * MTU, since that surely isn't either.
+	 */
+	cpu_mtu = largest_mtu;
+
+	/* Start applying stuff */
+	if (new_master_mtu != old_master_mtu) {
+		err = dsa_master_set_mtu(master, new_master_mtu);
+		if (err < 0)
+			goto out_master_failed;
+
+		err = dsa_port_mtu_change(cpu_dp, cpu_mtu);
+		if (err) {
+			netdev_err(dev, "Failed to change MTU on CPU port\n");
+			goto out_cpu_failed;
+		}
+	}
+
+	err = dsa_port_mtu_change(dp, new_mtu);
+	if (err) {
+		netdev_err(dev, "Failed to change MTU\n");
+		goto out_port_failed;
+	}
+
+	dev->mtu = new_mtu;
+
+	return 0;
+
+out_port_failed:
+	if (new_master_mtu != old_master_mtu) {
+		if (dsa_port_mtu_change(cpu_dp, old_master_mtu -
+					cpu_dp->tag_ops->overhead))
+			dev_err(ds->dev, "Failed to restore MTU on CPU port\n");
+	}
+out_cpu_failed:
+	if (new_master_mtu != old_master_mtu) {
+		if (dsa_master_set_mtu(master, old_master_mtu))
+			netdev_err(master, "Failed to restore MTU\n");
+	}
+out_master_failed:
+	return err;
+}
+
 static const struct ethtool_ops dsa_slave_ethtool_ops = {
 	.get_drvinfo		= dsa_slave_get_drvinfo,
 	.get_regs_len		= dsa_slave_get_regs_len,
@@ -1295,6 +1389,7 @@ static const struct net_device_ops dsa_slave_netdev_ops = {
 	.ndo_vlan_rx_add_vid	= dsa_slave_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid	= dsa_slave_vlan_rx_kill_vid,
 	.ndo_get_devlink_port	= dsa_slave_get_devlink_port,
+	.ndo_change_mtu		= dsa_slave_change_mtu,
 };
 
 static struct device_type dsa_type = {
@@ -1465,7 +1560,10 @@ int dsa_slave_create(struct dsa_port *port)
 	slave_dev->priv_flags |= IFF_NO_QUEUE;
 	slave_dev->netdev_ops = &dsa_slave_netdev_ops;
 	slave_dev->min_mtu = 0;
-	slave_dev->max_mtu = ETH_MAX_MTU;
+	if (ds->ops->port_max_mtu)
+		slave_dev->max_mtu = ds->ops->port_max_mtu(ds, port->index);
+	else
+		slave_dev->max_mtu = ETH_MAX_MTU;
 	SET_NETDEV_DEVTYPE(slave_dev, &dsa_type);
 
 	SET_NETDEV_DEV(slave_dev, port->ds->dev);
@@ -1483,6 +1581,10 @@ int dsa_slave_create(struct dsa_port *port)
 	p->xmit = cpu_dp->tag_ops->xmit;
 	port->slave = slave_dev;
 
+	rtnl_lock();
+	dsa_slave_change_mtu(slave_dev, ETH_DATA_LEN);
+	rtnl_unlock();
+
 	netif_carrier_off(slave_dev);
 
 	ret = dsa_slave_phy_setup(slave_dev);
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index df4abe897ed6..6b0b6f9d219c 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -52,6 +52,37 @@ static int dsa_switch_ageing_time(struct dsa_switch *ds,
 	return 0;
 }
 
+static bool dsa_switch_mtu_match(struct dsa_switch *ds, int port,
+				 struct dsa_notifier_mtu_info *info)
+{
+	if (ds->index == info->sw_index && port == info->port)
+		return true;
+
+	if (dsa_is_dsa_port(ds, port) || dsa_is_cpu_port(ds, port))
+		return true;
+
+	return false;
+}
+
+static int dsa_switch_mtu(struct dsa_switch *ds,
+			  struct dsa_notifier_mtu_info *info)
+{
+	int port, ret;
+
+	if (!ds->ops->port_change_mtu)
+		return -EOPNOTSUPP;
+
+	for (port = 0; port < ds->num_ports; port++) {
+		if (dsa_switch_mtu_match(ds, port, info)) {
+			ret = ds->ops->port_change_mtu(ds, port, info->mtu);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
 static int dsa_switch_bridge_join(struct dsa_switch *ds,
 				  struct dsa_notifier_bridge_info *info)
 {
@@ -328,6 +359,9 @@ static int dsa_switch_event(struct notifier_block *nb,
 	case DSA_NOTIFIER_VLAN_DEL:
 		err = dsa_switch_vlan_del(ds, info);
 		break;
+	case DSA_NOTIFIER_MTU:
+		err = dsa_switch_mtu(ds, info);
+		break;
 	default:
 		err = -EOPNOTSUPP;
 		break;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 01/10] net: dsa: configure the MTU for switch ports Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 15:44   ` Heiner Kallweit
  2020-03-25 15:22 ` [PATCH v2 net-next 03/10] bgmac: Add support for Jumbo frames Vladimir Oltean
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Murali Krishna Policharla <murali.policharla@broadcom.com>

Add API to configure jumbo frame settings in PHY during initial PHY
configuration.

Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/phy/bcm-phy-lib.c | 28 ++++++++++++++++++++++++++++
 drivers/net/phy/bcm-phy-lib.h |  1 +
 drivers/net/phy/bcm7xxx.c     |  4 ++++
 include/linux/brcmphy.h       |  1 +
 4 files changed, 34 insertions(+)

diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
index e0d3310957ff..a26c80e13b43 100644
--- a/drivers/net/phy/bcm-phy-lib.c
+++ b/drivers/net/phy/bcm-phy-lib.c
@@ -423,6 +423,34 @@ int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev)
 }
 EXPORT_SYMBOL_GPL(bcm_phy_28nm_a0b0_afe_config_init);
 
+int bcm_phy_enable_jumbo(struct phy_device *phydev)
+{
+	int val = 0, ret = 0;
+
+	ret = phy_write(phydev, MII_BCM54XX_AUX_CTL,
+			MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
+	if (ret < 0)
+		return ret;
+
+	val = phy_read(phydev, MII_BCM54XX_AUX_CTL);
+
+	/* Enable extended length packet reception */
+	val |= MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN;
+	ret = phy_write(phydev, MII_BCM54XX_AUX_CTL, val);
+
+	if (ret < 0)
+		return ret;
+
+	val = phy_read(phydev, MII_BCM54XX_ECR);
+
+	/* Enable 10K byte packet length reception */
+	val |= BIT(0);
+	ret =  phy_write(phydev, MII_BCM54XX_ECR, val);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bcm_phy_enable_jumbo);
+
 MODULE_DESCRIPTION("Broadcom PHY Library");
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Broadcom Corporation");
diff --git a/drivers/net/phy/bcm-phy-lib.h b/drivers/net/phy/bcm-phy-lib.h
index c86fb9d1240c..129df819be8c 100644
--- a/drivers/net/phy/bcm-phy-lib.h
+++ b/drivers/net/phy/bcm-phy-lib.h
@@ -65,5 +65,6 @@ void bcm_phy_get_stats(struct phy_device *phydev, u64 *shadow,
 		       struct ethtool_stats *stats, u64 *data);
 void bcm_phy_r_rc_cal_reset(struct phy_device *phydev);
 int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev);
+int bcm_phy_enable_jumbo(struct phy_device *phydev);
 
 #endif /* _LINUX_BCM_PHY_LIB_H */
diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
index af8eabe7a6d4..692048d86ab1 100644
--- a/drivers/net/phy/bcm7xxx.c
+++ b/drivers/net/phy/bcm7xxx.c
@@ -178,6 +178,10 @@ static int bcm7xxx_28nm_config_init(struct phy_device *phydev)
 		break;
 	}
 
+	if (ret)
+		return ret;
+
+	ret =  bcm_phy_enable_jumbo(phydev);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index b475e7f20d28..19bd86019e93 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -119,6 +119,7 @@
 #define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL	0x00
 #define MII_BCM54XX_AUXCTL_ACTL_TX_6DB		0x0400
 #define MII_BCM54XX_AUXCTL_ACTL_SMDSP_ENA	0x0800
+#define MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN	0x4000
 
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC			0x07
 #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN	0x0010
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 03/10] bgmac: Add support for Jumbo frames
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 01/10] net: dsa: configure the MTU for switch ports Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 04/10] bgmac: Add MTU configuration support to the driver Vladimir Oltean
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Murali Krishna Policharla <murali.policharla@broadcom.com>

Max frame length updated to enable support for jumbo frames.

Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/broadcom/bgmac.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h
index 40d02fec2747..041ad069b5c8 100644
--- a/drivers/net/ethernet/broadcom/bgmac.h
+++ b/drivers/net/ethernet/broadcom/bgmac.h
@@ -366,7 +366,8 @@
 #define BGMAC_RX_FRAME_OFFSET			30		/* There are 2 unused bytes between header and real data */
 #define BGMAC_RX_BUF_OFFSET			(NET_SKB_PAD + NET_IP_ALIGN - \
 						 BGMAC_RX_FRAME_OFFSET)
-#define BGMAC_RX_MAX_FRAME_SIZE			1536		/* Copied from b44/tg3 */
+/* Jumbo frame size */
+#define BGMAC_RX_MAX_FRAME_SIZE			9720
 #define BGMAC_RX_BUF_SIZE			(BGMAC_RX_FRAME_OFFSET + BGMAC_RX_MAX_FRAME_SIZE)
 #define BGMAC_RX_ALLOC_SIZE			(SKB_DATA_ALIGN(BGMAC_RX_BUF_SIZE + BGMAC_RX_BUF_OFFSET) + \
 						 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 04/10] bgmac: Add MTU configuration support to the driver
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
                   ` (2 preceding siblings ...)
  2020-03-25 15:22 ` [PATCH v2 net-next 03/10] bgmac: Add support for Jumbo frames Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 05/10] bgmac: Add DMA support to handle frames beyond 8192 bytes Vladimir Oltean
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Murali Krishna Policharla <murali.policharla@broadcom.com>

Add bgmac_change_mtu API to configure new mtu settings in bgmac driver.

Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/broadcom/bgmac.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index 1bb07a5d82c9..c530dff0353b 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1248,6 +1248,14 @@ static int bgmac_set_mac_address(struct net_device *net_dev, void *addr)
 	return 0;
 }
 
+static int bgmac_change_mtu(struct net_device *net_dev, int mtu)
+{
+	struct bgmac *bgmac = netdev_priv(net_dev);
+
+	bgmac_write(bgmac, BGMAC_RXMAX_LENGTH, 32 + mtu);
+	return 0;
+}
+
 static const struct net_device_ops bgmac_netdev_ops = {
 	.ndo_open		= bgmac_open,
 	.ndo_stop		= bgmac_stop,
@@ -1256,6 +1264,7 @@ static const struct net_device_ops bgmac_netdev_ops = {
 	.ndo_set_mac_address	= bgmac_set_mac_address,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_do_ioctl           = phy_do_ioctl_running,
+	.ndo_change_mtu		= bgmac_change_mtu,
 };
 
 /**************************************************
@@ -1529,6 +1538,7 @@ int bgmac_enet_probe(struct bgmac *bgmac)
 	net_dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 	net_dev->hw_features = net_dev->features;
 	net_dev->vlan_features = net_dev->features;
+	net_dev->max_mtu = BGMAC_RX_MAX_FRAME_SIZE;
 
 	err = register_netdev(bgmac->net_dev);
 	if (err) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 05/10] bgmac: Add DMA support to handle frames beyond 8192 bytes
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
                   ` (3 preceding siblings ...)
  2020-03-25 15:22 ` [PATCH v2 net-next 04/10] bgmac: Add MTU configuration support to the driver Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 23:07   ` Florian Fainelli
  2020-03-25 15:22 ` [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support Vladimir Oltean
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Murali Krishna Policharla <murali.policharla@broadcom.com>

Add DMA support in driver to handle jumbo frames beyond 8192 bytes.
Also update jumbo frame max size to include FCS, the DMA packet length
received includes FCS.

Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Reviewed-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/ethernet/broadcom/bgmac.c | 4 +++-
 drivers/net/ethernet/broadcom/bgmac.h | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c b/drivers/net/ethernet/broadcom/bgmac.c
index c530dff0353b..98ec1b8a7d8e 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -1538,7 +1538,9 @@ int bgmac_enet_probe(struct bgmac *bgmac)
 	net_dev->features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 	net_dev->hw_features = net_dev->features;
 	net_dev->vlan_features = net_dev->features;
-	net_dev->max_mtu = BGMAC_RX_MAX_FRAME_SIZE;
+
+	/* Omit FCS from max MTU size */
+	net_dev->max_mtu = BGMAC_RX_MAX_FRAME_SIZE - ETH_FCS_LEN;
 
 	err = register_netdev(bgmac->net_dev);
 	if (err) {
diff --git a/drivers/net/ethernet/broadcom/bgmac.h b/drivers/net/ethernet/broadcom/bgmac.h
index 041ad069b5c8..351c598a3ec6 100644
--- a/drivers/net/ethernet/broadcom/bgmac.h
+++ b/drivers/net/ethernet/broadcom/bgmac.h
@@ -351,7 +351,7 @@
 #define BGMAC_DESC_CTL0_IOC			0x20000000	/* IRQ on complete */
 #define BGMAC_DESC_CTL0_EOF			0x40000000	/* End of frame */
 #define BGMAC_DESC_CTL0_SOF			0x80000000	/* Start of frame */
-#define BGMAC_DESC_CTL1_LEN			0x00001FFF
+#define BGMAC_DESC_CTL1_LEN			0x00003FFF
 
 #define BGMAC_PHY_NOREGS			BRCM_PSEUDO_PHY_ADDR
 #define BGMAC_PHY_MASK				0x1F
@@ -366,8 +366,8 @@
 #define BGMAC_RX_FRAME_OFFSET			30		/* There are 2 unused bytes between header and real data */
 #define BGMAC_RX_BUF_OFFSET			(NET_SKB_PAD + NET_IP_ALIGN - \
 						 BGMAC_RX_FRAME_OFFSET)
-/* Jumbo frame size */
-#define BGMAC_RX_MAX_FRAME_SIZE			9720
+/* Jumbo frame size with FCS */
+#define BGMAC_RX_MAX_FRAME_SIZE			9724
 #define BGMAC_RX_BUF_SIZE			(BGMAC_RX_FRAME_OFFSET + BGMAC_RX_MAX_FRAME_SIZE)
 #define BGMAC_RX_ALLOC_SIZE			(SKB_DATA_ALIGN(BGMAC_RX_BUF_SIZE + BGMAC_RX_BUF_OFFSET) + \
 						 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
                   ` (4 preceding siblings ...)
  2020-03-25 15:22 ` [PATCH v2 net-next 05/10] bgmac: Add DMA support to handle frames beyond 8192 bytes Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 23:21   ` Florian Fainelli
  2020-03-25 15:22 ` [PATCH v2 net-next 07/10] net: dsa: sja1105: Implement the port MTU callbacks Vladimir Oltean
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Murali Krishna Policharla <murali.policharla@broadcom.com>

Add b53_change_mtu API to configure mtu settings in B53 switch. Enable
jumbo frame support if configured mtu size is for jumbo frame. Also
configure BCM583XX devices to send and receive jumbo frames when ports
are configured with 10/100 Mbps speed.

Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/b53/b53_common.c | 35 ++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index ceafce446317..e32e05803800 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -644,6 +644,7 @@ EXPORT_SYMBOL(b53_brcm_hdr_setup);
 
 static void b53_enable_cpu_port(struct b53_device *dev, int port)
 {
+	u32 port_mask;
 	u8 port_ctrl;
 
 	/* BCM5325 CPU port is at 8 */
@@ -658,6 +659,14 @@ static void b53_enable_cpu_port(struct b53_device *dev, int port)
 	b53_brcm_hdr_setup(dev->ds, port);
 
 	b53_br_egress_floods(dev->ds, port, true, true);
+
+	b53_read32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, &port_mask);
+
+	if (dev->chip_id == BCM583XX_DEVICE_ID)
+		port_mask |= JPM_10_100_JUMBO_EN;
+
+	port_mask |= BIT(port);
+	b53_write32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, port_mask);
 }
 
 static void b53_enable_mib(struct b53_device *dev)
@@ -2065,6 +2074,30 @@ int b53_set_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e)
 }
 EXPORT_SYMBOL(b53_set_mac_eee);
 
+static int b53_change_mtu(struct dsa_switch *ds, int port, int mtu)
+{
+	struct b53_device *dev = ds->priv;
+	u32 port_mask;
+
+	if (dev->chip_id == BCM58XX_DEVICE_ID ||
+	    is5325(dev) || is5365(dev))
+		return -EOPNOTSUPP;
+
+	b53_read32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, &port_mask);
+
+	if (mtu >= JMS_MIN_SIZE)
+		port_mask |= BIT(port);
+	else
+		port_mask &= ~BIT(port);
+
+	return b53_write32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, port_mask);
+}
+
+static int b53_get_max_mtu(struct dsa_switch *ds, int port)
+{
+	return JMS_MAX_SIZE;
+}
+
 static const struct dsa_switch_ops b53_switch_ops = {
 	.get_tag_protocol	= b53_get_tag_protocol,
 	.setup			= b53_setup,
@@ -2102,6 +2135,8 @@ static const struct dsa_switch_ops b53_switch_ops = {
 	.port_mdb_prepare	= b53_mdb_prepare,
 	.port_mdb_add		= b53_mdb_add,
 	.port_mdb_del		= b53_mdb_del,
+	.port_max_mtu		= b53_get_max_mtu,
+	.port_change_mtu	= b53_change_mtu,
 };
 
 struct b53_chip_data {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 07/10] net: dsa: sja1105: Implement the port MTU callbacks
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
                   ` (5 preceding siblings ...)
  2020-03-25 15:22 ` [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 23:08   ` Florian Fainelli
  2020-03-25 15:22 ` [PATCH v2 net-next 08/10] net: dsa: vsc73xx: Make the MTU configurable Vladimir Oltean
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Vladimir Oltean <vladimir.oltean@nxp.com>

On this switch, the frame length enforcements are performed by the
ingress policers. There are 2 types of those: regular L2 (also called
best-effort) and Virtual Link policers (an ARINC664/AFDX concept for
defining L2 streams with certain QoS abilities). To avoid future
confusion, I prefer to call the reset reason "Best-effort policers",
even though the VL policers are not yet supported.

We also need to change the setup of the initial static config, such that
DSA calls to .change_mtu (which are expensive) become no-ops and don't
reset the switch 5 times.

A driver-level decision is to unconditionally allow single VLAN-tagged
traffic on all ports. The CPU port must accept an additional VLAN header
for the DSA tag, which is again a driver-level decision.

The policers actually count bytes not only from the SDU, but also from
the Ethernet header and FCS, so those need to be accounted for as well.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/sja1105/sja1105.h      |  1 +
 drivers/net/dsa/sja1105/sja1105_main.c | 48 +++++++++++++++++++++++---
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index a358fc89a6db..0e5b739b2fe8 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -126,6 +126,7 @@ enum sja1105_reset_reason {
 	SJA1105_RX_HWTSTAMPING,
 	SJA1105_AGEING_TIME,
 	SJA1105_SCHEDULING,
+	SJA1105_BEST_EFFORT_POLICING,
 };
 
 int sja1105_static_config_reload(struct sja1105_private *priv,
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index e0c99bb63cdf..5f3392e5678b 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -519,12 +519,12 @@ static int sja1105_init_avb_params(struct sja1105_private *priv)
 #define SJA1105_RATE_MBPS(speed) (((speed) * 64000) / 1000)
 
 static void sja1105_setup_policer(struct sja1105_l2_policing_entry *policing,
-				  int index)
+				  int index, int mtu)
 {
 	policing[index].sharindx = index;
 	policing[index].smax = 65535; /* Burst size in bytes */
 	policing[index].rate = SJA1105_RATE_MBPS(1000);
-	policing[index].maxlen = ETH_FRAME_LEN + VLAN_HLEN + ETH_FCS_LEN;
+	policing[index].maxlen = mtu;
 	policing[index].partition = 0;
 }
 
@@ -556,12 +556,16 @@ static int sja1105_init_l2_policing(struct sja1105_private *priv)
 	 */
 	for (i = 0, k = 0; i < SJA1105_NUM_PORTS; i++) {
 		int bcast = (SJA1105_NUM_PORTS * SJA1105_NUM_TC) + i;
+		int mtu = VLAN_ETH_FRAME_LEN + ETH_FCS_LEN;
+
+		if (dsa_is_cpu_port(priv->ds, i))
+			mtu += VLAN_HLEN;
 
 		for (j = 0; j < SJA1105_NUM_TC; j++, k++)
-			sja1105_setup_policer(policing, k);
+			sja1105_setup_policer(policing, k, mtu);
 
 		/* Set up this port's policer for broadcast traffic */
-		sja1105_setup_policer(policing, bcast);
+		sja1105_setup_policer(policing, bcast, mtu);
 	}
 	return 0;
 }
@@ -1544,6 +1548,7 @@ static const char * const sja1105_reset_reasons[] = {
 	[SJA1105_RX_HWTSTAMPING] = "RX timestamping",
 	[SJA1105_AGEING_TIME] = "Ageing time",
 	[SJA1105_SCHEDULING] = "Time-aware scheduling",
+	[SJA1105_BEST_EFFORT_POLICING] = "Best-effort policing",
 };
 
 /* For situations where we need to change a setting at runtime that is only
@@ -2120,6 +2125,39 @@ static int sja1105_set_ageing_time(struct dsa_switch *ds,
 	return sja1105_static_config_reload(priv, SJA1105_AGEING_TIME);
 }
 
+static int sja1105_change_mtu(struct dsa_switch *ds, int port, int new_mtu)
+{
+	int bcast = (SJA1105_NUM_PORTS * SJA1105_NUM_TC) + port;
+	struct sja1105_l2_policing_entry *policing;
+	struct sja1105_private *priv = ds->priv;
+	int tc;
+
+	new_mtu += VLAN_ETH_HLEN + ETH_FCS_LEN;
+
+	if (dsa_is_cpu_port(ds, port))
+		new_mtu += VLAN_HLEN;
+
+	policing = priv->static_config.tables[BLK_IDX_L2_POLICING].entries;
+
+	/* We set all 9 port policers to the same value, so just checking the
+	 * broadcast one is fine.
+	 */
+	if (policing[bcast].maxlen == new_mtu)
+		return 0;
+
+	for (tc = 0; tc < SJA1105_NUM_TC; tc++)
+		policing[port * SJA1105_NUM_TC + tc].maxlen = new_mtu;
+
+	policing[bcast].maxlen = new_mtu;
+
+	return sja1105_static_config_reload(priv, SJA1105_BEST_EFFORT_POLICING);
+}
+
+static int sja1105_get_max_mtu(struct dsa_switch *ds, int port)
+{
+	return 2043 - VLAN_ETH_HLEN - ETH_FCS_LEN;
+}
+
 static int sja1105_port_setup_tc(struct dsa_switch *ds, int port,
 				 enum tc_setup_type type,
 				 void *type_data)
@@ -2215,6 +2253,8 @@ static const struct dsa_switch_ops sja1105_switch_ops = {
 	.setup			= sja1105_setup,
 	.teardown		= sja1105_teardown,
 	.set_ageing_time	= sja1105_set_ageing_time,
+	.port_change_mtu	= sja1105_change_mtu,
+	.port_max_mtu		= sja1105_get_max_mtu,
 	.phylink_validate	= sja1105_phylink_validate,
 	.phylink_mac_link_state	= sja1105_mac_pcs_get_state,
 	.phylink_mac_config	= sja1105_mac_config,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 08/10] net: dsa: vsc73xx: Make the MTU configurable
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
                   ` (6 preceding siblings ...)
  2020-03-25 15:22 ` [PATCH v2 net-next 07/10] net: dsa: sja1105: Implement the port MTU callbacks Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 23:09   ` Florian Fainelli
  2020-03-25 15:22 ` [PATCH v2 net-next 09/10] net: dsa: felix: support changing the MTU Vladimir Oltean
  2020-03-25 15:22 ` [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath Vladimir Oltean
  9 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Instead of hardcoding the MTU to the maximum value allowed by the
hardware, obey the value known by the operating system.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/vitesse-vsc73xx-core.c | 30 +++++++++++++++++---------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/vitesse-vsc73xx-core.c b/drivers/net/dsa/vitesse-vsc73xx-core.c
index 6e21a2a5cf01..19ce4aa0973b 100644
--- a/drivers/net/dsa/vitesse-vsc73xx-core.c
+++ b/drivers/net/dsa/vitesse-vsc73xx-core.c
@@ -664,16 +664,6 @@ static void vsc73xx_init_port(struct vsc73xx *vsc, int port)
 		      VSC73XX_MAC_CFG_TX_EN |
 		      VSC73XX_MAC_CFG_RX_EN);
 
-	/* Max length, we can do up to 9.6 KiB, so allow that.
-	 * According to application not "VSC7398 Jumbo Frames" setting
-	 * up the MTU to 9.6 KB does not affect the performance on standard
-	 * frames, so just enable it. It is clear from the application note
-	 * that "9.6 kilobytes" == 9600 bytes.
-	 */
-	vsc73xx_write(vsc, VSC73XX_BLOCK_MAC,
-		      port,
-		      VSC73XX_MAXLEN, 9600);
-
 	/* Flow control for the CPU port:
 	 * Use a zero delay pause frame when pause condition is left
 	 * Obey pause control frames
@@ -1030,6 +1020,24 @@ static void vsc73xx_get_ethtool_stats(struct dsa_switch *ds, int port,
 	}
 }
 
+static int vsc73xx_change_mtu(struct dsa_switch *ds, int port, int new_mtu)
+{
+	struct vsc73xx *vsc = ds->priv;
+
+	return vsc73xx_write(vsc, VSC73XX_BLOCK_MAC, port,
+			     VSC73XX_MAXLEN, new_mtu);
+}
+
+/* According to application not "VSC7398 Jumbo Frames" setting
+ * up the MTU to 9.6 KB does not affect the performance on standard
+ * frames. It is clear from the application note that
+ * "9.6 kilobytes" == 9600 bytes.
+ */
+static int vsc73xx_get_max_mtu(struct dsa_switch *ds, int port)
+{
+	return 9600;
+}
+
 static const struct dsa_switch_ops vsc73xx_ds_ops = {
 	.get_tag_protocol = vsc73xx_get_tag_protocol,
 	.setup = vsc73xx_setup,
@@ -1041,6 +1049,8 @@ static const struct dsa_switch_ops vsc73xx_ds_ops = {
 	.get_sset_count = vsc73xx_get_sset_count,
 	.port_enable = vsc73xx_port_enable,
 	.port_disable = vsc73xx_port_disable,
+	.port_change_mtu = vsc73xx_change_mtu,
+	.port_max_mtu = vsc73xx_get_max_mtu,
 };
 
 static int vsc73xx_gpio_get(struct gpio_chip *chip, unsigned int offset)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 09/10] net: dsa: felix: support changing the MTU
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
                   ` (7 preceding siblings ...)
  2020-03-25 15:22 ` [PATCH v2 net-next 08/10] net: dsa: vsc73xx: Make the MTU configurable Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 23:10   ` Florian Fainelli
  2020-03-25 15:22 ` [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath Vladimir Oltean
  9 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Vladimir Oltean <vladimir.oltean@nxp.com>

Changing the MTU for this switch means altering the
DEV_GMII:MAC_CFG_STATUS:MAC_MAXLEN_CFG field MAX_LEN, which in turn
limits the size of frames that can be received.

Special accounting needs to be done for the DSA CPU port (NPI port in
hardware terms). The NPI port configuration needs to be held inside the
private ocelot structure, since it is now accessed from multiple places.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/ocelot/felix.c     | 18 ++++++++++++
 drivers/net/ethernet/mscc/ocelot.c | 45 +++++++++++++++++++++++-------
 include/soc/mscc/ocelot.h          |  7 +++++
 3 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 9f9efb974003..598f42f14525 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -610,6 +610,22 @@ static bool felix_txtstamp(struct dsa_switch *ds, int port,
 	return false;
 }
 
+static int felix_change_mtu(struct dsa_switch *ds, int port, int new_mtu)
+{
+	struct ocelot *ocelot = ds->priv;
+
+	ocelot_port_set_maxlen(ocelot, port, new_mtu);
+
+	return 0;
+}
+
+static int felix_get_max_mtu(struct dsa_switch *ds, int port)
+{
+	struct ocelot *ocelot = ds->priv;
+
+	return ocelot_get_max_mtu(ocelot, port);
+}
+
 static int felix_cls_flower_add(struct dsa_switch *ds, int port,
 				struct flow_cls_offload *cls, bool ingress)
 {
@@ -665,6 +681,8 @@ static const struct dsa_switch_ops felix_switch_ops = {
 	.port_hwtstamp_set	= felix_hwtstamp_set,
 	.port_rxtstamp		= felix_rxtstamp,
 	.port_txtstamp		= felix_txtstamp,
+	.port_change_mtu	= felix_change_mtu,
+	.port_max_mtu		= felix_get_max_mtu,
 	.cls_flower_add		= felix_cls_flower_add,
 	.cls_flower_del		= felix_cls_flower_del,
 	.cls_flower_stats	= felix_cls_flower_stats,
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 114d6053aa26..b5f925c1b5b2 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1998,13 +1998,25 @@ EXPORT_SYMBOL(ocelot_switchdev_blocking_nb);
 
 /* Configure the maximum SDU (L2 payload) on RX to the value specified in @sdu.
  * The length of VLAN tags is accounted for automatically via DEV_MAC_TAGS_CFG.
+ * In the special case that it's the NPI port that we're configuring, the
+ * length of the tag and optional prefix needs to be accounted for privately,
+ * in order to be able to sustain communication at the requested @sdu.
  */
-static void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu)
+void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu)
 {
 	struct ocelot_port *ocelot_port = ocelot->ports[port];
 	int maxlen = sdu + ETH_HLEN + ETH_FCS_LEN;
 	int atop_wm;
 
+	if (port == ocelot->npi) {
+		maxlen += OCELOT_TAG_LEN;
+
+		if (ocelot->inj_prefix == OCELOT_TAG_PREFIX_SHORT)
+			maxlen += OCELOT_SHORT_PREFIX_LEN;
+		else if (ocelot->inj_prefix == OCELOT_TAG_PREFIX_LONG)
+			maxlen += OCELOT_LONG_PREFIX_LEN;
+	}
+
 	ocelot_port_writel(ocelot_port, maxlen, DEV_MAC_MAXLEN_CFG);
 
 	/* Set Pause WM hysteresis
@@ -2022,6 +2034,24 @@ static void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu)
 			 SYS_ATOP, port);
 	ocelot_write(ocelot, ocelot_wm_enc(atop_wm), SYS_ATOP_TOT_CFG);
 }
+EXPORT_SYMBOL(ocelot_port_set_maxlen);
+
+int ocelot_get_max_mtu(struct ocelot *ocelot, int port)
+{
+	int max_mtu = 65535 - ETH_HLEN - ETH_FCS_LEN;
+
+	if (port == ocelot->npi) {
+		max_mtu -= OCELOT_TAG_LEN;
+
+		if (ocelot->inj_prefix == OCELOT_TAG_PREFIX_SHORT)
+			max_mtu -= OCELOT_SHORT_PREFIX_LEN;
+		else if (ocelot->inj_prefix == OCELOT_TAG_PREFIX_LONG)
+			max_mtu -= OCELOT_LONG_PREFIX_LEN;
+	}
+
+	return max_mtu;
+}
+EXPORT_SYMBOL(ocelot_get_max_mtu);
 
 void ocelot_init_port(struct ocelot *ocelot, int port)
 {
@@ -2131,6 +2161,10 @@ void ocelot_configure_cpu(struct ocelot *ocelot, int npi,
 {
 	int cpu = ocelot->num_phys_ports;
 
+	ocelot->npi = npi;
+	ocelot->inj_prefix = injection;
+	ocelot->xtr_prefix = extraction;
+
 	/* The unicast destination PGID for the CPU port module is unused */
 	ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, cpu);
 	/* Instead set up a multicast destination PGID for traffic copied to
@@ -2143,19 +2177,10 @@ void ocelot_configure_cpu(struct ocelot *ocelot, int npi,
 			 ANA_PORT_PORT_CFG, cpu);
 
 	if (npi >= 0 && npi < ocelot->num_phys_ports) {
-		int sdu = ETH_DATA_LEN + OCELOT_TAG_LEN;
-
 		ocelot_write(ocelot, QSYS_EXT_CPU_CFG_EXT_CPUQ_MSK_M |
 			     QSYS_EXT_CPU_CFG_EXT_CPU_PORT(npi),
 			     QSYS_EXT_CPU_CFG);
 
-		if (injection == OCELOT_TAG_PREFIX_SHORT)
-			sdu += OCELOT_SHORT_PREFIX_LEN;
-		else if (injection == OCELOT_TAG_PREFIX_LONG)
-			sdu += OCELOT_LONG_PREFIX_LEN;
-
-		ocelot_port_set_maxlen(ocelot, npi, sdu);
-
 		/* Enable NPI port */
 		ocelot_write_rix(ocelot,
 				 QSYS_SWITCH_PORT_MODE_INGRESS_DROP_MODE |
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index db2fb14bd775..23a78d927838 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -522,6 +522,11 @@ struct ocelot {
 	 */
 	u8				num_phys_ports;
 
+	int				npi;
+
+	enum ocelot_tag_prefix		inj_prefix;
+	enum ocelot_tag_prefix		xtr_prefix;
+
 	u32				*lags;
 
 	struct list_head		multicast;
@@ -616,6 +621,8 @@ int ocelot_hwstamp_set(struct ocelot *ocelot, int port, struct ifreq *ifr);
 int ocelot_port_add_txtstamp_skb(struct ocelot_port *ocelot_port,
 				 struct sk_buff *skb);
 void ocelot_get_txtstamp(struct ocelot *ocelot);
+void ocelot_port_set_maxlen(struct ocelot *ocelot, int port, size_t sdu);
+int ocelot_get_max_mtu(struct ocelot *ocelot, int port);
 int ocelot_cls_flower_replace(struct ocelot *ocelot, int port,
 			      struct flow_cls_offload *f, bool ingress);
 int ocelot_cls_flower_destroy(struct ocelot *ocelot, int port,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
                   ` (8 preceding siblings ...)
  2020-03-25 15:22 ` [PATCH v2 net-next 09/10] net: dsa: felix: support changing the MTU Vladimir Oltean
@ 2020-03-25 15:22 ` Vladimir Oltean
  2020-03-25 23:17   ` Florian Fainelli
  2020-03-26 10:17   ` Ido Schimmel
  9 siblings, 2 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 15:22 UTC (permalink / raw)
  To: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

From: Vladimir Oltean <vladimir.oltean@nxp.com>

In the initial attempt to add MTU configuration for DSA:

https://patchwork.ozlabs.org/cover/1199868/

Florian raised a concern about the bridge MTU normalization logic (when
you bridge an interface with MTU 9000 and one with MTU 1500). His
expectation was that the bridge would automatically change the MTU of
all its slave ports to the minimum MTU, if those slaves are part of the
same hardware bridge. However, it doesn't do that, and for good reason,
I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
bridge net device itself, and not that of any slave port.  If it were to
modify the MTU of the slave ports, the effect would be that the user
wouldn't be able to increase the MTU of any bridge slave port as long as
it was part of the bridge, which would be a bit annoying to say the
least.

The idea behind this behavior is that normal termination from Linux over
the L2 forwarding domain described by DSA should happen over the bridge
net device, which _is_ properly limited by the minimum MTU. And
termination over individual slave device is possible even if those are
bridged. But that is not "forwarding", so there's no reason to do
normalization there, since only a single interface sees that packet.

The real problem is with the offloaded data path, where of course, the
bridge net device MTU is ignored. So a packet received on an interface
with MTU 9000 would still be forwarded to an interface with MTU 1500.
And that is exactly what this patch is trying to prevent from happening.

Florian's idea was that all hardware ports having the same
netdev_port_same_parent_id should be adjusted to have the same MTU.
The MTU that we attempt to configure the ports to is the most recently
modified MTU. The attempt is to follow user intention as closely as
possible and not be annoying at that.

So there are 2 cases really:

ip link set dev sw0p0 master br0
ip link set dev sw0p1 mtu 1400
ip link set dev sw0p1 master br0

The above sequence will make sw0p0 inherit MTU 1400 as well.

The second case:

ip link set dev sw0p0 master br0
ip link set dev sw0p1 master br0
ip link set dev sw0p0 mtu 1400

This sequence will make sw0p1 inherit MTU 1400 from sw0p0.

Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/bridge/br.c         |  1 +
 net/bridge/br_if.c      | 93 +++++++++++++++++++++++++++++++++++++++++
 net/bridge/br_private.h |  1 +
 3 files changed, 95 insertions(+)

diff --git a/net/bridge/br.c b/net/bridge/br.c
index b6fe30e3768f..5f05380df1ee 100644
--- a/net/bridge/br.c
+++ b/net/bridge/br.c
@@ -57,6 +57,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 
 	switch (event) {
 	case NETDEV_CHANGEMTU:
+		br_mtu_normalization(br, dev);
 		br_mtu_auto_adjust(br);
 		break;
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 4fe30b182ee7..a228668920a6 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -514,6 +514,98 @@ void br_mtu_auto_adjust(struct net_bridge *br)
 	br_opt_toggle(br, BROPT_MTU_SET_BY_USER, false);
 }
 
+struct br_hw_port {
+	struct list_head list;
+	struct net_device *dev;
+	int old_mtu;
+};
+
+static int br_hw_port_list_set_mtu(struct list_head *hw_port_list, int mtu)
+{
+	const struct br_hw_port *p;
+	int err;
+
+	list_for_each_entry(p, hw_port_list, list) {
+		if (p->dev->mtu == mtu)
+			continue;
+
+		err = dev_set_mtu(p->dev, mtu);
+		if (err)
+			goto rollback;
+	}
+
+	return 0;
+
+rollback:
+	list_for_each_entry_continue_reverse(p, hw_port_list, list) {
+		if (p->dev->mtu == p->old_mtu)
+			continue;
+
+		if (dev_set_mtu(p->dev, p->old_mtu))
+			netdev_err(p->dev, "Failed to restore MTU\n");
+	}
+
+	return err;
+}
+
+static void br_hw_port_list_free(struct list_head *hw_port_list)
+{
+	struct br_hw_port *p, *n;
+
+	list_for_each_entry_safe(p, n, hw_port_list, list)
+		kfree(p);
+}
+
+/* Make the hardware datapath to/from @br_if limited to a common MTU */
+void br_mtu_normalization(struct net_bridge *br, struct net_device *br_if)
+{
+	const struct net_bridge_port *p;
+	struct list_head hw_port_list;
+	int min_mtu = ETH_MAX_MTU;
+	int err;
+
+	INIT_LIST_HEAD(&hw_port_list);
+
+	/* Populate the list of ports that are part of the same hardware bridge
+	 * as the newly added port
+	 */
+	list_for_each_entry(p, &br->port_list, list) {
+		struct br_hw_port *hw_port;
+
+		if (!netdev_port_same_parent_id(p->dev, br_if))
+			continue;
+
+		if (min_mtu > p->dev->mtu)
+			min_mtu = p->dev->mtu;
+
+		hw_port = kzalloc(sizeof(*hw_port), GFP_KERNEL);
+		if (!hw_port)
+			goto out;
+
+		hw_port->dev = p->dev;
+		hw_port->old_mtu = p->dev->mtu;
+
+		list_add(&hw_port->list, &hw_port_list);
+	}
+
+	/* Attempt to configure the entire hardware bridge to the newly added
+	 * interface's MTU first, regardless of whether the intention of the
+	 * user was to raise or lower it.
+	 */
+	err = br_hw_port_list_set_mtu(&hw_port_list, br_if->mtu);
+	if (!err)
+		goto out;
+
+	/* Clearly that didn't work out so well, so just set the minimum MTU on
+	 * all hardware bridge ports now. If this fails too, then all ports will
+	 * still have their old MTU rolled back anyway.
+	 */
+	br_hw_port_list_set_mtu(&hw_port_list, min_mtu);
+
+out:
+	br_hw_port_list_free(&hw_port_list);
+}
+
 static void br_set_gso_limits(struct net_bridge *br)
 {
 	unsigned int gso_max_size = GSO_MAX_SIZE;
@@ -676,6 +768,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 	if (changed_addr)
 		call_netdevice_notifiers(NETDEV_CHANGEADDR, br->dev);
 
+	br_mtu_normalization(br, dev);
 	br_mtu_auto_adjust(br);
 	br_set_gso_limits(br);
 
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 1f97703a52ff..df010e36228e 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -693,6 +693,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
 	      struct netlink_ext_ack *extack);
 int br_del_if(struct net_bridge *br, struct net_device *dev);
 void br_mtu_auto_adjust(struct net_bridge *br);
+void br_mtu_normalization(struct net_bridge *br, struct net_device *br_if);
 netdev_features_t br_features_recompute(struct net_bridge *br,
 					netdev_features_t features);
 void br_port_flags_change(struct net_bridge_port *port, unsigned long mask);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY
  2020-03-25 15:22 ` [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY Vladimir Oltean
@ 2020-03-25 15:44   ` Heiner Kallweit
  2020-03-25 22:45     ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Heiner Kallweit @ 2020-03-25 15:44 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, f.fainelli, vivien.didelot, davem,
	jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev

On 25.03.2020 16:22, Vladimir Oltean wrote:
> From: Murali Krishna Policharla <murali.policharla@broadcom.com>
> 
> Add API to configure jumbo frame settings in PHY during initial PHY
> configuration.
> 
> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/net/phy/bcm-phy-lib.c | 28 ++++++++++++++++++++++++++++
>  drivers/net/phy/bcm-phy-lib.h |  1 +
>  drivers/net/phy/bcm7xxx.c     |  4 ++++
>  include/linux/brcmphy.h       |  1 +
>  4 files changed, 34 insertions(+)
> 
> diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
> index e0d3310957ff..a26c80e13b43 100644
> --- a/drivers/net/phy/bcm-phy-lib.c
> +++ b/drivers/net/phy/bcm-phy-lib.c
> @@ -423,6 +423,34 @@ int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev)
>  }
>  EXPORT_SYMBOL_GPL(bcm_phy_28nm_a0b0_afe_config_init);
>  
> +int bcm_phy_enable_jumbo(struct phy_device *phydev)
> +{
> +	int val = 0, ret = 0;
> +
> +	ret = phy_write(phydev, MII_BCM54XX_AUX_CTL,
> +			MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
> +	if (ret < 0)
> +		return ret;
> +
> +	val = phy_read(phydev, MII_BCM54XX_AUX_CTL);
> +
> +	/* Enable extended length packet reception */
> +	val |= MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN;
> +	ret = phy_write(phydev, MII_BCM54XX_AUX_CTL, val);
> +

There are different helpers already in bcm-phy-lib,
e.g. bcm54xx_auxctl_read. Also bcm_phy_write_misc()
has has quite something in common with your new function.
It would be good if a helper could be used here.

> +	if (ret < 0)
> +		return ret;
> +
> +	val = phy_read(phydev, MII_BCM54XX_ECR);
> +
> +	/* Enable 10K byte packet length reception */
> +	val |= BIT(0);
> +	ret =  phy_write(phydev, MII_BCM54XX_ECR, val);
> +

Why not use phy_set_bits() ?

> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(bcm_phy_enable_jumbo);
> +
>  MODULE_DESCRIPTION("Broadcom PHY Library");
>  MODULE_LICENSE("GPL v2");
>  MODULE_AUTHOR("Broadcom Corporation");
> diff --git a/drivers/net/phy/bcm-phy-lib.h b/drivers/net/phy/bcm-phy-lib.h
> index c86fb9d1240c..129df819be8c 100644
> --- a/drivers/net/phy/bcm-phy-lib.h
> +++ b/drivers/net/phy/bcm-phy-lib.h
> @@ -65,5 +65,6 @@ void bcm_phy_get_stats(struct phy_device *phydev, u64 *shadow,
>  		       struct ethtool_stats *stats, u64 *data);
>  void bcm_phy_r_rc_cal_reset(struct phy_device *phydev);
>  int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev);
> +int bcm_phy_enable_jumbo(struct phy_device *phydev);
>  
>  #endif /* _LINUX_BCM_PHY_LIB_H */
> diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
> index af8eabe7a6d4..692048d86ab1 100644
> --- a/drivers/net/phy/bcm7xxx.c
> +++ b/drivers/net/phy/bcm7xxx.c
> @@ -178,6 +178,10 @@ static int bcm7xxx_28nm_config_init(struct phy_device *phydev)
>  		break;
>  	}
>  
> +	if (ret)
> +		return ret;
> +
> +	ret =  bcm_phy_enable_jumbo(phydev);
>  	if (ret)
>  		return ret;
>  
> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
> index b475e7f20d28..19bd86019e93 100644
> --- a/include/linux/brcmphy.h
> +++ b/include/linux/brcmphy.h
> @@ -119,6 +119,7 @@
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL	0x00
>  #define MII_BCM54XX_AUXCTL_ACTL_TX_6DB		0x0400
>  #define MII_BCM54XX_AUXCTL_ACTL_SMDSP_ENA	0x0800
> +#define MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN	0x4000
>  
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC			0x07
>  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN	0x0010
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY
  2020-03-25 15:44   ` Heiner Kallweit
@ 2020-03-25 22:45     ` Vladimir Oltean
  2020-03-25 23:02       ` Florian Fainelli
  2020-03-25 23:21       ` Heiner Kallweit
  0 siblings, 2 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-25 22:45 UTC (permalink / raw)
  To: Heiner Kallweit
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Ido Schimmel, Jakub Kicinski, Nikolay Aleksandrov, netdev

On Wed, 25 Mar 2020 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 25.03.2020 16:22, Vladimir Oltean wrote:
> > From: Murali Krishna Policharla <murali.policharla@broadcom.com>
> >
> > Add API to configure jumbo frame settings in PHY during initial PHY
> > configuration.
> >
> > Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
> > Reviewed-by: Scott Branden <scott.branden@broadcom.com>
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  drivers/net/phy/bcm-phy-lib.c | 28 ++++++++++++++++++++++++++++
> >  drivers/net/phy/bcm-phy-lib.h |  1 +
> >  drivers/net/phy/bcm7xxx.c     |  4 ++++
> >  include/linux/brcmphy.h       |  1 +
> >  4 files changed, 34 insertions(+)
> >
> > diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
> > index e0d3310957ff..a26c80e13b43 100644
> > --- a/drivers/net/phy/bcm-phy-lib.c
> > +++ b/drivers/net/phy/bcm-phy-lib.c
> > @@ -423,6 +423,34 @@ int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev)
> >  }
> >  EXPORT_SYMBOL_GPL(bcm_phy_28nm_a0b0_afe_config_init);
> >
> > +int bcm_phy_enable_jumbo(struct phy_device *phydev)
> > +{
> > +     int val = 0, ret = 0;
> > +
> > +     ret = phy_write(phydev, MII_BCM54XX_AUX_CTL,
> > +                     MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     val = phy_read(phydev, MII_BCM54XX_AUX_CTL);
> > +
> > +     /* Enable extended length packet reception */
> > +     val |= MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN;
> > +     ret = phy_write(phydev, MII_BCM54XX_AUX_CTL, val);
> > +
>
> There are different helpers already in bcm-phy-lib,
> e.g. bcm54xx_auxctl_read. Also bcm_phy_write_misc()
> has has quite something in common with your new function.
> It would be good if a helper could be used here.
>

Thanks Heiner.
I'm not quite sure the operation is performed correctly though? My
books are telling me that the "Receive Extended Packet Length" field
is accessible via the Auxiliary Control Register 0x18 when the shadow
value is 000, not 111 as this patch is doing. At least for BCM54xxx in
terms of which the macros are defined. Am I wrong?

> > +     if (ret < 0)
> > +             return ret;
> > +
> > +     val = phy_read(phydev, MII_BCM54XX_ECR);
> > +
> > +     /* Enable 10K byte packet length reception */
> > +     val |= BIT(0);
> > +     ret =  phy_write(phydev, MII_BCM54XX_ECR, val);
> > +
>
> Why not use phy_set_bits() ?
>

Well, the reason is that I didn't write the patch. I'll simplify it.

> > +     return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(bcm_phy_enable_jumbo);
> > +
> >  MODULE_DESCRIPTION("Broadcom PHY Library");
> >  MODULE_LICENSE("GPL v2");
> >  MODULE_AUTHOR("Broadcom Corporation");
> > diff --git a/drivers/net/phy/bcm-phy-lib.h b/drivers/net/phy/bcm-phy-lib.h
> > index c86fb9d1240c..129df819be8c 100644
> > --- a/drivers/net/phy/bcm-phy-lib.h
> > +++ b/drivers/net/phy/bcm-phy-lib.h
> > @@ -65,5 +65,6 @@ void bcm_phy_get_stats(struct phy_device *phydev, u64 *shadow,
> >                      struct ethtool_stats *stats, u64 *data);
> >  void bcm_phy_r_rc_cal_reset(struct phy_device *phydev);
> >  int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev);
> > +int bcm_phy_enable_jumbo(struct phy_device *phydev);
> >
> >  #endif /* _LINUX_BCM_PHY_LIB_H */
> > diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
> > index af8eabe7a6d4..692048d86ab1 100644
> > --- a/drivers/net/phy/bcm7xxx.c
> > +++ b/drivers/net/phy/bcm7xxx.c
> > @@ -178,6 +178,10 @@ static int bcm7xxx_28nm_config_init(struct phy_device *phydev)
> >               break;
> >       }
> >
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret =  bcm_phy_enable_jumbo(phydev);
> >       if (ret)
> >               return ret;
> >
> > diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
> > index b475e7f20d28..19bd86019e93 100644
> > --- a/include/linux/brcmphy.h
> > +++ b/include/linux/brcmphy.h
> > @@ -119,6 +119,7 @@
> >  #define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL    0x00
> >  #define MII_BCM54XX_AUXCTL_ACTL_TX_6DB               0x0400
> >  #define MII_BCM54XX_AUXCTL_ACTL_SMDSP_ENA    0x0800
> > +#define MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN  0x4000
> >
> >  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC                      0x07
> >  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN 0x0010
> >
>

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY
  2020-03-25 22:45     ` Vladimir Oltean
@ 2020-03-25 23:02       ` Florian Fainelli
  2020-03-25 23:21       ` Heiner Kallweit
  1 sibling, 0 replies; 37+ messages in thread
From: Florian Fainelli @ 2020-03-25 23:02 UTC (permalink / raw)
  To: Vladimir Oltean, Heiner Kallweit, murali.policharla
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Jakub Kicinski,
	Stephen Hemminger, Jiri Pirko, Ido Schimmel, Jakub Kicinski,
	Nikolay Aleksandrov, netdev



On 3/25/2020 3:45 PM, Vladimir Oltean wrote:
> On Wed, 25 Mar 2020 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 25.03.2020 16:22, Vladimir Oltean wrote:
>>> From: Murali Krishna Policharla <murali.policharla@broadcom.com>
>>>
>>> Add API to configure jumbo frame settings in PHY during initial PHY
>>> configuration.
>>>
>>> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>>> ---
>>>  drivers/net/phy/bcm-phy-lib.c | 28 ++++++++++++++++++++++++++++
>>>  drivers/net/phy/bcm-phy-lib.h |  1 +
>>>  drivers/net/phy/bcm7xxx.c     |  4 ++++
>>>  include/linux/brcmphy.h       |  1 +
>>>  4 files changed, 34 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
>>> index e0d3310957ff..a26c80e13b43 100644
>>> --- a/drivers/net/phy/bcm-phy-lib.c
>>> +++ b/drivers/net/phy/bcm-phy-lib.c
>>> @@ -423,6 +423,34 @@ int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev)
>>>  }
>>>  EXPORT_SYMBOL_GPL(bcm_phy_28nm_a0b0_afe_config_init);
>>>
>>> +int bcm_phy_enable_jumbo(struct phy_device *phydev)
>>> +{
>>> +     int val = 0, ret = 0;
>>> +
>>> +     ret = phy_write(phydev, MII_BCM54XX_AUX_CTL,
>>> +                     MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     val = phy_read(phydev, MII_BCM54XX_AUX_CTL);
>>> +
>>> +     /* Enable extended length packet reception */
>>> +     val |= MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN;
>>> +     ret = phy_write(phydev, MII_BCM54XX_AUX_CTL, val);
>>> +
>>
>> There are different helpers already in bcm-phy-lib,
>> e.g. bcm54xx_auxctl_read. Also bcm_phy_write_misc()
>> has has quite something in common with your new function.
>> It would be good if a helper could be used here.
>>
> 
> Thanks Heiner.
> I'm not quite sure the operation is performed correctly though? My
> books are telling me that the "Receive Extended Packet Length" field
> is accessible via the Auxiliary Control Register 0x18 when the shadow
> value is 000, not 111 as this patch is doing. At least for BCM54xxx in
> terms of which the macros are defined. Am I wrong?

I can confirm that for the BCM54810 PHY as well, the extended packet
length is in register 0x18 shadow 0b000.

Murali, which datsheet did you use as a reference? The internal document
for the 28nm EGPHY which is what this driver is supposed to drive is
also indicating the same thing.
-- 
Florian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 05/10] bgmac: Add DMA support to handle frames beyond 8192 bytes
  2020-03-25 15:22 ` [PATCH v2 net-next 05/10] bgmac: Add DMA support to handle frames beyond 8192 bytes Vladimir Oltean
@ 2020-03-25 23:07   ` Florian Fainelli
  0 siblings, 0 replies; 37+ messages in thread
From: Florian Fainelli @ 2020-03-25 23:07 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev



On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> From: Murali Krishna Policharla <murali.policharla@broadcom.com>
> 
> Add DMA support in driver to handle jumbo frames beyond 8192 bytes.
> Also update jumbo frame max size to include FCS, the DMA packet length
> received includes FCS.
> 
> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
> Reviewed-by: Arun Parameswaran <arun.parameswaran@broadcom.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

This patch and the previous one should probably be squashed together. I
will try to get this tested on a Northstar platform.
-- 
Florian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 07/10] net: dsa: sja1105: Implement the port MTU callbacks
  2020-03-25 15:22 ` [PATCH v2 net-next 07/10] net: dsa: sja1105: Implement the port MTU callbacks Vladimir Oltean
@ 2020-03-25 23:08   ` Florian Fainelli
  0 siblings, 0 replies; 37+ messages in thread
From: Florian Fainelli @ 2020-03-25 23:08 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev



On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> On this switch, the frame length enforcements are performed by the
> ingress policers. There are 2 types of those: regular L2 (also called
> best-effort) and Virtual Link policers (an ARINC664/AFDX concept for
> defining L2 streams with certain QoS abilities). To avoid future
> confusion, I prefer to call the reset reason "Best-effort policers",
> even though the VL policers are not yet supported.
> 
> We also need to change the setup of the initial static config, such that
> DSA calls to .change_mtu (which are expensive) become no-ops and don't
> reset the switch 5 times.
> 
> A driver-level decision is to unconditionally allow single VLAN-tagged
> traffic on all ports. The CPU port must accept an additional VLAN header
> for the DSA tag, which is again a driver-level decision.
> 
> The policers actually count bytes not only from the SDU, but also from
> the Ethernet header and FCS, so those need to be accounted for as well.
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 08/10] net: dsa: vsc73xx: Make the MTU configurable
  2020-03-25 15:22 ` [PATCH v2 net-next 08/10] net: dsa: vsc73xx: Make the MTU configurable Vladimir Oltean
@ 2020-03-25 23:09   ` Florian Fainelli
  0 siblings, 0 replies; 37+ messages in thread
From: Florian Fainelli @ 2020-03-25 23:09 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev



On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Instead of hardcoding the MTU to the maximum value allowed by the
> hardware, obey the value known by the operating system.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 09/10] net: dsa: felix: support changing the MTU
  2020-03-25 15:22 ` [PATCH v2 net-next 09/10] net: dsa: felix: support changing the MTU Vladimir Oltean
@ 2020-03-25 23:10   ` Florian Fainelli
  0 siblings, 0 replies; 37+ messages in thread
From: Florian Fainelli @ 2020-03-25 23:10 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev



On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> Changing the MTU for this switch means altering the
> DEV_GMII:MAC_CFG_STATUS:MAC_MAXLEN_CFG field MAX_LEN, which in turn
> limits the size of frames that can be received.
> 
> Special accounting needs to be done for the DSA CPU port (NPI port in
> hardware terms). The NPI port configuration needs to be held inside the
> private ocelot structure, since it is now accessed from multiple places.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-25 15:22 ` [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath Vladimir Oltean
@ 2020-03-25 23:17   ` Florian Fainelli
  2020-03-26  0:30     ` Vladimir Oltean
  2020-03-26 10:17   ` Ido Schimmel
  1 sibling, 1 reply; 37+ messages in thread
From: Florian Fainelli @ 2020-03-25 23:17 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev



On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> In the initial attempt to add MTU configuration for DSA:
> 
> https://patchwork.ozlabs.org/cover/1199868/
> 
> Florian raised a concern about the bridge MTU normalization logic (when
> you bridge an interface with MTU 9000 and one with MTU 1500). His
> expectation was that the bridge would automatically change the MTU of
> all its slave ports to the minimum MTU, if those slaves are part of the
> same hardware bridge. However, it doesn't do that, and for good reason,
> I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> bridge net device itself, and not that of any slave port.  If it were to
> modify the MTU of the slave ports, the effect would be that the user
> wouldn't be able to increase the MTU of any bridge slave port as long as
> it was part of the bridge, which would be a bit annoying to say the
> least.
> 
> The idea behind this behavior is that normal termination from Linux over
> the L2 forwarding domain described by DSA should happen over the bridge
> net device, which _is_ properly limited by the minimum MTU. And
> termination over individual slave device is possible even if those are
> bridged. But that is not "forwarding", so there's no reason to do
> normalization there, since only a single interface sees that packet.
> 
> The real problem is with the offloaded data path, where of course, the
> bridge net device MTU is ignored. So a packet received on an interface
> with MTU 9000 would still be forwarded to an interface with MTU 1500.
> And that is exactly what this patch is trying to prevent from happening.
> 
> Florian's idea was that all hardware ports having the same
> netdev_port_same_parent_id should be adjusted to have the same MTU.
> The MTU that we attempt to configure the ports to is the most recently
> modified MTU. The attempt is to follow user intention as closely as
> possible and not be annoying at that.
> 
> So there are 2 cases really:
> 
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 mtu 1400
> ip link set dev sw0p1 master br0
> 
> The above sequence will make sw0p0 inherit MTU 1400 as well.
> 
> The second case:
> 
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 master br0
> ip link set dev sw0p0 mtu 1400
> 
> This sequence will make sw0p1 inherit MTU 1400 from sw0p0.
> 
> Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/bridge/br.c         |  1 +
>  net/bridge/br_if.c      | 93 +++++++++++++++++++++++++++++++++++++++++
>  net/bridge/br_private.h |  1 +
>  3 files changed, 95 insertions(+)
> 
> diff --git a/net/bridge/br.c b/net/bridge/br.c
> index b6fe30e3768f..5f05380df1ee 100644
> --- a/net/bridge/br.c
> +++ b/net/bridge/br.c
> @@ -57,6 +57,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
>  
>  	switch (event) {
>  	case NETDEV_CHANGEMTU:
> +		br_mtu_normalization(br, dev);

I do not remember if you are allowed to sleep in a netdevice notifier, I
believe not, so you may need to pass a gfp_t to br_mtu_normalization for
allocations to be GFP_ATOMIC when called from that context, and
GFP_KERNEL from br_add_if().

It would be nice if we could avoid doing these allocations when called
from the netdev notifier though, could we just keep the information
around since the br_hw_port follows the same lifetime as the
net_bridge_port structure. Other than that, this looks good to me, thanks!
-- 
Florian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY
  2020-03-25 22:45     ` Vladimir Oltean
  2020-03-25 23:02       ` Florian Fainelli
@ 2020-03-25 23:21       ` Heiner Kallweit
  1 sibling, 0 replies; 37+ messages in thread
From: Heiner Kallweit @ 2020-03-25 23:21 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Ido Schimmel, Jakub Kicinski, Nikolay Aleksandrov, netdev

On 25.03.2020 23:45, Vladimir Oltean wrote:
> On Wed, 25 Mar 2020 at 17:44, Heiner Kallweit <hkallweit1@gmail.com> wrote:
>>
>> On 25.03.2020 16:22, Vladimir Oltean wrote:
>>> From: Murali Krishna Policharla <murali.policharla@broadcom.com>
>>>
>>> Add API to configure jumbo frame settings in PHY during initial PHY
>>> configuration.
>>>
>>> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
>>> Reviewed-by: Scott Branden <scott.branden@broadcom.com>
>>> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>>> ---
>>>  drivers/net/phy/bcm-phy-lib.c | 28 ++++++++++++++++++++++++++++
>>>  drivers/net/phy/bcm-phy-lib.h |  1 +
>>>  drivers/net/phy/bcm7xxx.c     |  4 ++++
>>>  include/linux/brcmphy.h       |  1 +
>>>  4 files changed, 34 insertions(+)
>>>
>>> diff --git a/drivers/net/phy/bcm-phy-lib.c b/drivers/net/phy/bcm-phy-lib.c
>>> index e0d3310957ff..a26c80e13b43 100644
>>> --- a/drivers/net/phy/bcm-phy-lib.c
>>> +++ b/drivers/net/phy/bcm-phy-lib.c
>>> @@ -423,6 +423,34 @@ int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev)
>>>  }
>>>  EXPORT_SYMBOL_GPL(bcm_phy_28nm_a0b0_afe_config_init);
>>>
>>> +int bcm_phy_enable_jumbo(struct phy_device *phydev)
>>> +{
>>> +     int val = 0, ret = 0;
>>> +
>>> +     ret = phy_write(phydev, MII_BCM54XX_AUX_CTL,
>>> +                     MII_BCM54XX_AUXCTL_SHDWSEL_MISC);
>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     val = phy_read(phydev, MII_BCM54XX_AUX_CTL);
>>> +
>>> +     /* Enable extended length packet reception */
>>> +     val |= MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN;
>>> +     ret = phy_write(phydev, MII_BCM54XX_AUX_CTL, val);
>>> +
>>
>> There are different helpers already in bcm-phy-lib,
>> e.g. bcm54xx_auxctl_read. Also bcm_phy_write_misc()
>> has has quite something in common with your new function.
>> It would be good if a helper could be used here.
>>
> 
> Thanks Heiner.
> I'm not quite sure the operation is performed correctly though? My
> books are telling me that the "Receive Extended Packet Length" field
> is accessible via the Auxiliary Control Register 0x18 when the shadow
> value is 000, not 111 as this patch is doing. At least for BCM54xxx in
> terms of which the macros are defined. Am I wrong?
> 

I didn't check the datasheet, so I can't really comment on whether the
patch does what it is supposed to do. My point was that the following
pattern occurs several times in the driver:
phy_write(phydev, MII_BCM54XX_AUX_CTL, VAL_1);
phy_modify(phydev, MII_BCM54XX_AUX_CTL, VAL_2, VAL_3)
(or phy_clear_bits/phy_set_bits instead of phy_modify)

Therefore a generic helper could make sense, or it could at least be
written as such a two-liner always.

>>> +     if (ret < 0)
>>> +             return ret;
>>> +
>>> +     val = phy_read(phydev, MII_BCM54XX_ECR);
>>> +
>>> +     /* Enable 10K byte packet length reception */
>>> +     val |= BIT(0);
>>> +     ret =  phy_write(phydev, MII_BCM54XX_ECR, val);
>>> +
>>
>> Why not use phy_set_bits() ?
>>
> 
> Well, the reason is that I didn't write the patch. I'll simplify it.
> 
>>> +     return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(bcm_phy_enable_jumbo);
>>> +
>>>  MODULE_DESCRIPTION("Broadcom PHY Library");
>>>  MODULE_LICENSE("GPL v2");
>>>  MODULE_AUTHOR("Broadcom Corporation");
>>> diff --git a/drivers/net/phy/bcm-phy-lib.h b/drivers/net/phy/bcm-phy-lib.h
>>> index c86fb9d1240c..129df819be8c 100644
>>> --- a/drivers/net/phy/bcm-phy-lib.h
>>> +++ b/drivers/net/phy/bcm-phy-lib.h
>>> @@ -65,5 +65,6 @@ void bcm_phy_get_stats(struct phy_device *phydev, u64 *shadow,
>>>                      struct ethtool_stats *stats, u64 *data);
>>>  void bcm_phy_r_rc_cal_reset(struct phy_device *phydev);
>>>  int bcm_phy_28nm_a0b0_afe_config_init(struct phy_device *phydev);
>>> +int bcm_phy_enable_jumbo(struct phy_device *phydev);
>>>
>>>  #endif /* _LINUX_BCM_PHY_LIB_H */
>>> diff --git a/drivers/net/phy/bcm7xxx.c b/drivers/net/phy/bcm7xxx.c
>>> index af8eabe7a6d4..692048d86ab1 100644
>>> --- a/drivers/net/phy/bcm7xxx.c
>>> +++ b/drivers/net/phy/bcm7xxx.c
>>> @@ -178,6 +178,10 @@ static int bcm7xxx_28nm_config_init(struct phy_device *phydev)
>>>               break;
>>>       }
>>>
>>> +     if (ret)
>>> +             return ret;
>>> +
>>> +     ret =  bcm_phy_enable_jumbo(phydev);
>>>       if (ret)
>>>               return ret;
>>>
>>> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
>>> index b475e7f20d28..19bd86019e93 100644
>>> --- a/include/linux/brcmphy.h
>>> +++ b/include/linux/brcmphy.h
>>> @@ -119,6 +119,7 @@
>>>  #define MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL    0x00
>>>  #define MII_BCM54XX_AUXCTL_ACTL_TX_6DB               0x0400
>>>  #define MII_BCM54XX_AUXCTL_ACTL_SMDSP_ENA    0x0800
>>> +#define MII_BCM54XX_AUXCTL_ACTL_EXT_PKT_LEN  0x4000
>>>
>>>  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC                      0x07
>>>  #define MII_BCM54XX_AUXCTL_SHDWSEL_MISC_WIRESPEED_EN 0x0010
>>>
>>
> 
> Regards,
> -Vladimir
> .
> 


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support
  2020-03-25 15:22 ` [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support Vladimir Oltean
@ 2020-03-25 23:21   ` Florian Fainelli
  2020-03-26  0:48     ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Florian Fainelli @ 2020-03-25 23:21 UTC (permalink / raw)
  To: Vladimir Oltean, andrew, vivien.didelot, davem, jakub.kicinski
  Cc: murali.policharla, stephen, jiri, idosch, kuba, nikolay, netdev



On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> From: Murali Krishna Policharla <murali.policharla@broadcom.com>
> 
> Add b53_change_mtu API to configure mtu settings in B53 switch. Enable
> jumbo frame support if configured mtu size is for jumbo frame. Also
> configure BCM583XX devices to send and receive jumbo frames when ports
> are configured with 10/100 Mbps speed.
> 
> Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

[snip]

> @@ -658,6 +659,14 @@ static void b53_enable_cpu_port(struct b53_device *dev, int port)
>  	b53_brcm_hdr_setup(dev->ds, port);
>  
>  	b53_br_egress_floods(dev->ds, port, true, true);
> +
> +	b53_read32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, &port_mask);
> +
> +	if (dev->chip_id == BCM583XX_DEVICE_ID)
> +		port_mask |= JPM_10_100_JUMBO_EN;
> +
> +	port_mask |= BIT(port);
> +	b53_write32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, port_mask);

This should eventually be brought into b53_set_jumbo() where we already
have existing logic to configure whether to accept jumbo frames and for
10/100M ports, too, not strictly necessary for now though:

Acked-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-25 23:17   ` Florian Fainelli
@ 2020-03-26  0:30     ` Vladimir Oltean
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-26  0:30 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Jakub Kicinski,
	murali.policharla, Stephen Hemminger, Jiri Pirko, Ido Schimmel,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

On Thu, 26 Mar 2020 at 01:18, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > In the initial attempt to add MTU configuration for DSA:
> >
> > https://patchwork.ozlabs.org/cover/1199868/
> >
> > Florian raised a concern about the bridge MTU normalization logic (when
> > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > expectation was that the bridge would automatically change the MTU of
> > all its slave ports to the minimum MTU, if those slaves are part of the
> > same hardware bridge. However, it doesn't do that, and for good reason,
> > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > bridge net device itself, and not that of any slave port.  If it were to
> > modify the MTU of the slave ports, the effect would be that the user
> > wouldn't be able to increase the MTU of any bridge slave port as long as
> > it was part of the bridge, which would be a bit annoying to say the
> > least.
> >
> > The idea behind this behavior is that normal termination from Linux over
> > the L2 forwarding domain described by DSA should happen over the bridge
> > net device, which _is_ properly limited by the minimum MTU. And
> > termination over individual slave device is possible even if those are
> > bridged. But that is not "forwarding", so there's no reason to do
> > normalization there, since only a single interface sees that packet.
> >
> > The real problem is with the offloaded data path, where of course, the
> > bridge net device MTU is ignored. So a packet received on an interface
> > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > And that is exactly what this patch is trying to prevent from happening.
> >
> > Florian's idea was that all hardware ports having the same
> > netdev_port_same_parent_id should be adjusted to have the same MTU.
> > The MTU that we attempt to configure the ports to is the most recently
> > modified MTU. The attempt is to follow user intention as closely as
> > possible and not be annoying at that.
> >
> > So there are 2 cases really:
> >
> > ip link set dev sw0p0 master br0
> > ip link set dev sw0p1 mtu 1400
> > ip link set dev sw0p1 master br0
> >
> > The above sequence will make sw0p0 inherit MTU 1400 as well.
> >
> > The second case:
> >
> > ip link set dev sw0p0 master br0
> > ip link set dev sw0p1 master br0
> > ip link set dev sw0p0 mtu 1400
> >
> > This sequence will make sw0p1 inherit MTU 1400 from sw0p0.
> >
> > Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  net/bridge/br.c         |  1 +
> >  net/bridge/br_if.c      | 93 +++++++++++++++++++++++++++++++++++++++++
> >  net/bridge/br_private.h |  1 +
> >  3 files changed, 95 insertions(+)
> >
> > diff --git a/net/bridge/br.c b/net/bridge/br.c
> > index b6fe30e3768f..5f05380df1ee 100644
> > --- a/net/bridge/br.c
> > +++ b/net/bridge/br.c
> > @@ -57,6 +57,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
> >
> >       switch (event) {
> >       case NETDEV_CHANGEMTU:
> > +             br_mtu_normalization(br, dev);
>
> I do not remember if you are allowed to sleep in a netdevice notifier, I
> believe not, so you may need to pass a gfp_t to br_mtu_normalization for
> allocations to be GFP_ATOMIC when called from that context, and
> GFP_KERNEL from br_add_if().
>

Not only can you sleep, but the RTNL is also held. It's a bliss!

> It would be nice if we could avoid doing these allocations when called
> from the netdev notifier though, could we just keep the information
> around since the br_hw_port follows the same lifetime as the
> net_bridge_port structure. Other than that, this looks good to me, thanks!

So does this comment still apply then?
I wanted to be as self-contained as possible, it's a pain to keep the
old_mtu list for rollback as the code probably shows. I wouldn't go as
far as to add more stuff into struct net_bridge_port for the sole
purpose of passing data around in this function.

> --
> Florian

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support
  2020-03-25 23:21   ` Florian Fainelli
@ 2020-03-26  0:48     ` Vladimir Oltean
  0 siblings, 0 replies; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-26  0:48 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, Vivien Didelot, David S. Miller, Jakub Kicinski,
	murali.policharla, Stephen Hemminger, Jiri Pirko, Ido Schimmel,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

On Thu, 26 Mar 2020 at 01:22, Florian Fainelli <f.fainelli@gmail.com> wrote:
>
>
>
> On 3/25/2020 8:22 AM, Vladimir Oltean wrote:
> > From: Murali Krishna Policharla <murali.policharla@broadcom.com>
> >
> > Add b53_change_mtu API to configure mtu settings in B53 switch. Enable
> > jumbo frame support if configured mtu size is for jumbo frame. Also
> > configure BCM583XX devices to send and receive jumbo frames when ports
> > are configured with 10/100 Mbps speed.
> >
> > Signed-off-by: Murali Krishna Policharla <murali.policharla@broadcom.com>
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
>
> [snip]
>
> > @@ -658,6 +659,14 @@ static void b53_enable_cpu_port(struct b53_device *dev, int port)
> >       b53_brcm_hdr_setup(dev->ds, port);
> >
> >       b53_br_egress_floods(dev->ds, port, true, true);
> > +
> > +     b53_read32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, &port_mask);
> > +
> > +     if (dev->chip_id == BCM583XX_DEVICE_ID)
> > +             port_mask |= JPM_10_100_JUMBO_EN;
> > +
> > +     port_mask |= BIT(port);
> > +     b53_write32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, port_mask);
>
> This should eventually be brought into b53_set_jumbo() where we already
> have existing logic to configure whether to accept jumbo frames and for
> 10/100M ports, too, not strictly necessary for now though:
>
> Acked-by: Florian Fainelli <f.fainelli@gmail.com>
> --
> Florian

What do you mean should be done? This?

     if (!is5325(dev) && !is5365(dev))
-        b53_set_jumbo(dev, dev->enable_jumbo, false);
+        b53_set_jumbo(dev, dev->enable_jumbo, is58xx(dev));

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-25 15:22 ` [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath Vladimir Oltean
  2020-03-25 23:17   ` Florian Fainelli
@ 2020-03-26 10:17   ` Ido Schimmel
  2020-03-26 10:25     ` Vladimir Oltean
  1 sibling, 1 reply; 37+ messages in thread
From: Ido Schimmel @ 2020-03-26 10:17 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: andrew, f.fainelli, vivien.didelot, davem, jakub.kicinski,
	murali.policharla, stephen, jiri, kuba, nikolay, netdev

Hi Vladimir,

On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> In the initial attempt to add MTU configuration for DSA:
> 
> https://patchwork.ozlabs.org/cover/1199868/
> 
> Florian raised a concern about the bridge MTU normalization logic (when
> you bridge an interface with MTU 9000 and one with MTU 1500). His
> expectation was that the bridge would automatically change the MTU of
> all its slave ports to the minimum MTU, if those slaves are part of the
> same hardware bridge. However, it doesn't do that, and for good reason,
> I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> bridge net device itself, and not that of any slave port.  If it were to
> modify the MTU of the slave ports, the effect would be that the user
> wouldn't be able to increase the MTU of any bridge slave port as long as
> it was part of the bridge, which would be a bit annoying to say the
> least.
> 
> The idea behind this behavior is that normal termination from Linux over
> the L2 forwarding domain described by DSA should happen over the bridge
> net device, which _is_ properly limited by the minimum MTU. And
> termination over individual slave device is possible even if those are
> bridged. But that is not "forwarding", so there's no reason to do
> normalization there, since only a single interface sees that packet.
> 
> The real problem is with the offloaded data path, where of course, the
> bridge net device MTU is ignored. So a packet received on an interface
> with MTU 9000 would still be forwarded to an interface with MTU 1500.
> And that is exactly what this patch is trying to prevent from happening.

How is that different from the software data path where the CPU needs to
forward the packet between port A with MTU X and port B with MTU X/2 ?

I don't really understand what problem you are trying to solve here. It
seems like the user did some misconfiguration and now you're introducing
a policy to mitigate it? If so, it should be something the user can
disable. It also seems like something that can be easily handled by a
user space application. You get netlink notifications for all these
operations.

> 
> Florian's idea was that all hardware ports having the same
> netdev_port_same_parent_id should be adjusted to have the same MTU.
> The MTU that we attempt to configure the ports to is the most recently
> modified MTU. The attempt is to follow user intention as closely as
> possible and not be annoying at that.
> 
> So there are 2 cases really:
> 
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 mtu 1400
> ip link set dev sw0p1 master br0
> 
> The above sequence will make sw0p0 inherit MTU 1400 as well.
> 
> The second case:
> 
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 master br0
> ip link set dev sw0p0 mtu 1400
> 
> This sequence will make sw0p1 inherit MTU 1400 from sw0p0.
> 
> Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/bridge/br.c         |  1 +
>  net/bridge/br_if.c      | 93 +++++++++++++++++++++++++++++++++++++++++
>  net/bridge/br_private.h |  1 +
>  3 files changed, 95 insertions(+)
> 
> diff --git a/net/bridge/br.c b/net/bridge/br.c
> index b6fe30e3768f..5f05380df1ee 100644
> --- a/net/bridge/br.c
> +++ b/net/bridge/br.c
> @@ -57,6 +57,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
>  
>  	switch (event) {
>  	case NETDEV_CHANGEMTU:
> +		br_mtu_normalization(br, dev);
>  		br_mtu_auto_adjust(br);
>  		break;
>  
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 4fe30b182ee7..a228668920a6 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -514,6 +514,98 @@ void br_mtu_auto_adjust(struct net_bridge *br)
>  	br_opt_toggle(br, BROPT_MTU_SET_BY_USER, false);
>  }
>  
> +struct br_hw_port {
> +	struct list_head list;
> +	struct net_device *dev;
> +	int old_mtu;
> +};
> +
> +static int br_hw_port_list_set_mtu(struct list_head *hw_port_list, int mtu)
> +{
> +	const struct br_hw_port *p;
> +	int err;
> +
> +	list_for_each_entry(p, hw_port_list, list) {
> +		if (p->dev->mtu == mtu)
> +			continue;
> +
> +		err = dev_set_mtu(p->dev, mtu);
> +		if (err)
> +			goto rollback;
> +	}
> +
> +	return 0;
> +
> +rollback:
> +	list_for_each_entry_continue_reverse(p, hw_port_list, list) {
> +		if (p->dev->mtu == p->old_mtu)
> +			continue;
> +
> +		if (dev_set_mtu(p->dev, p->old_mtu))
> +			netdev_err(p->dev, "Failed to restore MTU\n");
> +	}
> +
> +	return err;
> +}
> +
> +static void br_hw_port_list_free(struct list_head *hw_port_list)
> +{
> +	struct br_hw_port *p, *n;
> +
> +	list_for_each_entry_safe(p, n, hw_port_list, list)
> +		kfree(p);
> +}
> +
> +/* Make the hardware datapath to/from @br_if limited to a common MTU */
> +void br_mtu_normalization(struct net_bridge *br, struct net_device *br_if)
> +{
> +	const struct net_bridge_port *p;
> +	struct list_head hw_port_list;
> +	int min_mtu = ETH_MAX_MTU;
> +	int err;
> +
> +	INIT_LIST_HEAD(&hw_port_list);
> +
> +	/* Populate the list of ports that are part of the same hardware bridge
> +	 * as the newly added port
> +	 */
> +	list_for_each_entry(p, &br->port_list, list) {
> +		struct br_hw_port *hw_port;
> +
> +		if (!netdev_port_same_parent_id(p->dev, br_if))
> +			continue;
> +
> +		if (min_mtu > p->dev->mtu)
> +			min_mtu = p->dev->mtu;
> +
> +		hw_port = kzalloc(sizeof(*hw_port), GFP_KERNEL);
> +		if (!hw_port)
> +			goto out;
> +
> +		hw_port->dev = p->dev;
> +		hw_port->old_mtu = p->dev->mtu;
> +
> +		list_add(&hw_port->list, &hw_port_list);
> +	}
> +
> +	/* Attempt to configure the entire hardware bridge to the newly added
> +	 * interface's MTU first, regardless of whether the intention of the
> +	 * user was to raise or lower it.
> +	 */
> +	err = br_hw_port_list_set_mtu(&hw_port_list, br_if->mtu);
> +	if (!err)
> +		goto out;
> +
> +	/* Clearly that didn't work out so well, so just set the minimum MTU on
> +	 * all hardware bridge ports now. If this fails too, then all ports will
> +	 * still have their old MTU rolled back anyway.
> +	 */
> +	br_hw_port_list_set_mtu(&hw_port_list, min_mtu);
> +
> +out:
> +	br_hw_port_list_free(&hw_port_list);
> +}
> +
>  static void br_set_gso_limits(struct net_bridge *br)
>  {
>  	unsigned int gso_max_size = GSO_MAX_SIZE;
> @@ -676,6 +768,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  	if (changed_addr)
>  		call_netdevice_notifiers(NETDEV_CHANGEADDR, br->dev);
>  
> +	br_mtu_normalization(br, dev);
>  	br_mtu_auto_adjust(br);
>  	br_set_gso_limits(br);
>  
> diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
> index 1f97703a52ff..df010e36228e 100644
> --- a/net/bridge/br_private.h
> +++ b/net/bridge/br_private.h
> @@ -693,6 +693,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev,
>  	      struct netlink_ext_ack *extack);
>  int br_del_if(struct net_bridge *br, struct net_device *dev);
>  void br_mtu_auto_adjust(struct net_bridge *br);
> +void br_mtu_normalization(struct net_bridge *br, struct net_device *br_if);
>  netdev_features_t br_features_recompute(struct net_bridge *br,
>  					netdev_features_t features);
>  void br_port_flags_change(struct net_bridge_port *port, unsigned long mask);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 10:17   ` Ido Schimmel
@ 2020-03-26 10:25     ` Vladimir Oltean
  2020-03-26 11:35       ` Ido Schimmel
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-26 10:25 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

Hi Ido,

On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
>
> Hi Vladimir,
>
> On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> >
> > In the initial attempt to add MTU configuration for DSA:
> >
> > https://patchwork.ozlabs.org/cover/1199868/
> >
> > Florian raised a concern about the bridge MTU normalization logic (when
> > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > expectation was that the bridge would automatically change the MTU of
> > all its slave ports to the minimum MTU, if those slaves are part of the
> > same hardware bridge. However, it doesn't do that, and for good reason,
> > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > bridge net device itself, and not that of any slave port.  If it were to
> > modify the MTU of the slave ports, the effect would be that the user
> > wouldn't be able to increase the MTU of any bridge slave port as long as
> > it was part of the bridge, which would be a bit annoying to say the
> > least.
> >
> > The idea behind this behavior is that normal termination from Linux over
> > the L2 forwarding domain described by DSA should happen over the bridge
> > net device, which _is_ properly limited by the minimum MTU. And
> > termination over individual slave device is possible even if those are
> > bridged. But that is not "forwarding", so there's no reason to do
> > normalization there, since only a single interface sees that packet.
> >
> > The real problem is with the offloaded data path, where of course, the
> > bridge net device MTU is ignored. So a packet received on an interface
> > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > And that is exactly what this patch is trying to prevent from happening.
>
> How is that different from the software data path where the CPU needs to
> forward the packet between port A with MTU X and port B with MTU X/2 ?
>
> I don't really understand what problem you are trying to solve here. It
> seems like the user did some misconfiguration and now you're introducing
> a policy to mitigate it? If so, it should be something the user can
> disable. It also seems like something that can be easily handled by a
> user space application. You get netlink notifications for all these
> operations.
>

Actually I think the problem can be better understood if I explain
what the switches I'm dealing with look like.
None of them really has a 'MTU' register. They perform length-based
admission control on RX. At this moment in time I don't think anybody
wants to introduce an MRU knob in iproute2, so we're adjusting that
maximum ingress length through the MTU. But it becomes an inverted
problem, since the 'MTU' needs to be controlled for all possible
sources of traffic that are going to egress on this port, in order for
the real MTU on the port itself to be observed.

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 10:25     ` Vladimir Oltean
@ 2020-03-26 11:35       ` Ido Schimmel
  2020-03-26 11:44         ` Vladimir Oltean
  2020-03-26 12:06         ` Nikolay Aleksandrov
  0 siblings, 2 replies; 37+ messages in thread
From: Ido Schimmel @ 2020-03-26 11:35 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> Hi Ido,
> 
> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > Hi Vladimir,
> >
> > On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > >
> > > In the initial attempt to add MTU configuration for DSA:
> > >
> > > https://patchwork.ozlabs.org/cover/1199868/
> > >
> > > Florian raised a concern about the bridge MTU normalization logic (when
> > > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > > expectation was that the bridge would automatically change the MTU of
> > > all its slave ports to the minimum MTU, if those slaves are part of the
> > > same hardware bridge. However, it doesn't do that, and for good reason,
> > > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > > bridge net device itself, and not that of any slave port.  If it were to
> > > modify the MTU of the slave ports, the effect would be that the user
> > > wouldn't be able to increase the MTU of any bridge slave port as long as
> > > it was part of the bridge, which would be a bit annoying to say the
> > > least.
> > >
> > > The idea behind this behavior is that normal termination from Linux over
> > > the L2 forwarding domain described by DSA should happen over the bridge
> > > net device, which _is_ properly limited by the minimum MTU. And
> > > termination over individual slave device is possible even if those are
> > > bridged. But that is not "forwarding", so there's no reason to do
> > > normalization there, since only a single interface sees that packet.
> > >
> > > The real problem is with the offloaded data path, where of course, the
> > > bridge net device MTU is ignored. So a packet received on an interface
> > > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > > And that is exactly what this patch is trying to prevent from happening.
> >
> > How is that different from the software data path where the CPU needs to
> > forward the packet between port A with MTU X and port B with MTU X/2 ?
> >
> > I don't really understand what problem you are trying to solve here. It
> > seems like the user did some misconfiguration and now you're introducing
> > a policy to mitigate it? If so, it should be something the user can
> > disable. It also seems like something that can be easily handled by a
> > user space application. You get netlink notifications for all these
> > operations.
> >
> 
> Actually I think the problem can be better understood if I explain
> what the switches I'm dealing with look like.
> None of them really has a 'MTU' register. They perform length-based
> admission control on RX.

IIUC, by that you mean that these switches only perform length-based
filtering on RX, but not on TX?

> At this moment in time I don't think anybody wants to introduce an MRU
> knob in iproute2, so we're adjusting that maximum ingress length
> through the MTU. But it becomes an inverted problem, since the 'MTU'
> needs to be controlled for all possible sources of traffic that are
> going to egress on this port, in order for the real MTU on the port
> itself to be observed.

Looking at your example from the changelog:

ip link set dev sw0p0 master br0
ip link set dev sw0p1 mtu 1400
ip link set dev sw0p1 master br0

Without your patch, after these commands sw0p0 has an MTU of 1500 and
sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
should be forwarded there)?

If so, then I think I understand the problem. However, I don't think
such code belongs in the bridge driver as this restriction does not
apply to all switches. Also, I think that having the kernel change MTU
of port A following MTU change of port B is a bit surprising and not
intuitive.

I think you should be more explicit about it. Did you consider listening
to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
unsupported configurations with an appropriate extack message? If you
can't veto (in order not to break user space), you can still emit an
extack message.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 11:35       ` Ido Schimmel
@ 2020-03-26 11:44         ` Vladimir Oltean
  2020-03-26 11:54           ` Ido Schimmel
  2020-03-26 12:06         ` Nikolay Aleksandrov
  1 sibling, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-26 11:44 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

On Thu, 26 Mar 2020 at 13:35, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> > Hi Ido,
> >
> > On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
> > >
> > > Hi Vladimir,
> > >
> > > On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > >
> > > > In the initial attempt to add MTU configuration for DSA:
> > > >
> > > > https://patchwork.ozlabs.org/cover/1199868/
> > > >
> > > > Florian raised a concern about the bridge MTU normalization logic (when
> > > > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > > > expectation was that the bridge would automatically change the MTU of
> > > > all its slave ports to the minimum MTU, if those slaves are part of the
> > > > same hardware bridge. However, it doesn't do that, and for good reason,
> > > > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > > > bridge net device itself, and not that of any slave port.  If it were to
> > > > modify the MTU of the slave ports, the effect would be that the user
> > > > wouldn't be able to increase the MTU of any bridge slave port as long as
> > > > it was part of the bridge, which would be a bit annoying to say the
> > > > least.
> > > >
> > > > The idea behind this behavior is that normal termination from Linux over
> > > > the L2 forwarding domain described by DSA should happen over the bridge
> > > > net device, which _is_ properly limited by the minimum MTU. And
> > > > termination over individual slave device is possible even if those are
> > > > bridged. But that is not "forwarding", so there's no reason to do
> > > > normalization there, since only a single interface sees that packet.
> > > >
> > > > The real problem is with the offloaded data path, where of course, the
> > > > bridge net device MTU is ignored. So a packet received on an interface
> > > > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > > > And that is exactly what this patch is trying to prevent from happening.
> > >
> > > How is that different from the software data path where the CPU needs to
> > > forward the packet between port A with MTU X and port B with MTU X/2 ?
> > >
> > > I don't really understand what problem you are trying to solve here. It
> > > seems like the user did some misconfiguration and now you're introducing
> > > a policy to mitigate it? If so, it should be something the user can
> > > disable. It also seems like something that can be easily handled by a
> > > user space application. You get netlink notifications for all these
> > > operations.
> > >
> >
> > Actually I think the problem can be better understood if I explain
> > what the switches I'm dealing with look like.
> > None of them really has a 'MTU' register. They perform length-based
> > admission control on RX.
>
> IIUC, by that you mean that these switches only perform length-based
> filtering on RX, but not on TX?
>

Yes.

> > At this moment in time I don't think anybody wants to introduce an MRU
> > knob in iproute2, so we're adjusting that maximum ingress length
> > through the MTU. But it becomes an inverted problem, since the 'MTU'
> > needs to be controlled for all possible sources of traffic that are
> > going to egress on this port, in order for the real MTU on the port
> > itself to be observed.
>
> Looking at your example from the changelog:
>
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 mtu 1400
> ip link set dev sw0p1 master br0
>
> Without your patch, after these commands sw0p0 has an MTU of 1500 and
> sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
> 1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
> should be forwarded there)?
>

Yes.

> If so, then I think I understand the problem. However, I don't think
> such code belongs in the bridge driver as this restriction does not
> apply to all switches.

How do Mellanox switches deal with this?

> Also, I think that having the kernel change MTU
> of port A following MTU change of port B is a bit surprising and not
> intuitive.
>

It already changes the MTU of br0, this just goes along the same path.

> I think you should be more explicit about it. Did you consider listening
> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> unsupported configurations with an appropriate extack message? If you
> can't veto (in order not to break user space), you can still emit an
> extack message.

I suppose that is an alternative approach. This would be done from the
DSA core then? But instead of veto, just do the normalization thing.

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 11:44         ` Vladimir Oltean
@ 2020-03-26 11:54           ` Ido Schimmel
  2020-03-26 12:34             ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Ido Schimmel @ 2020-03-26 11:54 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

On Thu, Mar 26, 2020 at 01:44:51PM +0200, Vladimir Oltean wrote:
> On Thu, 26 Mar 2020 at 13:35, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> > > Hi Ido,
> > >
> > > On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
> > > >
> > > > Hi Vladimir,
> > > >
> > > > On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
> > > > > From: Vladimir Oltean <vladimir.oltean@nxp.com>
> > > > >
> > > > > In the initial attempt to add MTU configuration for DSA:
> > > > >
> > > > > https://patchwork.ozlabs.org/cover/1199868/
> > > > >
> > > > > Florian raised a concern about the bridge MTU normalization logic (when
> > > > > you bridge an interface with MTU 9000 and one with MTU 1500). His
> > > > > expectation was that the bridge would automatically change the MTU of
> > > > > all its slave ports to the minimum MTU, if those slaves are part of the
> > > > > same hardware bridge. However, it doesn't do that, and for good reason,
> > > > > I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
> > > > > bridge net device itself, and not that of any slave port.  If it were to
> > > > > modify the MTU of the slave ports, the effect would be that the user
> > > > > wouldn't be able to increase the MTU of any bridge slave port as long as
> > > > > it was part of the bridge, which would be a bit annoying to say the
> > > > > least.
> > > > >
> > > > > The idea behind this behavior is that normal termination from Linux over
> > > > > the L2 forwarding domain described by DSA should happen over the bridge
> > > > > net device, which _is_ properly limited by the minimum MTU. And
> > > > > termination over individual slave device is possible even if those are
> > > > > bridged. But that is not "forwarding", so there's no reason to do
> > > > > normalization there, since only a single interface sees that packet.
> > > > >
> > > > > The real problem is with the offloaded data path, where of course, the
> > > > > bridge net device MTU is ignored. So a packet received on an interface
> > > > > with MTU 9000 would still be forwarded to an interface with MTU 1500.
> > > > > And that is exactly what this patch is trying to prevent from happening.
> > > >
> > > > How is that different from the software data path where the CPU needs to
> > > > forward the packet between port A with MTU X and port B with MTU X/2 ?
> > > >
> > > > I don't really understand what problem you are trying to solve here. It
> > > > seems like the user did some misconfiguration and now you're introducing
> > > > a policy to mitigate it? If so, it should be something the user can
> > > > disable. It also seems like something that can be easily handled by a
> > > > user space application. You get netlink notifications for all these
> > > > operations.
> > > >
> > >
> > > Actually I think the problem can be better understood if I explain
> > > what the switches I'm dealing with look like.
> > > None of them really has a 'MTU' register. They perform length-based
> > > admission control on RX.
> >
> > IIUC, by that you mean that these switches only perform length-based
> > filtering on RX, but not on TX?
> >
> 
> Yes.
> 
> > > At this moment in time I don't think anybody wants to introduce an MRU
> > > knob in iproute2, so we're adjusting that maximum ingress length
> > > through the MTU. But it becomes an inverted problem, since the 'MTU'
> > > needs to be controlled for all possible sources of traffic that are
> > > going to egress on this port, in order for the real MTU on the port
> > > itself to be observed.
> >
> > Looking at your example from the changelog:
> >
> > ip link set dev sw0p0 master br0
> > ip link set dev sw0p1 mtu 1400
> > ip link set dev sw0p1 master br0
> >
> > Without your patch, after these commands sw0p0 has an MTU of 1500 and
> > sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
> > 1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
> > should be forwarded there)?
> >
> 
> Yes.
> 
> > If so, then I think I understand the problem. However, I don't think
> > such code belongs in the bridge driver as this restriction does not
> > apply to all switches.
> 
> How do Mellanox switches deal with this?

Frames will be discarded on the egress of sw0p1.

> 
> > Also, I think that having the kernel change MTU
> > of port A following MTU change of port B is a bit surprising and not
> > intuitive.
> >
> 
> It already changes the MTU of br0, this just goes along the same path.

Yea, but this is an established behavior already. And it applies
regardless if the data path is offloaded or not, unlike this change.

> > I think you should be more explicit about it. Did you consider listening
> > to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> > unsupported configurations with an appropriate extack message? If you
> > can't veto (in order not to break user space), you can still emit an
> > extack message.
> 
> I suppose that is an alternative approach. This would be done from the
> DSA core then? But instead of veto, just do the normalization thing.

Not really my call, but I think the veto is better because you are being
explicit about it and informing the user with an appropriate message.

> 
> Thanks,
> -Vladimir

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 11:35       ` Ido Schimmel
  2020-03-26 11:44         ` Vladimir Oltean
@ 2020-03-26 12:06         ` Nikolay Aleksandrov
  2020-03-26 12:18           ` Vladimir Oltean
  1 sibling, 1 reply; 37+ messages in thread
From: Nikolay Aleksandrov @ 2020-03-26 12:06 UTC (permalink / raw)
  To: Ido Schimmel, Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Jakub Kicinski, netdev

On 26/03/2020 13:35, Ido Schimmel wrote:
> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
>> Hi Ido,
>>
>> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
>>>
>>> Hi Vladimir,
>>>
>>> On Wed, Mar 25, 2020 at 05:22:09PM +0200, Vladimir Oltean wrote:
>>>> From: Vladimir Oltean <vladimir.oltean@nxp.com>
>>>>
>>>> In the initial attempt to add MTU configuration for DSA:
>>>>
>>>> https://patchwork.ozlabs.org/cover/1199868/
>>>>
>>>> Florian raised a concern about the bridge MTU normalization logic (when
>>>> you bridge an interface with MTU 9000 and one with MTU 1500). His
>>>> expectation was that the bridge would automatically change the MTU of
>>>> all its slave ports to the minimum MTU, if those slaves are part of the
>>>> same hardware bridge. However, it doesn't do that, and for good reason,
>>>> I think. What br_mtu_auto_adjust() does is it adjusts the MTU of the
>>>> bridge net device itself, and not that of any slave port.  If it were to
>>>> modify the MTU of the slave ports, the effect would be that the user
>>>> wouldn't be able to increase the MTU of any bridge slave port as long as
>>>> it was part of the bridge, which would be a bit annoying to say the
>>>> least.
>>>>
>>>> The idea behind this behavior is that normal termination from Linux over
>>>> the L2 forwarding domain described by DSA should happen over the bridge
>>>> net device, which _is_ properly limited by the minimum MTU. And
>>>> termination over individual slave device is possible even if those are
>>>> bridged. But that is not "forwarding", so there's no reason to do
>>>> normalization there, since only a single interface sees that packet.
>>>>
>>>> The real problem is with the offloaded data path, where of course, the
>>>> bridge net device MTU is ignored. So a packet received on an interface
>>>> with MTU 9000 would still be forwarded to an interface with MTU 1500.
>>>> And that is exactly what this patch is trying to prevent from happening.
>>>
>>> How is that different from the software data path where the CPU needs to
>>> forward the packet between port A with MTU X and port B with MTU X/2 ?
>>>
>>> I don't really understand what problem you are trying to solve here. It
>>> seems like the user did some misconfiguration and now you're introducing
>>> a policy to mitigate it? If so, it should be something the user can
>>> disable. It also seems like something that can be easily handled by a
>>> user space application. You get netlink notifications for all these
>>> operations.
>>>
>>
>> Actually I think the problem can be better understood if I explain
>> what the switches I'm dealing with look like.
>> None of them really has a 'MTU' register. They perform length-based
>> admission control on RX.
> 
> IIUC, by that you mean that these switches only perform length-based
> filtering on RX, but not on TX?
> 
>> At this moment in time I don't think anybody wants to introduce an MRU
>> knob in iproute2, so we're adjusting that maximum ingress length
>> through the MTU. But it becomes an inverted problem, since the 'MTU'
>> needs to be controlled for all possible sources of traffic that are
>> going to egress on this port, in order for the real MTU on the port
>> itself to be observed.
> 
> Looking at your example from the changelog:
> 
> ip link set dev sw0p0 master br0
> ip link set dev sw0p1 mtu 1400
> ip link set dev sw0p1 master br0
> 
> Without your patch, after these commands sw0p0 has an MTU of 1500 and
> sw0p1 has an MTU of 1400. Are you saying that a frame with a length of
> 1450 bytes received on sw0p0 will be able to egress sw0p1 (assuming it
> should be forwarded there)?
> 
> If so, then I think I understand the problem. However, I don't think
> such code belongs in the bridge driver as this restriction does not
> apply to all switches. Also, I think that having the kernel change MTU
> of port A following MTU change of port B is a bit surprising and not
> intuitive.
> 
> I think you should be more explicit about it. Did you consider listening
> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> unsupported configurations with an appropriate extack message? If you
> can't veto (in order not to break user space), you can still emit an
> extack message.
> 

+1, this sounds more appropriate IMO


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 12:06         ` Nikolay Aleksandrov
@ 2020-03-26 12:18           ` Vladimir Oltean
  2020-03-26 12:19             ` Nikolay Aleksandrov
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-26 12:18 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, Andrew Lunn, Florian Fainelli, Vivien Didelot,
	David S. Miller, Jakub Kicinski, murali.policharla,
	Stephen Hemminger, Jiri Pirko, Jakub Kicinski, netdev

On Thu, 26 Mar 2020 at 14:06, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
>
> On 26/03/2020 13:35, Ido Schimmel wrote:
> > On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> >> Hi Ido,
> >>
> >> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
> >>>
[snip]
> >
> > I think you should be more explicit about it. Did you consider listening
> > to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> > unsupported configurations with an appropriate extack message? If you
> > can't veto (in order not to break user space), you can still emit an
> > extack message.
> >
>
> +1, this sounds more appropriate IMO
>

And what does vetoing gain me exactly? The practical inability to
change the MTU of any interface that is already bridged and applies
length check on RX?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 12:18           ` Vladimir Oltean
@ 2020-03-26 12:19             ` Nikolay Aleksandrov
  2020-03-26 12:25               ` Vladimir Oltean
  0 siblings, 1 reply; 37+ messages in thread
From: Nikolay Aleksandrov @ 2020-03-26 12:19 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Ido Schimmel, Andrew Lunn, Florian Fainelli, Vivien Didelot,
	David S. Miller, Jakub Kicinski, murali.policharla,
	Stephen Hemminger, Jiri Pirko, Jakub Kicinski, netdev

On 26/03/2020 14:18, Vladimir Oltean wrote:
> On Thu, 26 Mar 2020 at 14:06, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>>
>> On 26/03/2020 13:35, Ido Schimmel wrote:
>>> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
>>>> Hi Ido,
>>>>
>>>> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
>>>>>
> [snip]
>>>
>>> I think you should be more explicit about it. Did you consider listening
>>> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
>>> unsupported configurations with an appropriate extack message? If you
>>> can't veto (in order not to break user space), you can still emit an
>>> extack message.
>>>
>>
>> +1, this sounds more appropriate IMO
>>
> 
> And what does vetoing gain me exactly? The practical inability to
> change the MTU of any interface that is already bridged and applies
> length check on RX?
> 

I was referring to moving the logic to NETDEV_PRECHANGEMTU, the rest is up to you.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 12:19             ` Nikolay Aleksandrov
@ 2020-03-26 12:25               ` Vladimir Oltean
  2020-03-26 12:38                 ` Nikolay Aleksandrov
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-26 12:25 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Ido Schimmel, Andrew Lunn, Florian Fainelli, Vivien Didelot,
	David S. Miller, Jakub Kicinski, murali.policharla,
	Stephen Hemminger, Jiri Pirko, Jakub Kicinski, netdev

On Thu, 26 Mar 2020 at 14:19, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
>
> On 26/03/2020 14:18, Vladimir Oltean wrote:
> > On Thu, 26 Mar 2020 at 14:06, Nikolay Aleksandrov
> > <nikolay@cumulusnetworks.com> wrote:
> >>
> >> On 26/03/2020 13:35, Ido Schimmel wrote:
> >>> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
> >>>> Hi Ido,
> >>>>
> >>>> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
> >>>>>
> > [snip]
> >>>
> >>> I think you should be more explicit about it. Did you consider listening
> >>> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> >>> unsupported configurations with an appropriate extack message? If you
> >>> can't veto (in order not to break user space), you can still emit an
> >>> extack message.
> >>>
> >>
> >> +1, this sounds more appropriate IMO
> >>
> >
> > And what does vetoing gain me exactly? The practical inability to
> > change the MTU of any interface that is already bridged and applies
> > length check on RX?
> >
>
> I was referring to moving the logic to NETDEV_PRECHANGEMTU, the rest is up to you.
>

If I'm not going to veto, then I don't see a lot of sense in listening
on this particular notifier either. I can do the normalization just
fine on NETDEV_CHANGEMTU.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 11:54           ` Ido Schimmel
@ 2020-03-26 12:34             ` Vladimir Oltean
  2020-03-26 12:59               ` Ido Schimmel
  0 siblings, 1 reply; 37+ messages in thread
From: Vladimir Oltean @ 2020-03-26 12:34 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

On Thu, 26 Mar 2020 at 13:54, Ido Schimmel <idosch@idosch.org> wrote:
>
> On Thu, Mar 26, 2020 at 01:44:51PM +0200, Vladimir Oltean wrote:
> > On Thu, 26 Mar 2020 at 13:35, Ido Schimmel <idosch@idosch.org> wrote:
> > >

> > > Also, I think that having the kernel change MTU
> > > of port A following MTU change of port B is a bit surprising and not
> > > intuitive.
> > >
> >
> > It already changes the MTU of br0, this just goes along the same path.
>
> Yea, but this is an established behavior already. And it applies
> regardless if the data path is offloaded or not, unlike this change.
>

I don't understand the 'established behavior' argument.
And I need to make a correction regarding the fact that it applies
regardless of whether the data path is offloaded or not.
If the data path is offloaded, it applies, sure, but it has no effect.
That is my issue with it.

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 12:25               ` Vladimir Oltean
@ 2020-03-26 12:38                 ` Nikolay Aleksandrov
  2020-03-26 18:49                   ` Jakub Kicinski
  0 siblings, 1 reply; 37+ messages in thread
From: Nikolay Aleksandrov @ 2020-03-26 12:38 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Ido Schimmel, Andrew Lunn, Florian Fainelli, Vivien Didelot,
	David S. Miller, Jakub Kicinski, murali.policharla,
	Stephen Hemminger, Jiri Pirko, Jakub Kicinski, netdev

On 26/03/2020 14:25, Vladimir Oltean wrote:
> On Thu, 26 Mar 2020 at 14:19, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>>
>> On 26/03/2020 14:18, Vladimir Oltean wrote:
>>> On Thu, 26 Mar 2020 at 14:06, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:
>>>>
>>>> On 26/03/2020 13:35, Ido Schimmel wrote:
>>>>> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:
>>>>>> Hi Ido,
>>>>>>
>>>>>> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:
>>>>>>>
>>> [snip]
>>>>>
>>>>> I think you should be more explicit about it. Did you consider listening
>>>>> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
>>>>> unsupported configurations with an appropriate extack message? If you
>>>>> can't veto (in order not to break user space), you can still emit an
>>>>> extack message.
>>>>>
>>>>
>>>> +1, this sounds more appropriate IMO
>>>>
>>>
>>> And what does vetoing gain me exactly? The practical inability to
>>> change the MTU of any interface that is already bridged and applies
>>> length check on RX?
>>>
>>
>> I was referring to moving the logic to NETDEV_PRECHANGEMTU, the rest is up to you.
>>
> 
> If I'm not going to veto, then I don't see a lot of sense in listening
> on this particular notifier either. I can do the normalization just
> fine on NETDEV_CHANGEMTU.
> 

I should've been more explicit - I meant I agree that this change doesn't belong in
the bridge, and handling it in a notifier in the driver seems like a better place.
Yes - if it's not going to be a vetto, then CHANGEMTU is fine.



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 12:34             ` Vladimir Oltean
@ 2020-03-26 12:59               ` Ido Schimmel
  0 siblings, 0 replies; 37+ messages in thread
From: Ido Schimmel @ 2020-03-26 12:59 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vivien Didelot, David S. Miller,
	Jakub Kicinski, murali.policharla, Stephen Hemminger, Jiri Pirko,
	Jakub Kicinski, Nikolay Aleksandrov, netdev

On Thu, Mar 26, 2020 at 02:34:13PM +0200, Vladimir Oltean wrote:
> On Thu, 26 Mar 2020 at 13:54, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Thu, Mar 26, 2020 at 01:44:51PM +0200, Vladimir Oltean wrote:
> > > On Thu, 26 Mar 2020 at 13:35, Ido Schimmel <idosch@idosch.org> wrote:
> > > >
> 
> > > > Also, I think that having the kernel change MTU
> > > > of port A following MTU change of port B is a bit surprising and not
> > > > intuitive.
> > > >
> > >
> > > It already changes the MTU of br0, this just goes along the same path.
> >
> > Yea, but this is an established behavior already. And it applies
> > regardless if the data path is offloaded or not, unlike this change.
> >
> 
> I don't understand the 'established behavior' argument.

I meant that this behavior is always the same. There is no difference
between pre-5.7 kernels and later ones.

> And I need to make a correction regarding the fact that it applies
> regardless of whether the data path is offloaded or not.
> If the data path is offloaded, it applies, sure, but it has no effect.
> That is my issue with it.

But it does have effect. At least on switches that support L3 offload.
If the MTU of the bridge is changed, then we also change the MTU of the
corresponding router interface (RIF).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 12:38                 ` Nikolay Aleksandrov
@ 2020-03-26 18:49                   ` Jakub Kicinski
  2020-03-26 19:41                     ` Nikolay Aleksandrov
  0 siblings, 1 reply; 37+ messages in thread
From: Jakub Kicinski @ 2020-03-26 18:49 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Vladimir Oltean, Ido Schimmel, Andrew Lunn, Florian Fainelli,
	Vivien Didelot, David S. Miller, Jakub Kicinski,
	murali.policharla, Stephen Hemminger, Jiri Pirko, netdev

On Thu, 26 Mar 2020 14:38:57 +0200 Nikolay Aleksandrov wrote:
> On 26/03/2020 14:25, Vladimir Oltean wrote:
> > On Thu, 26 Mar 2020 at 14:19, Nikolay Aleksandrov
> > <nikolay@cumulusnetworks.com> wrote:  
> >>
> >> On 26/03/2020 14:18, Vladimir Oltean wrote:  
> >>> On Thu, 26 Mar 2020 at 14:06, Nikolay Aleksandrov
> >>> <nikolay@cumulusnetworks.com> wrote:  
> >>>>
> >>>> On 26/03/2020 13:35, Ido Schimmel wrote:  
> >>>>> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:  
> >>>>>> Hi Ido,
> >>>>>>
> >>>>>> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:  
> >>>>>>>  
> >>> [snip]  
> >>>>>
> >>>>> I think you should be more explicit about it. Did you consider listening
> >>>>> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
> >>>>> unsupported configurations with an appropriate extack message? If you
> >>>>> can't veto (in order not to break user space), you can still emit an
> >>>>> extack message.
> >>>>>  
> >>>>
> >>>> +1, this sounds more appropriate IMO
> >>>>  
> >>>
> >>> And what does vetoing gain me exactly? The practical inability to
> >>> change the MTU of any interface that is already bridged and applies
> >>> length check on RX?
> >>>  
> >>
> >> I was referring to moving the logic to NETDEV_PRECHANGEMTU, the rest is up to you.
> >>  
> > 
> > If I'm not going to veto, then I don't see a lot of sense in listening
> > on this particular notifier either. I can do the normalization just
> > fine on NETDEV_CHANGEMTU.
> 
> I should've been more explicit - I meant I agree that this change doesn't belong in
> the bridge, and handling it in a notifier in the driver seems like a better place.
> Yes - if it's not going to be a vetto, then CHANGEMTU is fine.

I'm not sure pushing behavior decisions like that out to the drivers 
is ever a good idea. Linux should abstract HW differences after all,
we don't want different drivers to perform different magic behind
user's back. 

I'd think if HW is unable to apply given configuration vetoing is both
correct and expected..

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath
  2020-03-26 18:49                   ` Jakub Kicinski
@ 2020-03-26 19:41                     ` Nikolay Aleksandrov
  0 siblings, 0 replies; 37+ messages in thread
From: Nikolay Aleksandrov @ 2020-03-26 19:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vladimir Oltean, Ido Schimmel, Andrew Lunn, Florian Fainelli,
	Vivien Didelot, David S. Miller, Jakub Kicinski,
	murali.policharla, Stephen Hemminger, Jiri Pirko, netdev

On 26/03/2020 20:49, Jakub Kicinski wrote:
> On Thu, 26 Mar 2020 14:38:57 +0200 Nikolay Aleksandrov wrote:
>> On 26/03/2020 14:25, Vladimir Oltean wrote:
>>> On Thu, 26 Mar 2020 at 14:19, Nikolay Aleksandrov
>>> <nikolay@cumulusnetworks.com> wrote:  
>>>>
>>>> On 26/03/2020 14:18, Vladimir Oltean wrote:  
>>>>> On Thu, 26 Mar 2020 at 14:06, Nikolay Aleksandrov
>>>>> <nikolay@cumulusnetworks.com> wrote:  
>>>>>>
>>>>>> On 26/03/2020 13:35, Ido Schimmel wrote:  
>>>>>>> On Thu, Mar 26, 2020 at 12:25:20PM +0200, Vladimir Oltean wrote:  
>>>>>>>> Hi Ido,
>>>>>>>>
>>>>>>>> On Thu, 26 Mar 2020 at 12:17, Ido Schimmel <idosch@idosch.org> wrote:  
>>>>>>>>>  
>>>>> [snip]  
>>>>>>>
>>>>>>> I think you should be more explicit about it. Did you consider listening
>>>>>>> to 'NETDEV_PRECHANGEMTU' notifications in relevant drivers and vetoing
>>>>>>> unsupported configurations with an appropriate extack message? If you
>>>>>>> can't veto (in order not to break user space), you can still emit an
>>>>>>> extack message.
>>>>>>>  
>>>>>>
>>>>>> +1, this sounds more appropriate IMO
>>>>>>  
>>>>>
>>>>> And what does vetoing gain me exactly? The practical inability to
>>>>> change the MTU of any interface that is already bridged and applies
>>>>> length check on RX?
>>>>>  
>>>>
>>>> I was referring to moving the logic to NETDEV_PRECHANGEMTU, the rest is up to you.
>>>>  
>>>
>>> If I'm not going to veto, then I don't see a lot of sense in listening
>>> on this particular notifier either. I can do the normalization just
>>> fine on NETDEV_CHANGEMTU.
>>
>> I should've been more explicit - I meant I agree that this change doesn't belong in
>> the bridge, and handling it in a notifier in the driver seems like a better place.
>> Yes - if it's not going to be a vetto, then CHANGEMTU is fine.
> 
> I'm not sure pushing behavior decisions like that out to the drivers 
> is ever a good idea. Linux should abstract HW differences after all,
> we don't want different drivers to perform different magic behind
> user's back. 
> 
> I'd think if HW is unable to apply given configuration vetoing is both
> correct and expected..
> 

This change implements a policy and makes it default for all HW-offloaded devices, but
not all of them have these restrictions or need it. Moreover MTU handling has always been
a vendor/driver-specific problem.
I do agree about the veto part, in my experience we've had countless problem reports
due to the bridge auto-MTU adjusting, but it seems to me it's the driver authors' right
to implement any policy they want as long as it doesn't affect everyone else.





^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, back to index

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-25 15:21 [PATCH v2 net-next 00/10] Configure the MTU on DSA switches Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 01/10] net: dsa: configure the MTU for switch ports Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 02/10] net: phy: bcm7xx: Add jumbo frame configuration to PHY Vladimir Oltean
2020-03-25 15:44   ` Heiner Kallweit
2020-03-25 22:45     ` Vladimir Oltean
2020-03-25 23:02       ` Florian Fainelli
2020-03-25 23:21       ` Heiner Kallweit
2020-03-25 15:22 ` [PATCH v2 net-next 03/10] bgmac: Add support for Jumbo frames Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 04/10] bgmac: Add MTU configuration support to the driver Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 05/10] bgmac: Add DMA support to handle frames beyond 8192 bytes Vladimir Oltean
2020-03-25 23:07   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 06/10] net: dsa: b53: Add MTU configuration support Vladimir Oltean
2020-03-25 23:21   ` Florian Fainelli
2020-03-26  0:48     ` Vladimir Oltean
2020-03-25 15:22 ` [PATCH v2 net-next 07/10] net: dsa: sja1105: Implement the port MTU callbacks Vladimir Oltean
2020-03-25 23:08   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 08/10] net: dsa: vsc73xx: Make the MTU configurable Vladimir Oltean
2020-03-25 23:09   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 09/10] net: dsa: felix: support changing the MTU Vladimir Oltean
2020-03-25 23:10   ` Florian Fainelli
2020-03-25 15:22 ` [PATCH v2 net-next 10/10] net: bridge: implement auto-normalization of MTU for hardware datapath Vladimir Oltean
2020-03-25 23:17   ` Florian Fainelli
2020-03-26  0:30     ` Vladimir Oltean
2020-03-26 10:17   ` Ido Schimmel
2020-03-26 10:25     ` Vladimir Oltean
2020-03-26 11:35       ` Ido Schimmel
2020-03-26 11:44         ` Vladimir Oltean
2020-03-26 11:54           ` Ido Schimmel
2020-03-26 12:34             ` Vladimir Oltean
2020-03-26 12:59               ` Ido Schimmel
2020-03-26 12:06         ` Nikolay Aleksandrov
2020-03-26 12:18           ` Vladimir Oltean
2020-03-26 12:19             ` Nikolay Aleksandrov
2020-03-26 12:25               ` Vladimir Oltean
2020-03-26 12:38                 ` Nikolay Aleksandrov
2020-03-26 18:49                   ` Jakub Kicinski
2020-03-26 19:41                     ` Nikolay Aleksandrov

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git