netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another
@ 2020-05-27 21:25 Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 01/11] net: geneve: enable vlan offloads Edwin Peer
                   ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

This series began life as a modest attempt to fix two issues pertaining
to VLANs nested inside Geneve tunnels and snowballed from there. The
first issue, addressed by a simple one-liner, is that GSO is not enabled
for upper VLAN devices on top of Geneve. The second issue, addressed by
the balance of the series, deals largely with MTU handling. VLAN devices
above L2 in L3 tunnels inherit the MTU of the underlying device. This
causes IP fragmentation because the inner L2 cannot be expanded within
the same maximum L3 size to accommodate the additional VLAN tag.

As a first attempt, a new flag was introduced to generalize what was
already being done for MACsec devices. This flag was unconditionally
set for all devices that have a size constrained L2, such as is the
case for Geneve and VXLAN tunnel devices. This doesn't quite do the
right thing, however, if the underlying device MTU happens to be
configured to a lower MTU than is supported. Thus, the approach was
further refined to set IFF_NO_VLAN_ROOM when changing MTU, based on
whether the underlying device L2 still has room for VLAN tags, but
stopping short of registering device notifiers to update upper device
MTU whenever a lower device changes. VLAN devices will thus do the
sensible thing if they are applied to an already configured device,
but will not dynamically update whenever the underlying device's MTU
is subsequently changed (this seemed a bridge too far).

Aggregate devices presented the next challenge. Transitively propagating
IFF_NO_VLAN_ROOM via bonds, teams and the like seemed similar in
principle to the handling of IFF_XMIT_DST_RELEASE (only the opposite),
but IFF_XMIT_DST_RELEASE_PERM evaded understanding. Ultimately this flag
failed to justify its existence, allowing the new flag to take its place
and avoid taking up the last bit in the enum.

Finally, an audit of the other net devices in the tree was conducted to
discover where else this new behavior may be appropriate. At this point
it was also discovered that GRE devices would happily allow VLANs to be
added even when L3 is being tunneled in L3, hence restricting VLANs to
ARPHRD_ETHER devices. Between ARPHRD_ETHER and IFF_NO_VLAN_ROOM, it now
seemed only a hop and a skip to eliminate NET_F_VLAN_CHALLENGED too, but
alas there are still a few holdouts that would appear to require more of
a moonshot to address.

Edwin Peer (11):
  net: geneve: enable vlan offloads
  net: do away with the IFF_XMIT_DST_RELEASE_PERM flag
  net: vlan: add IFF_NO_VLAN_ROOM to constrain MTU
  net: geneve: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  net: vxlan: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  net: l2tp_eth: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  net: gre: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  net: distribute IFF_NO_VLAN_ROOM into aggregate devs
  net: ntb_netdev: support VLAN using IFF_NO_VLAN_ROOM
  net: vlan: disallow non-Ethernet devices
  net: leverage IFF_NO_VLAN_ROOM to limit NETIF_F_VLAN_CHALLENGED

 Documentation/networking/netdev-features.rst  |   4 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c     |   3 +-
 drivers/net/bonding/bond_main.c               |  15 ++-
 drivers/net/ethernet/intel/e100.c             |  15 ++-
 .../net/ethernet/mellanox/mlxsw/switchx2.c    |  52 ++++++--
 drivers/net/ethernet/toshiba/ps3_gelic_net.c  |  12 +-
 drivers/net/ethernet/wiznet/w5100.c           |   6 +-
 drivers/net/ethernet/wiznet/w5300.c           |   6 +-
 drivers/net/ethernet/xilinx/ll_temac_main.c   |   1 -
 drivers/net/geneve.c                          |  17 ++-
 drivers/net/ifb.c                             |   4 +
 drivers/net/loopback.c                        |   1 -
 drivers/net/macsec.c                          |   6 +-
 drivers/net/net_failover.c                    |  31 +++--
 drivers/net/ntb_netdev.c                      |   8 +-
 drivers/net/rionet.c                          |   3 +
 drivers/net/sb1000.c                          |   1 +
 drivers/net/team/team.c                       |  16 +--
 drivers/net/vrf.c                             |   4 +-
 drivers/net/vxlan.c                           |  10 +-
 drivers/net/wimax/i2400m/netdev.c             |   5 +-
 drivers/s390/net/qeth_l2_main.c               |  12 +-
 include/linux/if_vlan.h                       |  48 ++++++++
 include/linux/netdevice.h                     |  12 +-
 net/8021q/vlan.c                              |   2 +-
 net/8021q/vlan_dev.c                          |   9 ++
 net/8021q/vlan_netlink.c                      |   2 +
 net/core/dev.c                                |   2 +-
 net/ipv4/ip_tunnel.c                          |   2 +
 net/ipv6/ip6_gre.c                            |   4 +-
 net/l2tp/l2tp_eth.c                           | 114 ++++++++++--------
 net/sched/sch_teql.c                          |   3 +
 32 files changed, 290 insertions(+), 140 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 01/11] net: geneve: enable vlan offloads
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 02/11] net: do away with the IFF_XMIT_DST_RELEASE_PERM flag Edwin Peer
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

Support offloads for nested VLANs in the same fashion as is done for
VXLAN interfaces. In particular, this allows software GSO and checksum
offload to function when VLANs are encapsulated inside Geneve tunnels.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/net/geneve.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 6b461be1820b..8d790cf85b21 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1137,6 +1137,8 @@ static void geneve_setup(struct net_device *dev)
 	dev->features    |= NETIF_F_RXCSUM;
 	dev->features    |= NETIF_F_GSO_SOFTWARE;
 
+	dev->vlan_features = dev->features;
+
 	dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
 	dev->hw_features |= NETIF_F_GSO_SOFTWARE;
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 02/11] net: do away with the IFF_XMIT_DST_RELEASE_PERM flag
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 01/11] net: geneve: enable vlan offloads Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 03/11] net: vlan: add IFF_NO_VLAN_ROOM to constrain MTU Edwin Peer
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

IFF_XMIT_DST_RELEASE_PERM is redundant. It is only ever set by
alloc_netdev_mqs() which also sets IFF_XMIT_DST_RELEASE. And, whenever
IFF_XMIT_DST_RELEASE_PERM is cleared, this only in netif_keep_dst(),
so too IFF_XMIT_DST_RELEASE is cleared. That is, any change to the
IFF_XMIT_DST_RELEASE_PERM flag implies a corresponding update to
IFF_XMIT_DST_RELEASE.

The converse is not universally true, but where this is not the
case, it doesn't matter. While IFF_XMIT_DST_RELEASE can become
cleared independently of IFF_XMIT_DST_RELEASE_PERM, the former is
generally never set anywhere the latter is not. The only exception
to this is the VLAN driver, more on this later.

The IFF_XMIT_DST_RELEASE_PERM flag is used when computing features of
team style aggregate devices. Here the general pattern is:

	flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;

	for each slave_dev {
		flags &= slave_dev->priv_flags;
	}

	master_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
	if (flags == (IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM))
		master_dev->priv_flags |= IFF_XMIT_DST_RELEASE;

The bonding driver differs slightly from this pattern, having an
additional conjunction testing IFF_XMIT_DST_RELEASE_PERM of the master
device. This term can be ignored, however, since it is always true, the
bonding driver never changes the default value of this flag.

Given slave flags, the truth table for IFF_XMIT_DST_RELEASE of the master
device is given by:

CASE | IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM | RESULT
-----+----------------------+---------------------------+-------
1.   |          0           |             0             |   0
2.   |          0           |             1             |   0
3.   |          1           |             0             |   0
4.   |          1           |             1             |   1

With the exception of VLAN devices, case 3 is impossible. Most devices
fall into case 1 or 4, with a device starting out with both flags set
and making use of the netif_keep_dst() interface to clear both flags as
needed. Aggregate devices, if they are nested and have a slave that
keeps dst, are an example of case 2. This class of device begins with
IFF_XMIT_DST_RELEASE cleared and always leaves IFF_XMIT_DST_RELEASE_PERM
set, never touching it. Thus, barring VLAN devices, a slave device needs
IFF_XMIT_DST_RELEASE set, and only this set, in order for the resultant
flag to be set. The same logical argument generalizes to multiple slave
devices as well as arbitrarily nested devices, any single slave or
nested device that has IFF_XMIT_DST_RELEASE cleared will result in the
master device having the flag cleared, but IFF_XMIT_DST_RELEASE_PERM
will always remain set, ensuring case 3 is again impossible.

Before commit 0287587884b15 ("net: better IFF_XMIT_DST_RELEASE support"),
VLAN devices behaved the same way as the aggregate devices, each
starting with IFF_XMIT_RELEASE_DST cleared and later setting it based on
the requirements of the underlying real device. This change introduced
netif_keep_dst() and converted VLAN devices to use the new interface,
with the side effect that IFF_XMIT_RELEASE_DST_PERM is now always
cleared in vlan_setup() and can never become set. This change in
behavior introduces a subtle bug, since now any VLAN device enslaved
to an aggregate device will unconditionally hold on to dst even if the
underlying device no longer needs it. Thus, while VLAN devices are an
example of case 3, they should not be and so IFF_XMIT_DST_RELEASE_PERM
can be safely removed restoring the correct VLAN behavior.

Fixes: 0287587884b15 ("net: better IFF_XMIT_DST_RELEASE support")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/net/bonding/bond_main.c | 12 ++++--------
 drivers/net/net_failover.c      | 16 ++++++++--------
 drivers/net/team/team.c         | 11 ++++-------
 include/linux/netdevice.h       |  6 +-----
 net/core/dev.c                  |  2 +-
 5 files changed, 18 insertions(+), 29 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index a25c65d4af71..4c45f1692934 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1132,8 +1132,6 @@ static netdev_features_t bond_fix_features(struct net_device *dev,
 
 static void bond_compute_features(struct bonding *bond)
 {
-	unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |
-					IFF_XMIT_DST_RELEASE_PERM;
 	netdev_features_t vlan_features = BOND_VLAN_FEATURES;
 	netdev_features_t enc_features  = BOND_ENC_FEATURES;
 	netdev_features_t mpls_features  = BOND_MPLS_FEATURES;
@@ -1148,6 +1146,7 @@ static void bond_compute_features(struct bonding *bond)
 		goto done;
 	vlan_features &= NETIF_F_ALL_FOR_ALL;
 	mpls_features &= NETIF_F_ALL_FOR_ALL;
+	bond_dev->priv_flags |= IFF_XMIT_DST_RELEASE;
 
 	bond_for_each_slave(bond, slave, iter) {
 		vlan_features = netdev_increment_features(vlan_features,
@@ -1161,7 +1160,9 @@ static void bond_compute_features(struct bonding *bond)
 							  slave->dev->mpls_features,
 							  BOND_MPLS_FEATURES);
 
-		dst_release_flag &= slave->dev->priv_flags;
+		bond_dev->priv_flags &= slave->dev->priv_flags |
+					~IFF_XMIT_DST_RELEASE;
+
 		if (slave->dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = slave->dev->hard_header_len;
 
@@ -1180,11 +1181,6 @@ static void bond_compute_features(struct bonding *bond)
 	bond_dev->gso_max_segs = gso_max_segs;
 	netif_set_gso_max_size(bond_dev, gso_max_size);
 
-	bond_dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
-	if ((bond_dev->priv_flags & IFF_XMIT_DST_RELEASE_PERM) &&
-	    dst_release_flag == (IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM))
-		bond_dev->priv_flags |= IFF_XMIT_DST_RELEASE;
-
 	netdev_change_features(bond_dev);
 }
 
diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index b16a1221d19b..436945e0a4c1 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -382,11 +382,11 @@ static void net_failover_compute_features(struct net_device *dev)
 					  NETIF_F_ALL_FOR_ALL;
 	netdev_features_t enc_features  = FAILOVER_ENC_FEATURES;
 	unsigned short max_hard_header_len = ETH_HLEN;
-	unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |
-					IFF_XMIT_DST_RELEASE_PERM;
 	struct net_failover_info *nfo_info = netdev_priv(dev);
 	struct net_device *primary_dev, *standby_dev;
 
+	dev->priv_flags |= IFF_XMIT_DST_RELEASE;
+
 	primary_dev = rcu_dereference(nfo_info->primary_dev);
 	if (primary_dev) {
 		vlan_features =
@@ -398,7 +398,9 @@ static void net_failover_compute_features(struct net_device *dev)
 						  primary_dev->hw_enc_features,
 						  FAILOVER_ENC_FEATURES);
 
-		dst_release_flag &= primary_dev->priv_flags;
+		dev->priv_flags &= primary_dev->priv_flags |
+				   ~IFF_XMIT_DST_RELEASE;
+
 		if (primary_dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = primary_dev->hard_header_len;
 	}
@@ -414,7 +416,9 @@ static void net_failover_compute_features(struct net_device *dev)
 						  standby_dev->hw_enc_features,
 						  FAILOVER_ENC_FEATURES);
 
-		dst_release_flag &= standby_dev->priv_flags;
+		dev->priv_flags &= standby_dev->priv_flags |
+				   ~IFF_XMIT_DST_RELEASE;
+
 		if (standby_dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = standby_dev->hard_header_len;
 	}
@@ -423,10 +427,6 @@ static void net_failover_compute_features(struct net_device *dev)
 	dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
 	dev->hard_header_len = max_hard_header_len;
 
-	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
-	if (dst_release_flag == (IFF_XMIT_DST_RELEASE |
-				 IFF_XMIT_DST_RELEASE_PERM))
-		dev->priv_flags |= IFF_XMIT_DST_RELEASE;
 
 	netdev_change_features(dev);
 }
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 8c1e02752ff6..5987fc2db031 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -988,8 +988,8 @@ static void __team_compute_features(struct team *team)
 					  NETIF_F_ALL_FOR_ALL;
 	netdev_features_t enc_features  = TEAM_ENC_FEATURES;
 	unsigned short max_hard_header_len = ETH_HLEN;
-	unsigned int dst_release_flag = IFF_XMIT_DST_RELEASE |
-					IFF_XMIT_DST_RELEASE_PERM;
+
+	team->dev->priv_flags |= IFF_XMIT_DST_RELEASE;
 
 	list_for_each_entry(port, &team->port_list, list) {
 		vlan_features = netdev_increment_features(vlan_features,
@@ -1000,8 +1000,9 @@ static void __team_compute_features(struct team *team)
 						  port->dev->hw_enc_features,
 						  TEAM_ENC_FEATURES);
 
+		team->dev->priv_flags &= port->dev->priv_flags |
+					 ~IFF_XMIT_DST_RELEASE;
 
-		dst_release_flag &= port->dev->priv_flags;
 		if (port->dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = port->dev->hard_header_len;
 	}
@@ -1012,10 +1013,6 @@ static void __team_compute_features(struct team *team)
 				     NETIF_F_HW_VLAN_STAG_TX |
 				     NETIF_F_GSO_UDP_L4;
 	team->dev->hard_header_len = max_hard_header_len;
-
-	team->dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
-	if (dst_release_flag == (IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM))
-		team->dev->priv_flags |= IFF_XMIT_DST_RELEASE;
 }
 
 static void team_compute_features(struct team *team)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1a96e9c4ec36..fe7705eaad5a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1516,8 +1516,6 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  *	change when it's running
  * @IFF_MACVLAN: Macvlan device
- * @IFF_XMIT_DST_RELEASE_PERM: IFF_XMIT_DST_RELEASE not taking into account
- *	underlying stacked devices
  * @IFF_L3MDEV_MASTER: device is an L3 master device
  * @IFF_NO_QUEUE: device can run without qdisc attached
  * @IFF_OPENVSWITCH: device is a Open vSwitch master
@@ -1551,7 +1549,6 @@ enum netdev_priv_flags {
 	IFF_SUPP_NOFCS			= 1<<14,
 	IFF_LIVE_ADDR_CHANGE		= 1<<15,
 	IFF_MACVLAN			= 1<<16,
-	IFF_XMIT_DST_RELEASE_PERM	= 1<<17,
 	IFF_L3MDEV_MASTER		= 1<<18,
 	IFF_NO_QUEUE			= 1<<19,
 	IFF_OPENVSWITCH			= 1<<20,
@@ -1584,7 +1581,6 @@ enum netdev_priv_flags {
 #define IFF_SUPP_NOFCS			IFF_SUPP_NOFCS
 #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
 #define IFF_MACVLAN			IFF_MACVLAN
-#define IFF_XMIT_DST_RELEASE_PERM	IFF_XMIT_DST_RELEASE_PERM
 #define IFF_L3MDEV_MASTER		IFF_L3MDEV_MASTER
 #define IFF_NO_QUEUE			IFF_NO_QUEUE
 #define IFF_OPENVSWITCH			IFF_OPENVSWITCH
@@ -4853,7 +4849,7 @@ static inline bool netif_is_failover_slave(const struct net_device *dev)
 /* This device needs to keep skb dst for qdisc enqueue or ndo_start_xmit() */
 static inline void netif_keep_dst(struct net_device *dev)
 {
-	dev->priv_flags &= ~(IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM);
+	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
 }
 
 /* return true if dev can't cope with mtu frames that need vlan tag insertion */
diff --git a/net/core/dev.c b/net/core/dev.c
index ae37586f6ee8..fb663e2f60eb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -9944,7 +9944,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 #ifdef CONFIG_NET_SCHED
 	hash_init(dev->qdisc_hash);
 #endif
-	dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
+	dev->priv_flags = IFF_XMIT_DST_RELEASE;
 	setup(dev);
 
 	if (!dev->tx_queue_len) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 03/11] net: vlan: add IFF_NO_VLAN_ROOM to constrain MTU
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 01/11] net: geneve: enable vlan offloads Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 02/11] net: do away with the IFF_XMIT_DST_RELEASE_PERM flag Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 04/11] net: geneve: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM Edwin Peer
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

Normally MTU does not account for an outer VLAN tag, since MTU is an
L3 constraint. Some Ethernet devices, however, have a size constrained
L2 that cannot expand to accommodate a VLAN tag.

The MACsec virtual device is an existing example that limits the MTU of
upper VLAN devices via netif_reduces_vlan_mtu(), but there are other
devices that should do so too. For example, virtual tunnel devices that
provide L2 overlays inside L3 networks where the inner L2 headers
contribute towards the outer L3 size.

Generalize netif_reduces_vlan_mtu() using a new device private flag to
indicate that the lower device does not have sufficient room to
accommodate a VLAN tag and convert MACsec to use the new flag. Also
apply this concept to the VLAN virtual device itself, since physical
devices do not generally allocate L2 room for nested VLANs.

Note that if the lower device is manually configured to a smaller MTU
value than the maximum it supports, then there is sufficient room and
IFF_NO_VLAN_ROOM should not be set.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/net/macsec.c      |  6 ++++--
 include/linux/if_vlan.h   | 40 +++++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h |  6 ++++--
 net/8021q/vlan_dev.c      |  9 +++++++++
 net/8021q/vlan_netlink.c  |  2 ++
 5 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index 20b53e255f68..4d02a953803a 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -3633,11 +3633,13 @@ static int macsec_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct macsec_dev *macsec = macsec_priv(dev);
 	unsigned int extra = macsec->secy.icv_len + macsec_extra_len(true);
+	unsigned int max_mtu = macsec->real_dev->mtu - extra;
 
-	if (macsec->real_dev->mtu - extra < new_mtu)
+	if (new_mtu > max_mtu)
 		return -ERANGE;
 
 	dev->mtu = new_mtu;
+	__vlan_constrain_mtu(dev, max_mtu);
 
 	return 0;
 }
@@ -4018,7 +4020,7 @@ static int macsec_newlink(struct net *net, struct net_device *dev,
 	if (real_dev->type != ARPHRD_ETHER)
 		return -EINVAL;
 
-	dev->priv_flags |= IFF_MACSEC;
+	dev->priv_flags |= IFF_MACSEC | IFF_NO_VLAN_ROOM;
 
 	macsec->real_dev = real_dev;
 
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index b05e855f1ddd..e4a5532fb179 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -749,4 +749,44 @@ static inline unsigned long compare_vlan_header(const struct vlan_hdr *h1,
 		(__force u32)h2->h_vlan_encapsulated_proto);
 #endif
 }
+
+/* max_mtu is not necessarily the same as dev->max_mtu */
+static inline void __vlan_constrain_mtu(struct net_device *dev, int max_mtu)
+{
+	if (dev->mtu > max_mtu - VLAN_HLEN)
+		dev->priv_flags |= IFF_NO_VLAN_ROOM;
+	else
+		dev->priv_flags &= ~IFF_NO_VLAN_ROOM;
+}
+
+/**
+ * vlan_constrain_mtu - reduce MTU for upper VLAN devices if there's no L2 room
+ * @dev: a lower device having a VLAN constrained L2
+ *
+ * Sets IFF_NO_VLAN_ROOM based on the device's current and max MTU.
+ *
+ * Normally MTU does not account for an outer VLAN tag, since MTU is an L3
+ * constraint. Thus, this should only be called by devices that cannot expand
+ * L2 to accommodate one. For example, in order to support VLANs without IP
+ * fragmentation inside various tunnel encapsulations where the inner L2 size
+ * contributes towards the outer L3 size.
+ *
+ * This can also be useful for supporting VLANs, using a reduced MTU, on
+ * hardware which is VLAN challenged.
+ */
+static inline void vlan_constrain_mtu(struct net_device *dev)
+{
+	__vlan_constrain_mtu(dev, dev->max_mtu);
+}
+
+/**
+ * vlan_constrained_change_mtu - ndo_change_mtu for VLAN challenged devices
+ * @dev: the device to update
+ * @new_mtu: the new MTU
+ *
+ * This handler updates MTU and maintains the IFF_NO_VLAN_ROOM flag based on
+ * the newly requested MTU and the maximum supported by the device.
+ */
+int vlan_constrained_change_mtu(struct net_device *dev, int new_mtu);
+
 #endif /* !(_LINUX_IF_VLAN_H_) */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fe7705eaad5a..4d2ccdb9c57e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1516,6 +1516,7 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  *	change when it's running
  * @IFF_MACVLAN: Macvlan device
+ * @IFF_NO_VLAN_ROOM: space constrained L2, upper VLAN devices must reduce MTU
  * @IFF_L3MDEV_MASTER: device is an L3 master device
  * @IFF_NO_QUEUE: device can run without qdisc attached
  * @IFF_OPENVSWITCH: device is a Open vSwitch master
@@ -1549,6 +1550,7 @@ enum netdev_priv_flags {
 	IFF_SUPP_NOFCS			= 1<<14,
 	IFF_LIVE_ADDR_CHANGE		= 1<<15,
 	IFF_MACVLAN			= 1<<16,
+	IFF_NO_VLAN_ROOM		= 1<<17,
 	IFF_L3MDEV_MASTER		= 1<<18,
 	IFF_NO_QUEUE			= 1<<19,
 	IFF_OPENVSWITCH			= 1<<20,
@@ -1581,6 +1583,7 @@ enum netdev_priv_flags {
 #define IFF_SUPP_NOFCS			IFF_SUPP_NOFCS
 #define IFF_LIVE_ADDR_CHANGE		IFF_LIVE_ADDR_CHANGE
 #define IFF_MACVLAN			IFF_MACVLAN
+#define IFF_NO_VLAN_ROOM		IFF_NO_VLAN_ROOM
 #define IFF_L3MDEV_MASTER		IFF_L3MDEV_MASTER
 #define IFF_NO_QUEUE			IFF_NO_QUEUE
 #define IFF_OPENVSWITCH			IFF_OPENVSWITCH
@@ -4855,8 +4858,7 @@ static inline void netif_keep_dst(struct net_device *dev)
 /* return true if dev can't cope with mtu frames that need vlan tag insertion */
 static inline bool netif_reduces_vlan_mtu(struct net_device *dev)
 {
-	/* TODO: reserve and use an additional IFF bit, if we get more users */
-	return dev->priv_flags & IFF_MACSEC;
+	return dev->priv_flags & IFF_NO_VLAN_ROOM;
 }
 
 extern struct pernet_operations __net_initdata loopback_net_ops;
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index f00bb57f0f60..67354b4ebcdb 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -149,10 +149,19 @@ static int vlan_dev_change_mtu(struct net_device *dev, int new_mtu)
 		return -ERANGE;
 
 	dev->mtu = new_mtu;
+	__vlan_constrain_mtu(dev, max_mtu);
 
 	return 0;
 }
 
+int vlan_constrained_change_mtu(struct net_device *dev, int new_mtu)
+{
+	dev->mtu = new_mtu;
+	vlan_constrain_mtu(dev);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vlan_constrained_change_mtu);
+
 void vlan_dev_set_ingress_priority(const struct net_device *dev,
 				   u32 skb_prio, u16 vlan_prio)
 {
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
index 0db85aeb119b..c7aea2488f46 100644
--- a/net/8021q/vlan_netlink.c
+++ b/net/8021q/vlan_netlink.c
@@ -181,6 +181,8 @@ static int vlan_newlink(struct net *src_net, struct net_device *dev,
 		dev->mtu = max_mtu;
 	else if (dev->mtu > max_mtu)
 		return -EINVAL;
+	else if (dev->mtu > max_mtu - VLAN_HLEN)
+		dev->priv_flags |= IFF_NO_VLAN_ROOM;
 
 	err = vlan_changelink(dev, tb, data, extack);
 	if (!err)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 04/11] net: geneve: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (2 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 03/11] net: vlan: add IFF_NO_VLAN_ROOM to constrain MTU Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 05/11] net: vxlan: " Edwin Peer
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

Constrain the MTU of upper VLAN devices if the MTU of the Geneve device
is configured to its default optimal size, which does not leave space
for a nested VLAN tag without causing fragmentation. If the underlying
best MTU is not known, then the worst case is assumed and any upper VLAN
devices will always be adjusted to accommodate the VLAN tag.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/net/geneve.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index 8d790cf85b21..9c8e6f242f77 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -67,6 +67,7 @@ struct geneve_dev {
 	bool		   use_udp6_rx_checksums;
 	bool		   ttl_inherit;
 	enum ifla_geneve_df df;
+	int		   best_mtu;
 };
 
 struct geneve_sock {
@@ -1020,12 +1021,15 @@ static netdev_tx_t geneve_xmit(struct sk_buff *skb, struct net_device *dev)
 
 static int geneve_change_mtu(struct net_device *dev, int new_mtu)
 {
+	struct geneve_dev *geneve = netdev_priv(dev);
+
 	if (new_mtu > dev->max_mtu)
 		new_mtu = dev->max_mtu;
 	else if (new_mtu < dev->min_mtu)
 		new_mtu = dev->min_mtu;
 
 	dev->mtu = new_mtu;
+	__vlan_constrain_mtu(dev, geneve->best_mtu);
 	return 0;
 }
 
@@ -1497,7 +1501,6 @@ static void geneve_link_config(struct net_device *dev,
 			       struct ip_tunnel_info *info, struct nlattr *tb[])
 {
 	struct geneve_dev *geneve = netdev_priv(dev);
-	int ldev_mtu = 0;
 
 	if (tb[IFLA_MTU]) {
 		geneve_change_mtu(dev, nla_get_u32(tb[IFLA_MTU]));
@@ -1510,7 +1513,7 @@ static void geneve_link_config(struct net_device *dev,
 		struct rtable *rt = ip_route_output_key(geneve->net, &fl4);
 
 		if (!IS_ERR(rt) && rt->dst.dev) {
-			ldev_mtu = rt->dst.dev->mtu - GENEVE_IPV4_HLEN;
+			geneve->best_mtu = rt->dst.dev->mtu - GENEVE_IPV4_HLEN;
 			ip_rt_put(rt);
 		}
 		break;
@@ -1526,17 +1529,19 @@ static void geneve_link_config(struct net_device *dev,
 				NULL, 0);
 
 		if (rt && rt->dst.dev)
-			ldev_mtu = rt->dst.dev->mtu - GENEVE_IPV6_HLEN;
+			geneve->best_mtu = rt->dst.dev->mtu - GENEVE_IPV6_HLEN;
 		ip6_rt_put(rt);
 		break;
 	}
 #endif
 	}
 
-	if (ldev_mtu <= 0)
+	geneve->best_mtu -= info->options_len;
+
+	if (geneve->best_mtu <= 0)
 		return;
 
-	geneve_change_mtu(dev, ldev_mtu - info->options_len);
+	geneve_change_mtu(dev, geneve->best_mtu);
 }
 
 static int geneve_newlink(struct net *net, struct net_device *dev,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 05/11] net: vxlan: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (3 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 04/11] net: geneve: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 06/11] net: l2tp_eth: " Edwin Peer
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

Constrain the MTU of upper VLAN devices if the MTU of the VXLAN device
is configured to its default optimal size, which does not leave space
for a nested VLAN tag without causing fragmentation. If the underlying
lower device is not known, then the worst case is assumed and any upper
VLAN devices will always be adjusted to accommodate the VLAN tag.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/net/vxlan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a0015cdedfaf..3e9c65eb4737 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -3098,18 +3098,20 @@ static int vxlan_change_mtu(struct net_device *dev, int new_mtu)
 	struct net_device *lowerdev = __dev_get_by_index(vxlan->net,
 							 dst->remote_ifindex);
 	bool use_ipv6 = !!(vxlan->cfg.flags & VXLAN_F_IPV6);
+	unsigned int max_mtu = 0;
 
 	/* This check is different than dev->max_mtu, because it looks at
 	 * the lowerdev->mtu, rather than the static dev->max_mtu
 	 */
 	if (lowerdev) {
-		int max_mtu = lowerdev->mtu -
-			      (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
+		max_mtu = lowerdev->mtu -
+			  (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
 		if (new_mtu > max_mtu)
 			return -EINVAL;
 	}
 
 	dev->mtu = new_mtu;
+	__vlan_constrain_mtu(dev, max_mtu);
 	return 0;
 }
 
@@ -3241,7 +3243,7 @@ static void vxlan_setup(struct net_device *dev)
 	dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
 	dev->hw_features |= NETIF_F_GSO_SOFTWARE;
 	netif_keep_dst(dev);
-	dev->priv_flags |= IFF_NO_QUEUE;
+	dev->priv_flags |= IFF_NO_QUEUE | IFF_NO_VLAN_ROOM;
 
 	/* MTU range: 68 - 65535 */
 	dev->min_mtu = ETH_MIN_MTU;
@@ -3762,6 +3764,8 @@ static void vxlan_config_apply(struct net_device *dev,
 	if (dev->mtu > max_mtu)
 		dev->mtu = max_mtu;
 
+	__vlan_constrain_mtu(dev, max_mtu);
+
 	if (use_ipv6 || conf->flags & VXLAN_F_COLLECT_METADATA)
 		needed_headroom += VXLAN6_HEADROOM;
 	else
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 06/11] net: l2tp_eth: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (4 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 05/11] net: vxlan: " Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 07/11] net: gre: " Edwin Peer
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

Constrain the MTU of upper VLAN devices if the MTU of the L2TP Ethernet
device is configured to its default optimal size, which does not leave
space for a nested VLAN tag without causing fragmentation.

Refactor l2tp_eth_adjust_mtu() so that it can also be used to determine
the optimal size when the L2TP device's MTU is changed. This function
needed to move before the net_device_ops definition in order to avoid a
forward declaration, but instead the definition of net_device_ops is
moved so that the refactoring changes are better represented in the
diff.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 net/l2tp/l2tp_eth.c | 114 ++++++++++++++++++++++++--------------------
 1 file changed, 63 insertions(+), 51 deletions(-)

diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index fd5ac2788e45..6fbb900bc3d1 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -103,28 +103,6 @@ static void l2tp_eth_get_stats64(struct net_device *dev,
 
 }
 
-static const struct net_device_ops l2tp_eth_netdev_ops = {
-	.ndo_init		= l2tp_eth_dev_init,
-	.ndo_uninit		= l2tp_eth_dev_uninit,
-	.ndo_start_xmit		= l2tp_eth_dev_xmit,
-	.ndo_get_stats64	= l2tp_eth_get_stats64,
-	.ndo_set_mac_address	= eth_mac_addr,
-};
-
-static struct device_type l2tpeth_type = {
-	.name = "l2tpeth",
-};
-
-static void l2tp_eth_dev_setup(struct net_device *dev)
-{
-	SET_NETDEV_DEVTYPE(dev, &l2tpeth_type);
-	ether_setup(dev);
-	dev->priv_flags		&= ~IFF_TX_SKB_SHARING;
-	dev->features		|= NETIF_F_LLTX;
-	dev->netdev_ops		= &l2tp_eth_netdev_ops;
-	dev->needs_free_netdev	= true;
-}
-
 static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb, int data_len)
 {
 	struct l2tp_eth_sess *spriv = l2tp_session_priv(session);
@@ -215,44 +193,73 @@ static void l2tp_eth_show(struct seq_file *m, void *arg)
 	dev_put(dev);
 }
 
-static void l2tp_eth_adjust_mtu(struct l2tp_tunnel *tunnel,
-				struct l2tp_session *session,
-				struct net_device *dev)
+static unsigned int l2tp_eth_best_mtu(struct net_device *dev)
 {
-	unsigned int overhead = 0;
-	u32 l3_overhead = 0;
-	u32 mtu;
+	struct l2tp_eth *priv = netdev_priv(dev);
+	struct l2tp_session *session = priv->session;
+	struct l2tp_tunnel *tunnel = session->tunnel;
+	unsigned int mtu, overhead = 0;
 
-	/* if the encap is UDP, account for UDP header size */
-	if (tunnel->encap == L2TP_ENCAPTYPE_UDP) {
-		overhead += sizeof(struct udphdr);
-		dev->needed_headroom += sizeof(struct udphdr);
+	if (tunnel->sock) {
+		lock_sock(tunnel->sock);
+		overhead = kernel_sock_ip_overhead(tunnel->sock);
+		release_sock(tunnel->sock);
 	}
 
-	lock_sock(tunnel->sock);
-	l3_overhead = kernel_sock_ip_overhead(tunnel->sock);
-	release_sock(tunnel->sock);
-
-	if (l3_overhead == 0) {
+	if (overhead == 0) {
 		/* L3 Overhead couldn't be identified, this could be
 		 * because tunnel->sock was NULL or the socket's
 		 * address family was not IPv4 or IPv6,
-		 * dev mtu stays at 1500.
+		 * assume existing MTU is best
 		 */
-		return;
+		return dev->mtu;
 	}
-	/* Adjust MTU, factor overhead - underlay L3, overlay L2 hdr
-	 * UDP overhead, if any, was already factored in above.
-	 */
-	overhead += session->hdr_len + ETH_HLEN + l3_overhead;
 
+	/* if the encap is UDP, account for UDP header size */
+	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+		overhead += sizeof(struct udphdr);
+
+	/* Maximize MTU, factor in overhead - overlay L2 and Geneve header.
+	 * UDP overhead, if any, and underlay L3 already factored in above.
+	 */
+	overhead += session->hdr_len + ETH_HLEN;
 	mtu = l2tp_tunnel_dst_mtu(tunnel) - overhead;
 	if (mtu < dev->min_mtu || mtu > dev->max_mtu)
-		dev->mtu = ETH_DATA_LEN - overhead;
-	else
-		dev->mtu = mtu;
+		mtu = ETH_DATA_LEN - overhead;
 
-	dev->needed_headroom += session->hdr_len;
+	return mtu;
+}
+
+static int l2tp_eth_change_mtu(struct net_device *dev, int new_mtu)
+{
+	unsigned int best_mtu = l2tp_eth_best_mtu(dev);
+
+	dev->mtu = new_mtu;
+	__vlan_constrain_mtu(dev, best_mtu);
+	return 0;
+}
+
+static const struct net_device_ops l2tp_eth_netdev_ops = {
+	.ndo_init		= l2tp_eth_dev_init,
+	.ndo_uninit		= l2tp_eth_dev_uninit,
+	.ndo_start_xmit		= l2tp_eth_dev_xmit,
+	.ndo_get_stats64	= l2tp_eth_get_stats64,
+	.ndo_set_mac_address	= eth_mac_addr,
+	.ndo_change_mtu		= l2tp_eth_change_mtu,
+};
+
+static struct device_type l2tpeth_type = {
+	.name = "l2tpeth",
+};
+
+static void l2tp_eth_dev_setup(struct net_device *dev)
+{
+	SET_NETDEV_DEVTYPE(dev, &l2tpeth_type);
+	ether_setup(dev);
+	dev->priv_flags		&= ~IFF_TX_SKB_SHARING;
+	dev->features		|= NETIF_F_LLTX;
+	dev->netdev_ops		= &l2tp_eth_netdev_ops;
+	dev->needs_free_netdev	= true;
 }
 
 static int l2tp_eth_create(struct net *net, struct l2tp_tunnel *tunnel,
@@ -289,14 +296,19 @@ static int l2tp_eth_create(struct net *net, struct l2tp_tunnel *tunnel,
 		goto err_sess;
 	}
 
-	dev_net_set(dev, net);
-	dev->min_mtu = 0;
-	dev->max_mtu = ETH_MAX_MTU;
-	l2tp_eth_adjust_mtu(tunnel, session, dev);
+	if (tunnel->encap == L2TP_ENCAPTYPE_UDP)
+		dev->needed_headroom += sizeof(struct udphdr);
+	dev->needed_headroom += session->hdr_len;
 
 	priv = netdev_priv(dev);
 	priv->session = session;
 
+	dev_net_set(dev, net);
+	dev->min_mtu = 0;
+	dev->max_mtu = ETH_MAX_MTU;
+	dev->mtu = l2tp_eth_best_mtu(dev);
+	dev->priv_flags |= IFF_NO_VLAN_ROOM;
+
 	session->recv_skb = l2tp_eth_dev_recv;
 	session->session_close = l2tp_eth_delete;
 	if (IS_ENABLED(CONFIG_L2TP_DEBUGFS))
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 07/11] net: gre: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (5 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 06/11] net: l2tp_eth: " Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 08/11] net: distribute IFF_NO_VLAN_ROOM into aggregate devs Edwin Peer
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

Constrain the MTU of upper VLAN devices if the MTU of the GRE device
is configured to its default optimal size, which does not leave space
for a nested VLAN tag without causing fragmentation. If the underlying
lower device is not known, then the worst case is assumed and any upper
VLAN devices will always be adjusted to accommodate the VLAN tag.

For IPv4 tunnels, the changes to support this are made in the generic
ip_tunnel_change_mtu() handler and so IFF_NO_VLAN_ROOM is consequently
maintained for all tunnel devices that leverage this implementation. GRE
is, however, the only one of these implementations that might use an L2
overlay. At present nothing prevents VLAN devices being layered above
raw IP tunnel devices, which does not make sense. This limitation will
be addressed by a later patch in this series.

IPv6 GRE is dependent on PMTU discovery, but the MTU of nested VLANs
still need to be constrained, because non-VLAN packets will share the
same path MTU.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 net/ipv4/ip_tunnel.c | 2 ++
 net/ipv6/ip6_gre.c   | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index f4f1d11eab50..21803bd35ab3 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -981,6 +981,7 @@ int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict)
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	int t_hlen = tunnel->hlen + sizeof(struct iphdr);
 	int max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
+	int best_mtu = ip_tunnel_bind_dev(dev);
 
 	if (new_mtu < ETH_MIN_MTU)
 		return -EINVAL;
@@ -993,6 +994,7 @@ int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict)
 	}
 
 	dev->mtu = new_mtu;
+	__vlan_constrain_mtu(dev, best_mtu);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(__ip_tunnel_change_mtu);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 781ca8c07a0d..0b86ee7f3d31 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1131,6 +1131,8 @@ static void ip6gre_tnl_link_config_route(struct ip6_tnl *t, int set_mtu,
 
 				if (dev->mtu < IPV6_MIN_MTU)
 					dev->mtu = IPV6_MIN_MTU;
+
+				dev->priv_flags |= IFF_NO_VLAN_ROOM;
 			}
 		}
 		ip6_rt_put(rt);
@@ -1801,7 +1803,7 @@ static int ip6gre_tap_init(struct net_device *dev)
 	if (ret)
 		return ret;
 
-	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_VLAN_ROOM;
 
 	return 0;
 }
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 08/11] net: distribute IFF_NO_VLAN_ROOM into aggregate devs
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (6 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 07/11] net: gre: " Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 09/11] net: ntb_netdev: support VLAN using IFF_NO_VLAN_ROOM Edwin Peer
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

VLANs layered above aggregate devices such as bonds and teams should
inherit the MTU limitations of the devices underlying the aggregate.
If any one of the slave devices is constrained by IFF_NO_VLAN_ROOM,
then so must the aggregate as a whole.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/net/bonding/bond_main.c |  3 +++
 drivers/net/net_failover.c      | 12 ++++++++++++
 drivers/net/team/team.c         |  5 +++++
 3 files changed, 20 insertions(+)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 4c45f1692934..3002ba879d14 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1147,6 +1147,7 @@ static void bond_compute_features(struct bonding *bond)
 	vlan_features &= NETIF_F_ALL_FOR_ALL;
 	mpls_features &= NETIF_F_ALL_FOR_ALL;
 	bond_dev->priv_flags |= IFF_XMIT_DST_RELEASE;
+	bond_dev->priv_flags &= ~IFF_NO_VLAN_ROOM;
 
 	bond_for_each_slave(bond, slave, iter) {
 		vlan_features = netdev_increment_features(vlan_features,
@@ -1162,6 +1163,8 @@ static void bond_compute_features(struct bonding *bond)
 
 		bond_dev->priv_flags &= slave->dev->priv_flags |
 					~IFF_XMIT_DST_RELEASE;
+		bond_dev->priv_flags |= slave->dev->priv_flags &
+					IFF_NO_VLAN_ROOM;
 
 		if (slave->dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = slave->dev->hard_header_len;
diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index 436945e0a4c1..a085d292b4cf 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -211,6 +211,16 @@ static void net_failover_get_stats(struct net_device *dev,
 	spin_unlock(&nfo_info->stats_lock);
 }
 
+static void net_failover_constrain_vlan(struct net_device *dev,
+					struct net_device *primary,
+					struct net_device *standby)
+{
+	dev->priv_flags &= ~IFF_NO_VLAN_ROOM;
+	dev->priv_flags |= IFF_NO_VLAN_ROOM &
+			   ((primary ? primary->priv_flags : 0) |
+			    (standby ? standby->priv_flags : 0));
+}
+
 static int net_failover_change_mtu(struct net_device *dev, int new_mtu)
 {
 	struct net_failover_info *nfo_info = netdev_priv(dev);
@@ -235,6 +245,7 @@ static int net_failover_change_mtu(struct net_device *dev, int new_mtu)
 	}
 
 	dev->mtu = new_mtu;
+	net_failover_constrain_vlan(dev, primary_dev, standby_dev);
 
 	return 0;
 }
@@ -427,6 +438,7 @@ static void net_failover_compute_features(struct net_device *dev)
 	dev->hw_enc_features = enc_features | NETIF_F_GSO_ENCAP_ALL;
 	dev->hard_header_len = max_hard_header_len;
 
+	net_failover_constrain_vlan(dev, primary_dev, standby_dev);
 
 	netdev_change_features(dev);
 }
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 5987fc2db031..3e18d1de036d 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -990,6 +990,7 @@ static void __team_compute_features(struct team *team)
 	unsigned short max_hard_header_len = ETH_HLEN;
 
 	team->dev->priv_flags |= IFF_XMIT_DST_RELEASE;
+	team->dev->priv_flags &= ~IFF_NO_VLAN_ROOM;
 
 	list_for_each_entry(port, &team->port_list, list) {
 		vlan_features = netdev_increment_features(vlan_features,
@@ -1002,6 +1003,8 @@ static void __team_compute_features(struct team *team)
 
 		team->dev->priv_flags &= port->dev->priv_flags |
 					 ~IFF_XMIT_DST_RELEASE;
+		team->dev->priv_flags |= port->dev->priv_flags &
+					 IFF_NO_VLAN_ROOM;
 
 		if (port->dev->hard_header_len > max_hard_header_len)
 			max_hard_header_len = port->dev->hard_header_len;
@@ -1808,6 +1811,7 @@ static int team_change_mtu(struct net_device *dev, int new_mtu)
 	 */
 	mutex_lock(&team->lock);
 	team->port_mtu_change_allowed = true;
+	dev->priv_flags &= ~IFF_NO_VLAN_ROOM;
 	list_for_each_entry(port, &team->port_list, list) {
 		err = dev_set_mtu(port->dev, new_mtu);
 		if (err) {
@@ -1815,6 +1819,7 @@ static int team_change_mtu(struct net_device *dev, int new_mtu)
 				   port->dev->name);
 			goto unwind;
 		}
+		dev->priv_flags |= port->dev->priv_flags & IFF_NO_VLAN_ROOM;
 	}
 	team->port_mtu_change_allowed = false;
 	mutex_unlock(&team->lock);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 09/11] net: ntb_netdev: support VLAN using IFF_NO_VLAN_ROOM
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (7 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 08/11] net: distribute IFF_NO_VLAN_ROOM into aggregate devs Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 10/11] net: vlan: disallow non-Ethernet devices Edwin Peer
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

It's not clear how useful VLANs layered over Ethernet emulated over NTB
transport are, but they are presently allowed. The L2 emulation exposed
by NTB does not account for possible VLAN tags in the space allocated
for Ethernet headers when configured for maximal MTU.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/net/ntb_netdev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ntb_netdev.c b/drivers/net/ntb_netdev.c
index a5bab614ff84..839414afa175 100644
--- a/drivers/net/ntb_netdev.c
+++ b/drivers/net/ntb_netdev.c
@@ -52,6 +52,7 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 #include <linux/ntb.h>
+#include <linux/if_vlan.h>
 #include <linux/ntb_transport.h>
 
 #define NTB_NETDEV_VER	"0.7"
@@ -299,14 +300,16 @@ static int ntb_netdev_close(struct net_device *ndev)
 static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
 {
 	struct ntb_netdev *dev = netdev_priv(ndev);
+	unsigned int max_mtu = ntb_transport_max_size(dev->qp) - ETH_HLEN;
 	struct sk_buff *skb;
 	int len, rc;
 
-	if (new_mtu > ntb_transport_max_size(dev->qp) - ETH_HLEN)
+	if (new_mtu > max_mtu)
 		return -EINVAL;
 
 	if (!netif_running(ndev)) {
 		ndev->mtu = new_mtu;
+		__vlan_constrain_mtu(ndev, max_mtu);
 		return 0;
 	}
 
@@ -336,6 +339,7 @@ static int ntb_netdev_change_mtu(struct net_device *ndev, int new_mtu)
 	}
 
 	ndev->mtu = new_mtu;
+	__vlan_constrain_mtu(ndev, max_mtu);
 
 	ntb_transport_link_up(dev->qp);
 
@@ -422,7 +426,7 @@ static int ntb_netdev_probe(struct device *client_dev)
 	dev->pdev = pdev;
 	ndev->features = NETIF_F_HIGHDMA;
 
-	ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	ndev->priv_flags |= IFF_LIVE_ADDR_CHANGE | IFF_NO_VLAN_ROOM;
 
 	ndev->hw_features = ndev->features;
 	ndev->watchdog_timeo = msecs_to_jiffies(NTB_TX_TIMEOUT_MS);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 10/11] net: vlan: disallow non-Ethernet devices
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (8 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 09/11] net: ntb_netdev: support VLAN using IFF_NO_VLAN_ROOM Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-27 21:25 ` [RFC PATCH net-next 11/11] net: leverage IFF_NO_VLAN_ROOM to limit NETIF_F_VLAN_CHALLENGED Edwin Peer
  2020-05-28  0:15 ` [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Michał Mirosław
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

Layering VLANs on top of non-Ethernet devices is not defined. This
prevents nonsensical combinations like VLANs on top of raw IP in
IP tunnels, for example.

Checking the device type means that some devices no longer need to
explicitly set NETIF_F_VLAN_CHALLENGED. Remove this flags from those
devices where this is trivially the case.

The failover device does not explicitly check device types when adding
slaves (team and bonding drivers do), so include the device type check
when VLANs are in use too.

TEQL scheduling devices have historically allowed VLANs and this change
would break that. Inherit the slave device's type instead of ARPHRD_VOID
and enforce that all slaves have the same type.

IFB devices do not have a normal xmit function and so VLAN devices
cannot forward frames (packets must be redirected). IFB devices are
also IFF_NOARP, so we can set the device type accordingly to disable
support for VLANs. Similarly the SB1000 is RX only.

VLANs don't make sense for VRF devices because they are raw IP. Relying
on NETIF_F_VLAN_CHALLENGED is unnecessary if the device type is set
appropriately.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c   | 3 +--
 drivers/net/ethernet/xilinx/ll_temac_main.c | 1 -
 drivers/net/ifb.c                           | 4 ++++
 drivers/net/loopback.c                      | 1 -
 drivers/net/net_failover.c                  | 3 +--
 drivers/net/sb1000.c                        | 1 +
 drivers/net/vrf.c                           | 4 +---
 include/linux/if_vlan.h                     | 8 ++++++++
 net/8021q/vlan.c                            | 2 +-
 net/sched/sch_teql.c                        | 3 +++
 10 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 81b8227214f1..a4404957b39b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -2086,8 +2086,7 @@ void ipoib_setup_common(struct net_device *dev)
 	dev->addr_len		 = INFINIBAND_ALEN;
 	dev->type		 = ARPHRD_INFINIBAND;
 	dev->tx_queue_len	 = ipoib_sendq_size * 2;
-	dev->features		 = (NETIF_F_VLAN_CHALLENGED	|
-				    NETIF_F_HIGHDMA);
+	dev->features		 = NETIF_F_HIGHDMA;
 	netif_keep_dst(dev);
 
 	memcpy(dev->broadcast, ipv4_bcast_addr, INFINIBAND_ALEN);
diff --git a/drivers/net/ethernet/xilinx/ll_temac_main.c b/drivers/net/ethernet/xilinx/ll_temac_main.c
index 929244064abd..731326d1e99f 100644
--- a/drivers/net/ethernet/xilinx/ll_temac_main.c
+++ b/drivers/net/ethernet/xilinx/ll_temac_main.c
@@ -1375,7 +1375,6 @@ static int temac_probe(struct platform_device *pdev)
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_TX; /* Transmit VLAN hw accel */
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_RX; /* Receive VLAN hw acceleration */
 	ndev->features |= NETIF_F_HW_VLAN_CTAG_FILTER; /* Receive VLAN filtering */
-	ndev->features |= NETIF_F_VLAN_CHALLENGED; /* cannot handle VLAN pkts */
 	ndev->features |= NETIF_F_GSO; /* Enable software GSO. */
 	ndev->features |= NETIF_F_MULTI_QUEUE; /* Has multiple TX/RX queues */
 	ndev->features |= NETIF_F_LRO; /* large receive offload */
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 7fe306e76281..3ac6b4282113 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -213,6 +213,10 @@ static void ifb_setup(struct net_device *dev)
 
 	/* Fill in device structure with ethernet-generic values. */
 	ether_setup(dev);
+
+	/* Override ARPHRD_ETHER to disallow upper VLAN devices (no xmit). */
+	dev->type = ARPHRD_NONE;
+
 	dev->tx_queue_len = TX_Q_LIMIT;
 
 	dev->features |= IFB_FEATURES;
diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index a1c77cc00416..59c7aa726245 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -183,7 +183,6 @@ static void gen_lo_setup(struct net_device *dev,
 		| NETIF_F_HIGHDMA
 		| NETIF_F_LLTX
 		| NETIF_F_NETNS_LOCAL
-		| NETIF_F_VLAN_CHALLENGED
 		| NETIF_F_LOOPBACK;
 	dev->ethtool_ops	= eth_ops;
 	dev->header_ops		= hdr_ops;
diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index a085d292b4cf..78ef32642c85 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -496,8 +496,7 @@ static int net_failover_slave_pre_register(struct net_device *slave_dev,
 				  !dev_is_pci(slave_dev->dev.parent)))
 		return -EINVAL;
 
-	if (failover_dev->features & NETIF_F_VLAN_CHALLENGED &&
-	    vlan_uses_dev(failover_dev)) {
+	if (vlan_challenged(failover_dev) && vlan_uses_dev(failover_dev)) {
 		netdev_err(failover_dev, "Device %s is VLAN challenged and failover device has VLAN set up\n",
 			   failover_dev->name);
 		return -EINVAL;
diff --git a/drivers/net/sb1000.c b/drivers/net/sb1000.c
index e88af978f63c..0cb3fb563160 100644
--- a/drivers/net/sb1000.c
+++ b/drivers/net/sb1000.c
@@ -192,6 +192,7 @@ sb1000_probe_one(struct pnp_dev *pdev, const struct pnp_device_id *id)
 	 * The SB1000 is an rx-only cable modem device.  The uplink is a modem
 	 * and we do not want to arp on it.
 	 */
+	dev->type = ARPHRD_NONE;
 	dev->flags = IFF_POINTOPOINT|IFF_NOARP;
 
 	SET_NETDEV_DEV(dev, &pdev->dev);
diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 43928a1c2f2a..f490f3cfa358 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -860,6 +860,7 @@ static int vrf_dev_init(struct net_device *dev)
 	if (vrf_rt6_create(dev) != 0)
 		goto out_rth;
 
+	dev->type = ARPHRD_RAWIP;
 	dev->flags = IFF_MASTER | IFF_NOARP;
 
 	/* MTU is irrelevant for VRF device; set to 64k similar to lo */
@@ -1271,9 +1272,6 @@ static void vrf_setup(struct net_device *dev)
 	/* don't allow vrf devices to change network namespaces. */
 	dev->features |= NETIF_F_NETNS_LOCAL;
 
-	/* does not make sense for a VLAN to be added to a vrf device */
-	dev->features   |= NETIF_F_VLAN_CHALLENGED;
-
 	/* enable offload features */
 	dev->features   |= NETIF_F_GSO_SOFTWARE;
 	dev->features   |= NETIF_F_RXCSUM | NETIF_F_HW_CSUM | NETIF_F_SCTP_CRC;
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index e4a5532fb179..7c2781ec7013 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -12,6 +12,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/bug.h>
 #include <uapi/linux/if_vlan.h>
+#include <uapi/linux/if_arp.h>
 
 #define VLAN_HLEN	4		/* The additional bytes required by VLAN
 					 * (in addition to the Ethernet header)
@@ -789,4 +790,11 @@ static inline void vlan_constrain_mtu(struct net_device *dev)
  */
 int vlan_constrained_change_mtu(struct net_device *dev, int new_mtu);
 
+/* in addition to NETIF_F_VLAN_CHALLENGED, non-Ethernet devices can't VLAN */
+static inline bool vlan_challenged(struct net_device *dev)
+{
+	return dev->type != ARPHRD_ETHER ||
+	       dev->features & NETIF_F_VLAN_CHALLENGED;
+}
+
 #endif /* !(_LINUX_IF_VLAN_H_) */
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index d4bcfd8f95bf..eb162b020ec9 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -127,7 +127,7 @@ int vlan_check_real_dev(struct net_device *real_dev,
 {
 	const char *name = real_dev->name;
 
-	if (real_dev->features & NETIF_F_VLAN_CHALLENGED) {
+	if (vlan_challenged(real_dev)) {
 		pr_info("VLANs not supported on %s\n", name);
 		NL_SET_ERR_MSG_MOD(extack, "VLANs not supported on device");
 		return -EOPNOTSUPP;
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index 689ef6f3ded8..7ef08bd16b24 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -181,6 +181,8 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
 	skb_queue_head_init(&q->q);
 
 	if (m->slaves) {
+		if (m->dev->type != dev->type)
+			return -EINVAL;
 		if (m->dev->flags & IFF_UP) {
 			if ((m->dev->flags & IFF_POINTOPOINT &&
 			     !(dev->flags & IFF_POINTOPOINT)) ||
@@ -205,6 +207,7 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
 	} else {
 		q->next = sch;
 		m->slaves = sch;
+		m->dev->type = dev->type;
 		m->dev->mtu = dev->mtu;
 		m->dev->flags = (m->dev->flags&~FMASK)|(dev->flags&FMASK);
 	}
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [RFC PATCH net-next 11/11] net: leverage IFF_NO_VLAN_ROOM to limit NETIF_F_VLAN_CHALLENGED
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (9 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 10/11] net: vlan: disallow non-Ethernet devices Edwin Peer
@ 2020-05-27 21:25 ` Edwin Peer
  2020-05-28  0:15 ` [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Michał Mirosław
  11 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-27 21:25 UTC (permalink / raw)
  To: netdev
  Cc: Edwin Peer, edumazet, linville, shemminger, mirq-linux,
	jesse.brandeburg, jchapman, david, j.vosburgh, vfalico, andy,
	sridhar.samudrala, jiri, geoff, mokuno, msink, mporter,
	inaky.perez-gonzalez, jwi, kgraul, ubraun, grant.likely, hadi,
	dsahern, shrijeet, jon.mason, dave.jiang, saeedm, hadarh,
	ogerlitz, allenbh, michael.chan

This addresses an old FIXME in netdev-features.txt. The new
IFF_NO_VLAN_ROOM infrastructure makes supporting VLANs on most of these
limited devices trivial. VLANs are now supported on these devices if
either:

a) the underlying device's MTU is configured VLAN_HLEN smaller than the
maximum supported MTU, or

b) by the upper VLAN automatically configuring a reduced MTU if the
configured device MTU is too large.

The Mellanox switch driver required a little more refactoring than most
to access the maximum supported MTU from ndo_change_mtu.

The Intel WiMAX 2400M is natively a raw IP device, but the driver
emulates Ethernet headers to present a fake Ethernet device. All these
contortions are probably entirely unnecessary, but absent hardware to
test, changing this aspect seems risky.

The ps3_gelic_net device is also interesting. An outer VLAN is
sometimes used internally to distinguish the output device, but there
doesn't appear to be any reason nested VLAN tags cannot be supported.
Furthermore, with an advertised maximum MTU of 1518, there should be
sufficient room for an additional tag too.

The remaining VLAN challenged holdouts are:
	net/hsr/hsr_device.c
	drivers/net/ipvlan/ipvlan_main.c
	drivers/net/ethernet/mellanox/mlx5/core/en_rep.c

The HSR device could in theory support VLANs, but this would appear to
be a nontrivial exercise. It's not clear why mlx5 representors don't
handle VLANs, but other VF representors do appear to, so perhaps this
could be revised too. That would leave only ipvlan, which unfortantely
does not lend itself to removing NET_F_VLAN_CHALLENGED entirely, since
it has a mode exposing an L2 Ethernet device for which VLANs do not
have any sensible meaning.

Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
---
 Documentation/networking/netdev-features.rst  |  4 +-
 drivers/net/ethernet/intel/e100.c             | 15 +++++-
 .../net/ethernet/mellanox/mlxsw/switchx2.c    | 52 ++++++++++++++-----
 drivers/net/ethernet/toshiba/ps3_gelic_net.c  | 12 +----
 drivers/net/ethernet/wiznet/w5100.c           |  6 ++-
 drivers/net/ethernet/wiznet/w5300.c           |  6 ++-
 drivers/net/rionet.c                          |  3 ++
 drivers/net/wimax/i2400m/netdev.c             |  5 +-
 drivers/s390/net/qeth_l2_main.c               | 12 ++++-
 9 files changed, 80 insertions(+), 35 deletions(-)

diff --git a/Documentation/networking/netdev-features.rst b/Documentation/networking/netdev-features.rst
index a2d7d7160e39..ca4632839f59 100644
--- a/Documentation/networking/netdev-features.rst
+++ b/Documentation/networking/netdev-features.rst
@@ -157,9 +157,7 @@ Don't use it in drivers.
  * VLAN challenged
 
 NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
-headers. Some drivers set this because the cards can't handle the bigger MTU.
-[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
-VLANs. This may be not useful, though.]
+headers.
 
 *  rx-fcs
 
diff --git a/drivers/net/ethernet/intel/e100.c b/drivers/net/ethernet/intel/e100.c
index 1b8d015ebfb0..caea569a76c5 100644
--- a/drivers/net/ethernet/intel/e100.c
+++ b/drivers/net/ethernet/intel/e100.c
@@ -2753,6 +2753,18 @@ static int e100_do_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
 	return generic_mii_ioctl(&nic->mii, if_mii(ifr), cmd, NULL);
 }
 
+static int e100_change_mtu(struct net_device *netdev, int new_mtu)
+{
+	struct nic *nic = netdev_priv(netdev);
+
+	netdev->mtu = new_mtu;
+	/* D100 MAC doesn't allow rx of vlan packets with normal MTU */
+	if (nic->mac < mac_82558_D101_A4)
+		vlan_constrain_mtu(netdev);
+
+	return 0;
+}
+
 static int e100_alloc(struct nic *nic)
 {
 	nic->mem = pci_alloc_consistent(nic->pdev, sizeof(struct mem),
@@ -2808,6 +2820,7 @@ static const struct net_device_ops e100_netdev_ops = {
 	.ndo_set_rx_mode	= e100_set_multicast_list,
 	.ndo_set_mac_address	= e100_set_mac_address,
 	.ndo_do_ioctl		= e100_do_ioctl,
+	.ndo_change_mtu		= e100_change_mtu,
 	.ndo_tx_timeout		= e100_tx_timeout,
 #ifdef CONFIG_NET_POLL_CONTROLLER
 	.ndo_poll_controller	= e100_netpoll,
@@ -2883,7 +2896,7 @@ static int e100_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
 	/* D100 MAC doesn't allow rx of vlan packets with normal MTU */
 	if (nic->mac < mac_82558_D101_A4)
-		netdev->features |= NETIF_F_VLAN_CHALLENGED;
+		netdev->priv_flags |= IFF_NO_VLAN_ROOM;
 
 	/* locks must be initialized before calling hw_reset */
 	spin_lock_init(&nic->cb_lock);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
index b438f5576e18..cc25e4bade16 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/switchx2.c
@@ -184,38 +184,54 @@ static int mlxsw_sx_port_oper_status_get(struct mlxsw_sx_port *mlxsw_sx_port,
 	return 0;
 }
 
-static int __mlxsw_sx_port_mtu_set(struct mlxsw_sx_port *mlxsw_sx_port,
-				   u16 mtu)
+static int mlxsw_sx_port_mtu_max(struct mlxsw_sx_port *mlxsw_sx_port)
 {
 	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
 	char pmtu_pl[MLXSW_REG_PMTU_LEN];
-	int max_mtu;
 	int err;
 
 	mlxsw_reg_pmtu_pack(pmtu_pl, mlxsw_sx_port->local_port, 0);
 	err = mlxsw_reg_query(mlxsw_sx->core, MLXSW_REG(pmtu), pmtu_pl);
 	if (err)
-		return err;
-	max_mtu = mlxsw_reg_pmtu_max_mtu_get(pmtu_pl);
+		return err; /* all errors are negative */
+	return mlxsw_reg_pmtu_max_mtu_get(pmtu_pl);
+}
 
-	if (mtu > max_mtu)
-		return -EINVAL;
+static int ____mlxsw_sx_port_mtu_eth_set(struct mlxsw_sx_port *mlxsw_sx_port,
+					 u16 mtu)
+{
+	struct mlxsw_sx *mlxsw_sx = mlxsw_sx_port->mlxsw_sx;
+	char pmtu_pl[MLXSW_REG_PMTU_LEN];
 
 	mlxsw_reg_pmtu_pack(pmtu_pl, mlxsw_sx_port->local_port, mtu);
 	return mlxsw_reg_write(mlxsw_sx->core, MLXSW_REG(pmtu), pmtu_pl);
 }
 
+static int __mlxsw_sx_port_mtu_eth_set(struct mlxsw_sx_port *mlxsw_sx_port,
+				       u16 mtu)
+{
+	int max_mtu = mlxsw_sx_port_mtu_max(mlxsw_sx_port);
+
+	if (max_mtu < 0)
+		return max_mtu; /* error code */
+
+	if (mtu > max_mtu)
+		return -EINVAL;
+
+	return ____mlxsw_sx_port_mtu_eth_set(mlxsw_sx_port, mtu);
+}
+
 static int mlxsw_sx_port_mtu_eth_set(struct mlxsw_sx_port *mlxsw_sx_port,
 				     u16 mtu)
 {
 	mtu += MLXSW_TXHDR_LEN + ETH_HLEN;
-	return __mlxsw_sx_port_mtu_set(mlxsw_sx_port, mtu);
+	return __mlxsw_sx_port_mtu_eth_set(mlxsw_sx_port, mtu);
 }
 
 static int mlxsw_sx_port_mtu_ib_set(struct mlxsw_sx_port *mlxsw_sx_port,
 				    u16 mtu)
 {
-	return __mlxsw_sx_port_mtu_set(mlxsw_sx_port, mtu);
+	return __mlxsw_sx_port_mtu_eth_set(mlxsw_sx_port, mtu);
 }
 
 static int mlxsw_sx_port_ib_port_set(struct mlxsw_sx_port *mlxsw_sx_port,
@@ -336,12 +352,23 @@ static netdev_tx_t mlxsw_sx_port_xmit(struct sk_buff *skb,
 static int mlxsw_sx_port_change_mtu(struct net_device *dev, int mtu)
 {
 	struct mlxsw_sx_port *mlxsw_sx_port = netdev_priv(dev);
+	int max_mtu = mlxsw_sx_port_mtu_max(mlxsw_sx_port);
 	int err;
 
-	err = mlxsw_sx_port_mtu_eth_set(mlxsw_sx_port, mtu);
+	if (max_mtu < 0)
+		return max_mtu; /* error code */
+
+	max_mtu -= MLXSW_TXHDR_LEN + ETH_HLEN;
+	if (mtu > max_mtu)
+		return -EINVAL;
+
+	err = ____mlxsw_sx_port_mtu_eth_set(mlxsw_sx_port, mtu);
 	if (err)
 		return err;
+
 	dev->mtu = mtu;
+	__vlan_constrain_mtu(dev, max_mtu);
+
 	return 0;
 }
 
@@ -1013,8 +1040,9 @@ static int __mlxsw_sx_port_eth_create(struct mlxsw_sx *mlxsw_sx, u8 local_port,
 
 	netif_carrier_off(dev);
 
-	dev->features |= NETIF_F_NETNS_LOCAL | NETIF_F_LLTX | NETIF_F_SG |
-			 NETIF_F_VLAN_CHALLENGED;
+	dev->features |= NETIF_F_NETNS_LOCAL | NETIF_F_LLTX | NETIF_F_SG;
+
+	dev->priv_flags |= IFF_NO_VLAN_ROOM;
 
 	dev->min_mtu = 0;
 	dev->max_mtu = ETH_MAX_MTU;
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
index 310e6839c6e5..7e008139ad22 100644
--- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c
+++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c
@@ -1479,18 +1479,8 @@ int gelic_net_setup_netdev(struct net_device *netdev, struct gelic_card *card)
 	}
 	memcpy(netdev->dev_addr, &v1, ETH_ALEN);
 
-	if (card->vlan_required) {
+	if (card->vlan_required)
 		netdev->hard_header_len += VLAN_HLEN;
-		/*
-		 * As vlan is internally used,
-		 * we can not receive vlan packets
-		 */
-		netdev->features |= NETIF_F_VLAN_CHALLENGED;
-	}
-
-	/* MTU range: 64 - 1518 */
-	netdev->min_mtu = GELIC_NET_MIN_MTU;
-	netdev->max_mtu = GELIC_NET_MAX_MTU;
 
 	status = register_netdev(netdev);
 	if (status) {
diff --git a/drivers/net/ethernet/wiznet/w5100.c b/drivers/net/ethernet/wiznet/w5100.c
index c0d181a7f83a..a417c0dbce56 100644
--- a/drivers/net/ethernet/wiznet/w5100.c
+++ b/drivers/net/ethernet/wiznet/w5100.c
@@ -24,6 +24,7 @@
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/gpio.h>
+#include <linux/if_vlan.h>
 
 #include "w5100.h"
 
@@ -1038,6 +1039,7 @@ static const struct net_device_ops w5100_netdev_ops = {
 	.ndo_set_rx_mode	= w5100_set_rx_mode,
 	.ndo_set_mac_address	= w5100_set_macaddr,
 	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= vlan_constrained_change_mtu,
 };
 
 static int w5100_mmio_probe(struct platform_device *pdev)
@@ -1137,9 +1139,9 @@ int w5100_probe(struct device *dev, const struct w5100_ops *ops,
 	netif_napi_add(ndev, &priv->napi, w5100_napi_poll, 16);
 
 	/* This chip doesn't support VLAN packets with normal MTU,
-	 * so disable VLAN for this device.
+	 * so constrain MTU of upper VLAN devices.
 	 */
-	ndev->features |= NETIF_F_VLAN_CHALLENGED;
+	ndev->priv_flags |= IFF_NO_VLAN_ROOM;
 
 	err = register_netdev(ndev);
 	if (err < 0)
diff --git a/drivers/net/ethernet/wiznet/w5300.c b/drivers/net/ethernet/wiznet/w5300.c
index 46aae30c4636..e4d97a6749b3 100644
--- a/drivers/net/ethernet/wiznet/w5300.c
+++ b/drivers/net/ethernet/wiznet/w5300.c
@@ -25,6 +25,7 @@
 #include <linux/interrupt.h>
 #include <linux/irq.h>
 #include <linux/gpio.h>
+#include <linux/if_vlan.h>
 
 #define DRV_NAME	"w5300"
 #define DRV_VERSION	"2012-04-04"
@@ -520,6 +521,7 @@ static const struct net_device_ops w5300_netdev_ops = {
 	.ndo_set_rx_mode	= w5300_set_rx_mode,
 	.ndo_set_mac_address	= w5300_set_macaddr,
 	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= vlan_constrained_change_mtu,
 };
 
 static int w5300_hw_probe(struct platform_device *pdev)
@@ -606,9 +608,9 @@ static int w5300_probe(struct platform_device *pdev)
 	netif_napi_add(ndev, &priv->napi, w5300_napi_poll, 16);
 
 	/* This chip doesn't support VLAN packets with normal MTU,
-	 * so disable VLAN for this device.
+	 * so constrain MTU of upper VLAN devices.
 	 */
-	ndev->features |= NETIF_F_VLAN_CHALLENGED;
+	ndev->priv_flags |= IFF_NO_VLAN_ROOM;
 
 	err = register_netdev(ndev);
 	if (err < 0)
diff --git a/drivers/net/rionet.c b/drivers/net/rionet.c
index 2056d6ad04b5..1fd7df17d552 100644
--- a/drivers/net/rionet.c
+++ b/drivers/net/rionet.c
@@ -21,6 +21,7 @@
 #include <linux/crc32.h>
 #include <linux/ethtool.h>
 #include <linux/reboot.h>
+#include <linux/if_vlan.h>
 
 #define DRV_NAME        "rionet"
 #define DRV_VERSION     "0.3"
@@ -476,6 +477,7 @@ static const struct net_device_ops rionet_netdev_ops = {
 	.ndo_start_xmit		= rionet_start_xmit,
 	.ndo_validate_addr	= eth_validate_addr,
 	.ndo_set_mac_address	= eth_mac_addr,
+	.ndo_change_mtu		= vlan_constrained_change_mtu,
 };
 
 static int rionet_setup_netdev(struct rio_mport *mport, struct net_device *ndev)
@@ -514,6 +516,7 @@ static int rionet_setup_netdev(struct rio_mport *mport, struct net_device *ndev)
 	ndev->min_mtu = ETH_MIN_MTU;
 	ndev->max_mtu = RIONET_MAX_MTU;
 	ndev->features = NETIF_F_LLTX;
+	ndev->priv_flags |= IFF_NO_VLAN_ROOM;
 	SET_NETDEV_DEV(ndev, &mport->dev);
 	ndev->ethtool_ops = &rionet_ethtool_ops;
 
diff --git a/drivers/net/wimax/i2400m/netdev.c b/drivers/net/wimax/i2400m/netdev.c
index a7fcbceb6e6b..4097182a21c7 100644
--- a/drivers/net/wimax/i2400m/netdev.c
+++ b/drivers/net/wimax/i2400m/netdev.c
@@ -583,13 +583,12 @@ void i2400m_netdev_setup(struct net_device *net_dev)
 {
 	d_fnstart(3, NULL, "(net_dev %p)\n", net_dev);
 	ether_setup(net_dev);
+	net_dev->type = ARPHRD_RAWIP;
 	net_dev->mtu = I2400M_MAX_MTU;
 	net_dev->min_mtu = 0;
 	net_dev->max_mtu = I2400M_MAX_MTU;
 	net_dev->tx_queue_len = I2400M_TX_QLEN;
-	net_dev->features =
-		  NETIF_F_VLAN_CHALLENGED
-		| NETIF_F_HIGHDMA;
+	net_dev->features = NETIF_F_HIGHDMA;
 	net_dev->flags =
 		IFF_NOARP		/* i2400m is apure IP device */
 		& (~IFF_BROADCAST	/* i2400m is P2P */
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index da47e423e1b1..e9507237f3b7 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -629,6 +629,15 @@ static void qeth_l2_set_rx_mode(struct net_device *dev)
 	schedule_work(&card->rx_mode_work);
 }
 
+static int qeth_l2_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct qeth_card *card = dev->ml_priv;
+
+	dev->mtu = new_mtu;
+	if (IS_OSM(card))
+		vlan_constrain_mtu(dev);
+}
+
 static const struct net_device_ops qeth_l2_netdev_ops = {
 	.ndo_open		= qeth_open,
 	.ndo_stop		= qeth_stop,
@@ -640,6 +649,7 @@ static const struct net_device_ops qeth_l2_netdev_ops = {
 	.ndo_set_rx_mode	= qeth_l2_set_rx_mode,
 	.ndo_do_ioctl		= qeth_do_ioctl,
 	.ndo_set_mac_address    = qeth_l2_set_mac_address,
+	.ndo_change_mtu		= qeth_l2_change_mtu,
 	.ndo_vlan_rx_add_vid	= qeth_l2_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid   = qeth_l2_vlan_rx_kill_vid,
 	.ndo_tx_timeout	   	= qeth_tx_timeout,
@@ -675,7 +685,7 @@ static int qeth_l2_setup_netdev(struct qeth_card *card)
 	card->dev->priv_flags |= IFF_UNICAST_FLT;
 
 	if (IS_OSM(card)) {
-		card->dev->features |= NETIF_F_VLAN_CHALLENGED;
+		card->dev->priv_flags |= IFF_NO_VLAN_ROOM;
 	} else {
 		if (!IS_VM_NIC(card))
 			card->dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another
  2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
                   ` (10 preceding siblings ...)
  2020-05-27 21:25 ` [RFC PATCH net-next 11/11] net: leverage IFF_NO_VLAN_ROOM to limit NETIF_F_VLAN_CHALLENGED Edwin Peer
@ 2020-05-28  0:15 ` Michał Mirosław
  2020-05-28  0:39   ` Edwin Peer
  11 siblings, 1 reply; 14+ messages in thread
From: Michał Mirosław @ 2020-05-28  0:15 UTC (permalink / raw)
  To: Edwin Peer; +Cc: netdev

On Wed, May 27, 2020 at 02:25:01PM -0700, Edwin Peer wrote:
> This series began life as a modest attempt to fix two issues pertaining
> to VLANs nested inside Geneve tunnels and snowballed from there. The
> first issue, addressed by a simple one-liner, is that GSO is not enabled
> for upper VLAN devices on top of Geneve. The second issue, addressed by
> the balance of the series, deals largely with MTU handling. VLAN devices
> above L2 in L3 tunnels inherit the MTU of the underlying device. This
> causes IP fragmentation because the inner L2 cannot be expanded within
> the same maximum L3 size to accommodate the additional VLAN tag.
> 
> As a first attempt, a new flag was introduced to generalize what was
> already being done for MACsec devices. This flag was unconditionally
> set for all devices that have a size constrained L2, such as is the
> case for Geneve and VXLAN tunnel devices. This doesn't quite do the
> right thing, however, if the underlying device MTU happens to be
> configured to a lower MTU than is supported. Thus, the approach was
> further refined to set IFF_NO_VLAN_ROOM when changing MTU, based on
> whether the underlying device L2 still has room for VLAN tags, but
> stopping short of registering device notifiers to update upper device
> MTU whenever a lower device changes. VLAN devices will thus do the
> sensible thing if they are applied to an already configured device,
> but will not dynamically update whenever the underlying device's MTU
> is subsequently changed (this seemed a bridge too far).
[...]

Hi!

Good to see someone taking on the VLAN MTU mess.  :-)

Have you considered adding a 'vlan_headroom' field (or another name)
for a netdev instead of a flag? This would submit easily to device
aggregation (just take min from the slaves) and would also handle
nested VLANs gracefully (subtracting for every layer).

In patch 3 you seem to assume that if lower device reduces MTU below
its max, then its ok to push it up with VLAN headers. I don't think
this is apropriate when reducing MTU because of eg. PMTU limit for
a tunnel.

Best Regards,
Michał Mirosław

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another
  2020-05-28  0:15 ` [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Michał Mirosław
@ 2020-05-28  0:39   ` Edwin Peer
  0 siblings, 0 replies; 14+ messages in thread
From: Edwin Peer @ 2020-05-28  0:39 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: netdev

On Wed, May 27, 2020 at 5:15 PM Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:

> Have you considered adding a 'vlan_headroom' field (or another name)
> for a netdev instead of a flag? This would submit easily to device
> aggregation (just take min from the slaves) and would also handle
> nested VLANs gracefully (subtracting for every layer).

Great idea, I think that would be much nicer.

> In patch 3 you seem to assume that if lower device reduces MTU below
> its max, then its ok to push it up with VLAN headers. I don't think
> this is apropriate when reducing MTU because of eg. PMTU limit for
> a tunnel.

Indeed. For non-tunnel devices I think this behavior is still correct,
because past the 1st hop (where device MTU should be appropriate), all
of L2, including any VLANs, has been replaced by something else. But
yes, tunnels probably do need to unconditionally reduce MTU, because
PMTU is something more dynamic. I guess I kind of half thought about
this for gre6, where this is what I did because PMTU is so much more
in your face for IPv6.

Regards,
Edwin Peer

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-05-28  0:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-27 21:25 [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 01/11] net: geneve: enable vlan offloads Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 02/11] net: do away with the IFF_XMIT_DST_RELEASE_PERM flag Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 03/11] net: vlan: add IFF_NO_VLAN_ROOM to constrain MTU Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 04/11] net: geneve: constrain upper VLAN MTU using IFF_NO_VLAN_ROOM Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 05/11] net: vxlan: " Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 06/11] net: l2tp_eth: " Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 07/11] net: gre: " Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 08/11] net: distribute IFF_NO_VLAN_ROOM into aggregate devs Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 09/11] net: ntb_netdev: support VLAN using IFF_NO_VLAN_ROOM Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 10/11] net: vlan: disallow non-Ethernet devices Edwin Peer
2020-05-27 21:25 ` [RFC PATCH net-next 11/11] net: leverage IFF_NO_VLAN_ROOM to limit NETIF_F_VLAN_CHALLENGED Edwin Peer
2020-05-28  0:15 ` [RFC PATCH net-next 00/11] Nested VLANs - decimate flags and add another Michał Mirosław
2020-05-28  0:39   ` Edwin Peer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).