linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/2] net: Set maximum receive packet size on veth interfaces
@ 2017-05-11 13:46 Fredrik Markstrom
  2017-05-11 13:46 ` [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls Fredrik Markstrom
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Fredrik Markstrom @ 2017-05-11 13:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge, Fredrik Markstrom


Currently veth drops all packets larger than the MTU set on the receiving
end of the pair. This is inconsistent with most hardware Ethernet drivers,
which happily receive packets up to the Ethernet MTU independent of the
configured MTU.

This patch set adds a new driver attribute to set the maximum size of
received packets, making it possible to create configurations similar to
those possible with (most) hardware Ethernet interfaces.

The set consists of two patches. The first one adds a parameter to the
dev_forward_skb functions to specify the maximum packet size; the
second one implements a new attribute (VETH_MRU) in the veth driver.

Fredrik Markstrom (1):
  veth: Added attribute to set maximum receive size on veth interfaces

Fredrik Markström (1):
  net: Added mtu parameter to dev_forward_skb calls

 drivers/net/ipvlan/ipvlan_core.c |  7 ++++---
 drivers/net/macvlan.c            |  4 ++--
 drivers/net/veth.c               | 45 +++++++++++++++++++++++++++++++++++++++-
 include/linux/netdevice.h        | 10 ++++-----
 include/uapi/linux/veth.h        |  1 +
 net/bridge/br_forward.c          |  4 ++--
 net/core/dev.c                   | 17 +++++++++------
 net/core/filter.c                |  4 ++--
 net/l2tp/l2tp_eth.c              |  2 +-
 9 files changed, 72 insertions(+), 22 deletions(-)

v2 - Updated description and fixed compile error in net/bridge/br_forward.c

-- 
2.11.0


* [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
  2017-05-11 13:46 [PATCH v2 0/2] net: Set maximum receive packet size on veth interfaces Fredrik Markstrom
@ 2017-05-11 13:46 ` Fredrik Markstrom
  2017-05-11 16:01   ` Stephen Hemminger
  2017-05-11 13:46 ` [PATCH v2 2/2] veth: Added attribute to set maximum receive size on veth interfaces Fredrik Markstrom
  2017-05-11 13:46 ` Support for VETH_MRU in libnl Fredrik Markstrom
  2 siblings, 1 reply; 10+ messages in thread
From: Fredrik Markstrom @ 2017-05-11 13:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge,
	Fredrik Markström

From: Fredrik Markström <fredrik.markstrom@gmail.com>

is_skb_forwardable() currently checks if the packet size is <= mtu of
the receiving interface. This is not consistent with most of the hardware
ethernet drivers that happily receive packets larger than the MTU.

This patch adds a parameter to dev_forward_skb and is_skb_forwardable so
that the caller can override this packet size limit.
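
As a rough illustration of the override described above, here is a minimal
userspace sketch of the length test. It is not kernel code: struct sketch_dev,
sketch_is_forwardable() and SKETCH_VLAN_HLEN are made-up stand-ins, the limit
is unsigned here while the patch uses int, and the GSO exemption of the real
is_skb_forwardable() is left out.

```c
#include <stdbool.h>

#define SKETCH_VLAN_HLEN 4u	/* same value as the kernel's VLAN_HLEN */

/* Hypothetical, cut-down stand-in for struct net_device. */
struct sketch_dev {
	bool up;			/* models dev->flags & IFF_UP */
	unsigned int mtu;		/* configured MTU */
	unsigned int hard_header_len;	/* 14 for Ethernet */
};

/*
 * Mirrors the length test of the patched is_skb_forwardable():
 * an mtu argument of 0 means "fall back to dev->mtu".
 * skb_len is the frame length including the link-layer header.
 */
static bool sketch_is_forwardable(const struct sketch_dev *dev,
				  unsigned int skb_len, unsigned int mtu)
{
	if (!dev->up)
		return false;

	if (mtu == 0)
		mtu = dev->mtu;

	return skb_len <= mtu + dev->hard_header_len + SKETCH_VLAN_HLEN;
}
```

With a receiver MTU of 300 and a 14-byte Ethernet header, passing 0 accepts
frames of up to 318 bytes; passing 1000 raises the limit to 1018 bytes.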

Signed-off-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
---
 drivers/net/ipvlan/ipvlan_core.c |  7 ++++---
 drivers/net/macvlan.c            |  4 ++--
 drivers/net/veth.c               |  2 +-
 include/linux/netdevice.h        | 10 +++++-----
 net/bridge/br_forward.c          |  4 ++--
 net/core/dev.c                   | 17 +++++++++++------
 net/core/filter.c                |  4 ++--
 net/l2tp/l2tp_eth.c              |  2 +-
 8 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 1f3295e274d0..dbbe48ade204 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -234,7 +234,8 @@ void ipvlan_process_multicast(struct work_struct *work)
 				nskb->pkt_type = pkt_type;
 				nskb->dev = ipvlan->dev;
 				if (tx_pkt)
-					ret = dev_forward_skb(ipvlan->dev, nskb);
+					ret = dev_forward_skb(ipvlan->dev,
+							      nskb, 0);
 				else
 					ret = netif_rx(nskb);
 			}
@@ -301,7 +302,7 @@ static int ipvlan_rcv_frame(struct ipvl_addr *addr, struct sk_buff **pskb,
 
 	if (local) {
 		skb->pkt_type = PACKET_HOST;
-		if (dev_forward_skb(ipvlan->dev, skb) == NET_RX_SUCCESS)
+		if (dev_forward_skb(ipvlan->dev, skb, 0) == NET_RX_SUCCESS)
 			success = true;
 	} else {
 		ret = RX_HANDLER_ANOTHER;
@@ -547,7 +548,7 @@ static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 		 * the skb for the main-dev. At the RX side we just return
 		 * RX_PASS for it to be processed further on the stack.
 		 */
-		return dev_forward_skb(ipvlan->phy_dev, skb);
+		return dev_forward_skb(ipvlan->phy_dev, skb, 0);
 
 	} else if (is_multicast_ether_addr(eth->h_dest)) {
 		ipvlan_skb_crossing_ns(skb, NULL);
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index 9261722960a7..4db2876c1e44 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -202,7 +202,7 @@ static int macvlan_broadcast_one(struct sk_buff *skb,
 	struct net_device *dev = vlan->dev;
 
 	if (local)
-		return __dev_forward_skb(dev, skb);
+		return __dev_forward_skb(dev, skb, 0);
 
 	skb->dev = dev;
 	if (ether_addr_equal_64bits(eth->h_dest, dev->broadcast))
@@ -495,7 +495,7 @@ static int macvlan_queue_xmit(struct sk_buff *skb, struct net_device *dev)
 		dest = macvlan_hash_lookup(port, eth->h_dest);
 		if (dest && dest->mode == MACVLAN_MODE_BRIDGE) {
 			/* send to lowerdev first for its network taps */
-			dev_forward_skb(vlan->lowerdev, skb);
+			dev_forward_skb(vlan->lowerdev, skb, 0);
 
 			return NET_XMIT_SUCCESS;
 		}
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c39d6d690e5..561da3a63b8a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -116,7 +116,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
-	if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
+	if (likely(dev_forward_skb(rcv, skb, 0) == NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
 		u64_stats_update_begin(&stats->syncp);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 97456b2539e4..f207b083ffec 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3282,16 +3282,16 @@ int dev_change_xdp_fd(struct net_device *dev, int fd, u32 flags);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 				    struct netdev_queue *txq, int *ret);
-int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
-int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
+int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu);
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu);
 bool is_skb_forwardable(const struct net_device *dev,
-			const struct sk_buff *skb);
+			const struct sk_buff *skb, int mtu);
 
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
-					       struct sk_buff *skb)
+					       struct sk_buff *skb, int mtu)
 {
 	if (skb_orphan_frags(skb, GFP_ATOMIC) ||
-	    unlikely(!is_skb_forwardable(dev, skb))) {
+	    unlikely(!is_skb_forwardable(dev, skb, mtu))) {
 		atomic_long_inc(&dev->rx_dropped);
 		kfree_skb(skb);
 		return NET_RX_DROP;
diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 902af6ba481c..15ab57da5ef1 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -35,7 +35,7 @@ static inline int should_deliver(const struct net_bridge_port *p,
 
 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
-	if (!is_skb_forwardable(skb->dev, skb))
+	if (!is_skb_forwardable(skb->dev, skb, 0))
 		goto drop;
 
 	skb_push(skb, ETH_HLEN);
@@ -96,7 +96,7 @@ static void __br_forward(const struct net_bridge_port *to,
 		net = dev_net(indev);
 	} else {
 		if (unlikely(netpoll_tx_running(to->br->dev))) {
-			if (!is_skb_forwardable(skb->dev, skb)) {
+			if (!is_skb_forwardable(skb->dev, skb, 0)) {
 				kfree_skb(skb);
 			} else {
 				skb_push(skb, ETH_HLEN);
diff --git a/net/core/dev.c b/net/core/dev.c
index 533a6d6f6092..f7c53d7c8e26 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1767,14 +1767,18 @@ static inline void net_timestamp_set(struct sk_buff *skb)
 			__net_timestamp(SKB);		\
 	}						\
 
-bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
+bool is_skb_forwardable(const struct net_device *dev,
+			const struct sk_buff *skb, int mtu)
 {
 	unsigned int len;
 
 	if (!(dev->flags & IFF_UP))
 		return false;
 
-	len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
+	if (mtu == 0)
+		mtu = dev->mtu;
+
+	len = mtu + dev->hard_header_len + VLAN_HLEN;
 	if (skb->len <= len)
 		return true;
 
@@ -1788,9 +1792,9 @@ bool is_skb_forwardable(const struct net_device *dev, const struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(is_skb_forwardable);
 
-int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, mtu);
 
 	if (likely(!ret)) {
 		skb->protocol = eth_type_trans(skb, dev);
@@ -1806,6 +1810,7 @@ EXPORT_SYMBOL_GPL(__dev_forward_skb);
  *
  * @dev: destination network device
  * @skb: buffer to forward
+ * @mtu: Maximum size to forward. If 0 dev->mtu is used.
  *
  * return values:
  *	NET_RX_SUCCESS	(no congestion)
@@ -1819,9 +1824,9 @@ EXPORT_SYMBOL_GPL(__dev_forward_skb);
  * we have to clear all information in the skb that could
  * impact namespace isolation.
  */
-int dev_forward_skb(struct net_device *dev, struct sk_buff *skb)
+int dev_forward_skb(struct net_device *dev, struct sk_buff *skb, int mtu)
 {
-	return __dev_forward_skb(dev, skb) ?: netif_rx_internal(skb);
+	return __dev_forward_skb(dev, skb, mtu) ?: netif_rx_internal(skb);
 }
 EXPORT_SYMBOL_GPL(dev_forward_skb);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index ebaeaf2e46e8..3f3eb26e7ea1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1632,13 +1632,13 @@ static const struct bpf_func_proto bpf_csum_update_proto = {
 
 static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb)
 {
-	return dev_forward_skb(dev, skb);
+	return dev_forward_skb(dev, skb, 0);
 }
 
 static inline int __bpf_rx_skb_no_mac(struct net_device *dev,
 				      struct sk_buff *skb)
 {
-	int ret = ____dev_forward_skb(dev, skb);
+	int ret = ____dev_forward_skb(dev, skb, 0);
 
 	if (likely(!ret)) {
 		skb->dev = dev;
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
index 6fd41d7afe1e..1258555b6578 100644
--- a/net/l2tp/l2tp_eth.c
+++ b/net/l2tp/l2tp_eth.c
@@ -164,7 +164,7 @@ static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb,
 	skb_dst_drop(skb);
 	nf_reset(skb);
 
-	if (dev_forward_skb(dev, skb) == NET_RX_SUCCESS) {
+	if (dev_forward_skb(dev, skb, 0) == NET_RX_SUCCESS) {
 		atomic_long_inc(&priv->rx_packets);
 		atomic_long_add(data_len, &priv->rx_bytes);
 	} else {
-- 
2.11.0


* [PATCH v2 2/2] veth: Added attribute to set maximum receive size on veth interfaces
  2017-05-11 13:46 [PATCH v2 0/2] net: Set maximum receive packet size on veth interfaces Fredrik Markstrom
  2017-05-11 13:46 ` [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls Fredrik Markstrom
@ 2017-05-11 13:46 ` Fredrik Markstrom
  2017-05-11 13:46 ` Support for VETH_MRU in libnl Fredrik Markstrom
  2 siblings, 0 replies; 10+ messages in thread
From: Fredrik Markstrom @ 2017-05-11 13:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge, Fredrik Markstrom

Currently veth drops all packets larger than the MTU set on the receiving
end of the pair. This is inconsistent with most hardware Ethernet drivers.
This patch adds a new driver attribute to set the maximum size of received
packets, making it possible to create configurations similar to those
possible with (most) hardware Ethernet interfaces.
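
For readers unfamiliar with how such an attribute travels over netlink, the
following userspace sketch lays out one u32 attribute by hand. It is an
illustration only: the sk_ and SK_ names are made up, the value 2 for
SK_VETH_MRU assumes the enum order this patch adds (VETH_INFO_UNSPEC,
VETH_INFO_PEER, VETH_MRU), and real code would use the kernel's or libnl's
nla_* helpers instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_NLA_ALIGNTO	((size_t)4)
#define SK_NLA_ALIGN(len) (((len) + SK_NLA_ALIGNTO - 1) & ~(SK_NLA_ALIGNTO - 1))
#define SK_NLA_HDRLEN	((size_t)4)	/* two 16-bit fields: length, type */
#define SK_VETH_MRU	2		/* assumed value, see the note above */

/* Space one attribute occupies in a message, padding included. */
static size_t sk_nla_total_size(size_t payload)
{
	return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}

/*
 * Write one u32 attribute (host byte order) into buf.
 * Returns the number of bytes used, or 0 if it does not fit.
 */
static size_t sk_put_u32(unsigned char *buf, size_t room,
			 uint16_t type, uint32_t value)
{
	uint16_t len = (uint16_t)(SK_NLA_HDRLEN + sizeof(value));
	size_t total = sk_nla_total_size(sizeof(value));

	if (room < total)
		return 0;

	memset(buf, 0, total);
	memcpy(buf, &len, sizeof(len));			/* length field */
	memcpy(buf + sizeof(len), &type, sizeof(type));	/* type field */
	memcpy(buf + SK_NLA_HDRLEN, &value, sizeof(value));
	return total;
}
```

A 4-byte payload plus the 4-byte header needs no padding, so
sk_nla_total_size(4) is 8, which is the amount a get_size callback
returning nla_total_size(4) reserves for the attribute.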

Signed-off-by: Fredrik Markstrom <fredrik.markstrom@gmail.com>
---
 drivers/net/veth.c        | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 include/uapi/linux/veth.h |  1 +
 2 files changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 561da3a63b8a..5669286dd531 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -33,6 +33,7 @@ struct veth_priv {
 	struct net_device __rcu	*peer;
 	atomic64_t		dropped;
 	unsigned		requested_headroom;
+	int			mru;
 };
 
 /*
@@ -106,6 +107,7 @@ static const struct ethtool_ops veth_ethtool_ops = {
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
+	struct veth_priv *rcv_priv;
 	struct net_device *rcv;
 	int length = skb->len;
 
@@ -115,8 +117,10 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 		kfree_skb(skb);
 		goto drop;
 	}
+	rcv_priv = netdev_priv(rcv);
 
-	if (likely(dev_forward_skb(rcv, skb, 0) == NET_RX_SUCCESS)) {
+	if (likely(dev_forward_skb(rcv, skb, rcv_priv->mru) ==
+		   NET_RX_SUCCESS)) {
 		struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
 		u64_stats_update_begin(&stats->syncp);
@@ -346,6 +350,11 @@ static int veth_validate(struct nlattr *tb[], struct nlattr *data[])
 		if (!is_valid_veth_mtu(nla_get_u32(tb[IFLA_MTU])))
 			return -EINVAL;
 	}
+
+	if (tb[VETH_MRU])
+		if (!is_valid_veth_mtu(nla_get_u32(tb[VETH_MRU])))
+			return -EINVAL;
+
 	return 0;
 }
 
@@ -450,10 +459,15 @@ static int veth_newlink(struct net *src_net, struct net_device *dev,
 	 */
 
 	priv = netdev_priv(dev);
+	if (tb[VETH_MRU])
+		priv->mru = nla_get_u32(tb[VETH_MRU]);
 	rcu_assign_pointer(priv->peer, peer);
 
 	priv = netdev_priv(peer);
+	if (tbp[VETH_MRU])
+		priv->mru = nla_get_u32(tbp[VETH_MRU]);
 	rcu_assign_pointer(priv->peer, dev);
+
 	return 0;
 
 err_register_dev:
@@ -489,8 +503,34 @@ static void veth_dellink(struct net_device *dev, struct list_head *head)
 	}
 }
 
+static int veth_changelink(struct net_device *dev,
+			   struct nlattr *tb[], struct nlattr *data[])
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	if (data && data[VETH_MRU])
+		priv->mru = nla_get_u32(data[VETH_MRU]);
+	return 0;
+}
+
+static size_t veth_get_size(const struct net_device *dev)
+{
+	return nla_total_size(4);/* VETH_MRU */
+}
+
+static int veth_fill_info(struct sk_buff *skb,
+			  const struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+
+	if (nla_put_u32(skb, VETH_MRU, priv->mru))
+		return -EMSGSIZE;
+	return 0;
+}
+
 static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
 	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
+	[VETH_MRU]		= { .type = NLA_U32 },
 };
 
 static struct net *veth_get_link_net(const struct net_device *dev)
@@ -508,9 +548,12 @@ static struct rtnl_link_ops veth_link_ops = {
 	.validate	= veth_validate,
 	.newlink	= veth_newlink,
 	.dellink	= veth_dellink,
+	.changelink	= veth_changelink,
 	.policy		= veth_policy,
 	.maxtype	= VETH_INFO_MAX,
 	.get_link_net	= veth_get_link_net,
+	.get_size	= veth_get_size,
+	.fill_info	= veth_fill_info,
 };
 
 /*
diff --git a/include/uapi/linux/veth.h b/include/uapi/linux/veth.h
index 3354c1eb424e..8665b260f156 100644
--- a/include/uapi/linux/veth.h
+++ b/include/uapi/linux/veth.h
@@ -4,6 +4,7 @@
 enum {
 	VETH_INFO_UNSPEC,
 	VETH_INFO_PEER,
+	VETH_MRU,
 
 	__VETH_INFO_MAX
 #define VETH_INFO_MAX	(__VETH_INFO_MAX - 1)
-- 
2.11.0


* Support for VETH_MRU in libnl
  2017-05-11 13:46 [PATCH v2 0/2] net: Set maximum receive packet size on veth interfaces Fredrik Markstrom
  2017-05-11 13:46 ` [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls Fredrik Markstrom
  2017-05-11 13:46 ` [PATCH v2 2/2] veth: Added attribute to set maximum receive size on veth interfaces Fredrik Markstrom
@ 2017-05-11 13:46 ` Fredrik Markstrom
  2 siblings, 0 replies; 10+ messages in thread
From: Fredrik Markstrom @ 2017-05-11 13:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Stephen Hemminger, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge, Fredrik Markstrom

---
 include/linux/if_link.h           |   1 +
 include/netlink-private/types.h   |   1 +
 include/netlink/route/link/veth.h |   4 ++
 lib/route/link.c                  |   4 ++
 lib/route/link/veth.c             | 141 +++++++++++++++++++++++++++++---------
 5 files changed, 118 insertions(+), 33 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index 8b84939..b9859bd 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -316,6 +316,7 @@ struct ifla_vxlan_port_range {
 enum {
 	VETH_INFO_UNSPEC,
 	VETH_INFO_PEER,
+	VETH_MRU,
 
 	__VETH_INFO_MAX
 #define VETH_INFO_MAX   (__VETH_INFO_MAX - 1)
diff --git a/include/netlink-private/types.h b/include/netlink-private/types.h
index 3ff4fe1..c97090b 100644
--- a/include/netlink-private/types.h
+++ b/include/netlink-private/types.h
@@ -165,6 +165,7 @@ struct rtnl_link
 	uint32_t			l_flags;
 	uint32_t			l_change;
 	uint32_t 			l_mtu;
+	uint32_t 			l_mru;
 	uint32_t			l_link;
 	uint32_t			l_txqlen;
 	uint32_t			l_weight;
diff --git a/include/netlink/route/link/veth.h b/include/netlink/route/link/veth.h
index 35c2345..58eeb98 100644
--- a/include/netlink/route/link/veth.h
+++ b/include/netlink/route/link/veth.h
@@ -29,6 +29,10 @@ extern struct rtnl_link *rtnl_link_veth_get_peer(struct rtnl_link *);
 extern int rtnl_link_veth_add(struct nl_sock *sock, const char *name,
 			      const char *peer, pid_t pid);
 
+extern int rtnl_link_veth_set_mru(struct rtnl_link *, uint32_t);
+
+extern uint32_t rtnl_link_veth_get_mru(struct rtnl_link *);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/route/link.c b/lib/route/link.c
index 3d31ffc..3cdacbb 100644
--- a/lib/route/link.c
+++ b/lib/route/link.c
@@ -61,6 +61,7 @@
 #define LINK_ATTR_PHYS_PORT_ID	(1 << 28)
 #define LINK_ATTR_NS_FD		(1 << 29)
 #define LINK_ATTR_NS_PID	(1 << 30)
+#define LINK_ATTR_MRU		(1 << 31)
 
 static struct nl_cache_ops rtnl_link_ops;
 static struct nl_object_ops link_obj_ops;
@@ -1255,6 +1256,9 @@ int rtnl_link_fill_info(struct nl_msg *msg, struct rtnl_link *link)
 	if (link->ce_mask & LINK_ATTR_MTU)
 		NLA_PUT_U32(msg, IFLA_MTU, link->l_mtu);
 
+	if (link->ce_mask & LINK_ATTR_MRU)
+		NLA_PUT_U32(msg, IFLA_MTU, link->l_mru);
+
 	if (link->ce_mask & LINK_ATTR_TXQLEN)
 		NLA_PUT_U32(msg, IFLA_TXQLEN, link->l_txqlen);
 
diff --git a/lib/route/link/veth.c b/lib/route/link/veth.c
index e7e4a26..5dc15af 100644
--- a/lib/route/link/veth.c
+++ b/lib/route/link/veth.c
@@ -33,16 +33,62 @@
 
 #include <linux/if_link.h>
 
+#define VETH_HAS_MRU		(1<<0)
+
+struct veth_info
+{
+	struct rtnl_link *peer;
+	uint32_t		vei_mru;
+	uint32_t		vei_mask;
+};
+
 static struct nla_policy veth_policy[VETH_INFO_MAX+1] = {
 	[VETH_INFO_PEER]	= { .minlen = sizeof(struct ifinfomsg) },
+	[VETH_MRU]		= { .type = NLA_U32 },
 };
 
+static int veth_alloc(struct rtnl_link *link)
+{
+	struct rtnl_link *peer;
+	struct veth_info *vei = link->l_info;
+	int err;
+
+	/* return early if we are in recursion */
+	if (vei && vei->peer)
+		return 0;
+
+	if (!(peer = rtnl_link_alloc()))
+		return -NLE_NOMEM;
+
+	if ((vei = calloc(1, sizeof(*vei))) == NULL)
+	  return -NLE_NOMEM;
+
+	/* We don't need to hold a reference here, as link and
+	 * its peer should always be freed together.
+	 */
+	vei->peer = link;
+
+	peer->l_info = vei;
+	if ((err = rtnl_link_set_type(peer, "veth")) < 0) {
+		rtnl_link_put(peer);
+		return err;
+	}
+
+	if ((vei = calloc(1, sizeof(*vei))) == NULL)
+	  return -NLE_NOMEM;
+
+	vei->peer = peer;
+	link->l_info = vei;
+	return 0;
+}
+
 static int veth_parse(struct rtnl_link *link, struct nlattr *data,
 		      struct nlattr *xstats)
 {
 	struct nlattr *tb[VETH_INFO_MAX+1];
 	struct nlattr *peer_tb[IFLA_MAX + 1];
-	struct rtnl_link *peer = link->l_info;
+	struct veth_info *vei = link->l_info;
+	struct rtnl_link *peer = vei->peer;
 	int err;
 
 	NL_DBG(3, "Parsing veth link info");
@@ -50,6 +96,14 @@ static int veth_parse(struct rtnl_link *link, struct nlattr *data,
 	if ((err = nla_parse_nested(tb, VETH_INFO_MAX, data, veth_policy)) < 0)
 		goto errout;
 
+	if ((err = veth_alloc(link)) < 0)
+		goto errout;
+
+	if (tb[VETH_MRU]) {
+		vei->vei_mru = nla_get_u32(tb[VETH_MRU]);
+		vei->vei_mask |= VETH_HAS_MRU;
+	}
+
 	if (tb[VETH_INFO_PEER]) {
 		struct nlattr *nla_peer;
 		struct ifinfomsg *ifi;
@@ -86,7 +140,8 @@ static void veth_dump_line(struct rtnl_link *link, struct nl_dump_params *p)
 
 static void veth_dump_details(struct rtnl_link *link, struct nl_dump_params *p)
 {
-	struct rtnl_link *peer = link->l_info;
+	struct veth_info *vei = link->l_info;
+	struct rtnl_link *peer = vei->peer;
 	char *name;
 	name = rtnl_link_get_name(peer);
 	nl_dump(p, "      peer ");
@@ -98,7 +153,14 @@ static void veth_dump_details(struct rtnl_link *link, struct nl_dump_params *p)
 
 static int veth_clone(struct rtnl_link *dst, struct rtnl_link *src)
 {
-	struct rtnl_link *dst_peer = NULL, *src_peer = src->l_info;
+	struct veth_info *src_vei = src->l_info;
+	struct veth_info *dst_vei = dst->l_info;
+	struct rtnl_link *dst_peer = NULL, *src_peer = src_vei->peer;
+
+
+	printf("veth_clone not implemented\n");
+
+	// FIXME:
 
 	/* we are calling nl_object_clone() recursively, this should
 	 * happen only once */
@@ -116,7 +178,8 @@ static int veth_clone(struct rtnl_link *dst, struct rtnl_link *src)
 
 static int veth_put_attrs(struct nl_msg *msg, struct rtnl_link *link)
 {
-	struct rtnl_link *peer = link->l_info;
+	struct veth_info *vei = link->l_info;
+	struct rtnl_link *peer = vei->peer;
 	struct ifinfomsg ifi;
 	struct nlattr *data, *info_peer;
 
@@ -135,44 +198,31 @@ static int veth_put_attrs(struct nl_msg *msg, struct rtnl_link *link)
 		return -NLE_MSGSIZE;
 	rtnl_link_fill_info(msg, peer);
 	nla_nest_end(msg, info_peer);
-	nla_nest_end(msg, data);
 
-	return 0;
-}
-
-static int veth_alloc(struct rtnl_link *link)
-{
-	struct rtnl_link *peer;
-	int err;
-
-	/* return early if we are in recursion */
-	if (link->l_info)
-		return 0;
+	if (vei->vei_mask & VETH_HAS_MRU)
+		NLA_PUT_U32(msg, VETH_MRU, vei->vei_mru);
 
-	if (!(peer = rtnl_link_alloc()))
-		return -NLE_NOMEM;
+	nla_nest_end(msg, data);
 
-	/* We don't need to hold a reference here, as link and
-	 * its peer should always be freed together.
-	 */
-	peer->l_info = link;
-	if ((err = rtnl_link_set_type(peer, "veth")) < 0) {
-		rtnl_link_put(peer);
-		return err;
-	}
+nla_put_failure:
 
-	link->l_info = peer;
 	return 0;
 }
 
 static void veth_free(struct rtnl_link *link)
 {
-	struct rtnl_link *peer = link->l_info;
-	if (peer) {
+	struct veth_info *vei = link->l_info;
+	if (vei) {
+		struct rtnl_link *peer = vei->peer;
+		if (peer) {
+			vei->peer = NULL;
+			rtnl_link_put(peer);
+			/* avoid calling this recursively */
+			free(peer->l_info);
+			peer->l_info = NULL;
+		}
+		free(vei);
 		link->l_info = NULL;
-		/* avoid calling this recursively */
-		peer->l_info = NULL;
-		rtnl_link_put(peer);
 	}
 	/* the caller should finally free link */
 }
@@ -195,7 +245,7 @@ static struct rtnl_link_info_ops veth_info_ops = {
 #define IS_VETH_LINK_ASSERT(link) \
 	if ((link)->l_info_ops != &veth_info_ops) { \
 		APPBUG("Link is not a veth link. set type \"veth\" first."); \
-		return NULL; \
+		return -NLE_OPNOTSUPP; \
 	}
 /** @endcond */
 
@@ -293,6 +343,31 @@ int rtnl_link_veth_add(struct nl_sock *sock, const char *name,
 	return err;
 }
 
+int rtnl_link_veth_set_mru(struct rtnl_link *link, uint32_t mru)
+{
+	struct veth_info *vei = link->l_info;
+
+	IS_VETH_LINK_ASSERT(link);
+
+	vei->vei_mru = mru;
+	vei->vei_mask |= VETH_HAS_MRU;
+
+	return 0;
+}
+
+uint32_t rtnl_link_veth_get_mru(struct rtnl_link *link)
+{
+	struct veth_info *vei = link->l_info;
+
+	IS_VETH_LINK_ASSERT(link);
+
+	if (vei->vei_mask & VETH_HAS_MRU)
+		return vei->vei_mru;
+	else
+		return 0;
+}
+
+
 /** @} */
 
 static void __init veth_init(void)
-- 
2.10.1


* Re: [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
  2017-05-11 13:46 ` [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls Fredrik Markstrom
@ 2017-05-11 16:01   ` Stephen Hemminger
  2017-05-11 19:10     ` Fredrik Markström
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2017-05-11 16:01 UTC (permalink / raw)
  To: Fredrik Markstrom
  Cc: Eric Dumazet, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge

On Thu, 11 May 2017 15:46:27 +0200
Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:

> From: Fredrik Markström <fredrik.markstrom@gmail.com>
> 
> is_skb_forwardable() currently checks if the packet size is <= mtu of
> the receiving interface. This is not consistent with most of the hardware
> ethernet drivers that happily receive packets larger than the MTU.

Wrong.

Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
The actual limit is a function of the hardware. Some hardware can only limit by
power of 2; some can only limit frames larger than 1500; some have no limiting at all.
Any application should:
  * not expect packets larger than MTU to be received
  * not send packets larger than MTU
  * check actual receive size. IP protocols will do truncation of padded packets


* Re: [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
  2017-05-11 16:01   ` Stephen Hemminger
@ 2017-05-11 19:10     ` Fredrik Markström
  2017-05-11 19:44       ` Stephen Hemminger
  2017-05-12  8:05       ` [Bridge] " Teco Boot
  0 siblings, 2 replies; 10+ messages in thread
From: Fredrik Markström @ 2017-05-11 19:10 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric Dumazet, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge

On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 11 May 2017 15:46:27 +0200
> Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
>
>> From: Fredrik Markström <fredrik.markstrom@gmail.com>
>>
>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>> the receiving interface. This is not consistent with most of the hardware
>> ethernet drivers that happily receive packets larger than the MTU.
>
> Wrong.

What is "Wrong"? I was initially skeptical about implementing this patch,
since it feels odd to have different MTUs set on the two sides of a
link. After consulting some IP people and the RFCs I kind of changed
my mind and thought I'd give it a shot. In the RFCs I couldn't find
anything that defines when a received packet should or should not be
dropped.

>
> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> The actual limit is a function of the hardware. Some hardware can only limit by
> power of 2; some can only limit frames larger than 1500; some have no limiting at all.

Agreed. The purpose of these patches is to be able to configure a
veth interface to mimic these different behaviors. None of the Ethernet
interfaces I have access to drop packets due to them being larger
than the configured MTU like veth does.

Being able to mimic real Ethernet hardware is useful when
consolidating hardware using containers/namespaces.

In a reply to a comment from David Miller in my previous version of
the patch I attached the example below to demonstrate the case in
detail.

This works with all ethernet hardware setups I have access to:

---- 8< ------
# Host A eth2 and Host B eth0 are on the same network.

# On HOST A
% ip address add 1.2.3.4/24 dev eth2
% ip link set eth2 mtu 300 up

% # HOST B
% ip address add 1.2.3.5/24 dev eth0
% ip link set eth0 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
---- 8< ------


But it doesn't work with veth:

---- 8< ------
# veth0 and veth1 are a veth pair and veth1 has been moved to a separate
# network namespace.
% # NS A
% ip address add 1.2.3.4/24 dev veth0
% ip link set veth0 mtu 300 up

% # NS B
% ip address add 1.2.3.5/24 dev veth1
% ip link set veth1 mtu 1000 up
% ping -c 1 -W 1 -s 400 1.2.3.4
PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.

--- 1.2.3.4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
---- 8< ------
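
The numbers in the two transcripts above can be checked by hand. The sketch
below is userspace C written for illustration, not anything from the patch:
the enum names and the three helper functions are made up, and it assumes an
IPv4 header without options and a plain 14-byte Ethernet header. It compares
the frame length of one unfragmented echo request with the receive limit
"MTU (or MRU) + link header + one VLAN tag" that veth applies through
is_skb_forwardable().

```c
#include <stdbool.h>

enum {
	ICMP_HDR_LEN = 8,	/* ICMP echo header */
	IPV4_HDR_LEN = 20,	/* IPv4 header without options (assumption) */
	ETH_HDR_LEN  = 14,	/* Ethernet header */
	VLAN_TAG_LEN = 4,	/* allowance for one VLAN tag */
};

/* Length of the Ethernet frame carrying one unfragmented ICMP echo. */
static unsigned int echo_frame_len(unsigned int icmp_payload)
{
	return icmp_payload + ICMP_HDR_LEN + IPV4_HDR_LEN + ETH_HDR_LEN;
}

/* Largest frame the receive-side check lets through for a given limit. */
static unsigned int rx_limit(unsigned int mtu_or_mru)
{
	return mtu_or_mru + ETH_HDR_LEN + VLAN_TAG_LEN;
}

/* True if the echo would pass the receive-side length check. */
static bool echo_is_accepted(unsigned int icmp_payload,
			     unsigned int mtu_or_mru)
{
	return echo_frame_len(icmp_payload) <= rx_limit(mtu_or_mru);
}
```

For "ping -s 400" the IP packet is 428 bytes (as ping itself reports) and the
frame is 442 bytes. With veth0 at MTU 300 the limit is 318 bytes, so the
request is dropped; a receive limit of 1000 would allow 1018 bytes and let it
through.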

-- 
/Fredrik


* Re: [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
  2017-05-11 19:10     ` Fredrik Markström
@ 2017-05-11 19:44       ` Stephen Hemminger
  2017-05-12 14:35         ` Fredrik Markström
  2017-05-12  8:05       ` [Bridge] " Teco Boot
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2017-05-11 19:44 UTC (permalink / raw)
  To: Fredrik Markström
  Cc: Eric Dumazet, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge

On Thu, 11 May 2017 21:10:11 +0200
Fredrik Markström <fredrik.markstrom@gmail.com> wrote:

> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > On Thu, 11 May 2017 15:46:27 +0200
> > Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
> >  
> >> From: Fredrik Markström <fredrik.markstrom@gmail.com>
> >>
> >> is_skb_forwardable() currently checks if the packet size is <= mtu of
> >> the receiving interface. This is not consistent with most of the hardware
> >> ethernet drivers that happily receive packets larger than the MTU.
> >
> > Wrong.  
> 
> What is "Wrong"? I was initially skeptical about implementing this patch,
> since it feels odd to have different MTUs set on the two sides of a
> link. After consulting some IP people and the RFCs I kind of changed
> my mind and thought I'd give it a shot. In the RFCs I couldn't find
> anything that defines when a received packet should or should not be
> dropped.
> 
> >
> > Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
> > The actual limit is a function of the hardware. Some hardware can only limit by
> > power of 2; some can only limit frames larger than 1500; some have no limiting at all.  
> 
> Agreed. The purpose of these patches is to be able to configure a
> veth interface to mimic these different behaviors. None of the Ethernet
> interfaces I have access to drop packets due to them being larger
> than the configured MTU like veth does.
> 
> Being able to mimic real Ethernet hardware is useful when
> consolidating hardware using containers/namespaces.
> 
> In a reply to a comment from David Miller in my previous version of
> the patch I attached the example below to demonstrate the case in
> detail.
> 
> This works with all ethernet hardware setups I have access to:
> 

Why not just use an iptables rule to enforce whatever semantics you
want?


* Re: [Bridge] [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
  2017-05-11 19:10     ` Fredrik Markström
  2017-05-11 19:44       ` Stephen Hemminger
@ 2017-05-12  8:05       ` Teco Boot
  2017-05-12 12:48         ` Fredrik Markström
  1 sibling, 1 reply; 10+ messages in thread
From: Teco Boot @ 2017-05-12  8:05 UTC (permalink / raw)
  To: Fredrik Markström
  Cc: Stephen Hemminger, Eric Dumazet, Daniel Borkmann, netdev, bridge,
	linux-kernel, Alexei Starovoitov, David S. Miller

IP MTU and L2 MTU are different animals.

IMHO the IP MTU governs fragmentation at the sending end of a link. There is no need for the receiver to drop IP packets larger than its configured IP MTU. IP packets larger than the receiver's L2 MTU will be dropped below the IP layer.

For this patch: if veth has some notion of an L2 MTU (e.g. buffer size limits), there have to be checks for it. I don't see how a configurable MRU helps: more config, more mistakes. If there is no need to drop the packet, don't.

Teco


> On 11 May 2017, at 21:10, Fredrik Markström <fredrik.markstrom@gmail.com> wrote:
> 
> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
> <stephen@networkplumber.org> wrote:
>> On Thu, 11 May 2017 15:46:27 +0200
>> Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
>> 
>>> From: Fredrik Markström <fredrik.markstrom@gmail.com>
>>> 
>>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>>> the receiving interface. This is not consistent with most of the hardware
>>> ethernet drivers that happily receive packets larger than the MTU.
>> 
>> Wrong.
> 
> What is "Wrong"? I was initially skeptical about implementing this
> patch, since it feels odd to have different MTUs set on the two sides
> of a link. After consulting some IP people and the RFCs I kind of
> changed my mind and thought I'd give it a shot. In the RFCs I couldn't
> find anything that defines when a received packet should or should not
> be dropped.
> 
>> 
>> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
>> The actual limit is a function of the hardware. Some hardware can only limit by
>> power of 2; some can only limit frames larger than 1500; some have no limiting at all.
> 
> Agreed. The purpose of these patches is to be able to configure a
> veth interface to mimic these different behaviors. None of the Ethernet
> interfaces I have access to drop packets for being larger than the
> configured MTU, the way veth does.
> 
> Being able to mimic real Ethernet hardware is useful when
> consolidating hardware using containers/namespaces.
> 
> In a reply to a comment from David Miller in my previous version of
> the patch I attached the example below to demonstrate the case in
> detail.
> 
> This works with all ethernet hardware setups I have access to:
> 
> ---- 8< ------
> # Host A eth2 and Host B eth0 are on the same network.
> 
> # On HOST A
> % ip address add 1.2.3.4/24 dev eth2
> % ip link set eth2 mtu 300 up
> 
> % # HOST B
> % ip address add 1.2.3.5/24 dev eth0
> % ip link set eth0 mtu 1000 up
> % ping -c 1 -W 1 -s 400 1.2.3.4
> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
> 408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms
> 
> --- 1.2.3.4 ping statistics ---
> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
> ---- 8< ------
> 
> 
> But it doesn't work with veth:
> 
> ---- 8< ------
> # veth0 and veth1 are a veth pair, and veth1 has been moved to a separate
> # network namespace.
> % # NS A
> % ip address add 1.2.3.4/24 dev veth0
> % ip link set veth0 mtu 300 up
> 
> % # NS B
> % ip address add 1.2.3.5/24 dev veth1
> % ip link set veth1 mtu 1000 up
> % ping -c 1 -W 1 -s 400 1.2.3.4
> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
> 
> --- 1.2.3.4 ping statistics ---
> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
> ---- 8< ------
> 
> -- 
> /Fredrik


* Re: [Bridge] [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
  2017-05-12  8:05       ` [Bridge] " Teco Boot
@ 2017-05-12 12:48         ` Fredrik Markström
  0 siblings, 0 replies; 10+ messages in thread
From: Fredrik Markström @ 2017-05-12 12:48 UTC (permalink / raw)
  To: Teco Boot
  Cc: Stephen Hemminger, Eric Dumazet, Daniel Borkmann, netdev, bridge,
	linux-kernel, Alexei Starovoitov, David S. Miller

On Fri, May 12, 2017 at 10:05 AM, Teco Boot <teco@inf-net.nl> wrote:
> IP MTU and L2 MTU are different animals.
>
> IMHO IP MTU is for fragmentation at sender of a link. There is no need dropping IP packets at receiver with size > configured IP MTU. IP packets with size > receiver L2 MTU will be dropped at sub-IP layer.
>
First, thanks for putting names to the different MTUs (L2 vs. IP MTU).

I agree. I don't understand why we drop packets based on the
receiver's IP MTU at all, and I would not mind removing that test
altogether, at least for veth.

/Fredrik


> For this patch: if veth has some notion on L2 MTU (e.g. buffer size limits), there has to be checks for it. I don't know why configuring MRU helps, more config, more mistakes. If there is no need for dropping the packet: don't.
>
> Teco
>
>
>> On 11 May 2017, at 21:10, Fredrik Markström <fredrik.markstrom@gmail.com> wrote:
>>
>> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>>> On Thu, 11 May 2017 15:46:27 +0200
>>> Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
>>>
>>>> From: Fredrik Markström <fredrik.markstrom@gmail.com>
>>>>
>>>> is_skb_forwardable() currently checks if the packet size is <= mtu of
>>>> the receiving interface. This is not consistent with most of the hardware
>>>> ethernet drivers that happily receive packets larger than the MTU.
>>>
>>> Wrong.
>>
>> What is "Wrong"? I was initially skeptical about implementing this
>> patch, since it feels odd to have different MTUs set on the two sides
>> of a link. After consulting some IP people and the RFCs I kind of
>> changed my mind and thought I'd give it a shot. In the RFCs I couldn't
>> find anything that defines when a received packet should or should not
>> be dropped.
>>
>>>
>>> Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
>>> The actual limit is a function of the hardware. Some hardware can only limit by
>>> power of 2; some can only limit frames larger than 1500; some have no limiting at all.
>>
>> Agreed. The purpose of these patches is to be able to configure a
>> veth interface to mimic these different behaviors. None of the Ethernet
>> interfaces I have access to drop packets for being larger than the
>> configured MTU, the way veth does.
>>
>> Being able to mimic real Ethernet hardware is useful when
>> consolidating hardware using containers/namespaces.
>>
>> In a reply to a comment from David Miller in my previous version of
>> the patch I attached the example below to demonstrate the case in
>> detail.
>>
>> This works with all ethernet hardware setups I have access to:
>>
>> ---- 8< ------
>> # Host A eth2 and Host B eth0 are on the same network.
>>
>> # On HOST A
>> % ip address add 1.2.3.4/24 dev eth2
>> % ip link set eth2 mtu 300 up
>>
>> % # HOST B
>> % ip address add 1.2.3.5/24 dev eth0
>> % ip link set eth0 mtu 1000 up
>> % ping -c 1 -W 1 -s 400 1.2.3.4
>> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
>> 408 bytes from 1.2.3.4: icmp_seq=1 ttl=64 time=1.57 ms
>>
>> --- 1.2.3.4 ping statistics ---
>> 1 packets transmitted, 1 received, 0% packet loss, time 0ms
>> rtt min/avg/max/mdev = 1.573/1.573/1.573/0.000 ms
>> ---- 8< ------
>>
>>
>> But it doesn't work with veth:
>>
>> ---- 8< ------
>> # veth0 and veth1 are a veth pair, and veth1 has been moved to a separate
>> # network namespace.
>> % # NS A
>> % ip address add 1.2.3.4/24 dev veth0
>> % ip link set veth0 mtu 300 up
>>
>> % # NS B
>> % ip address add 1.2.3.5/24 dev veth1
>> % ip link set veth1 mtu 1000 up
>> % ping -c 1 -W 1 -s 400 1.2.3.4
>> PING 1.2.3.4 (1.2.3.4) 400(428) bytes of data.
>>
>> --- 1.2.3.4 ping statistics ---
>> 1 packets transmitted, 0 received, 100% packet loss, time 0ms
>> ---- 8< ------
>>
>> --
>> /Fredrik
>



-- 
/Fredrik


* Re: [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls
  2017-05-11 19:44       ` Stephen Hemminger
@ 2017-05-12 14:35         ` Fredrik Markström
  0 siblings, 0 replies; 10+ messages in thread
From: Fredrik Markström @ 2017-05-12 14:35 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Eric Dumazet, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann, netdev, linux-kernel, bridge

On Thu, May 11, 2017 at 9:44 PM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 11 May 2017 21:10:11 +0200
> Fredrik Markström <fredrik.markstrom@gmail.com> wrote:
>
>> On Thu, May 11, 2017 at 6:01 PM, Stephen Hemminger
>> <stephen@networkplumber.org> wrote:
>> > On Thu, 11 May 2017 15:46:27 +0200
>> > Fredrik Markstrom <fredrik.markstrom@gmail.com> wrote:
>> >
>> >> From: Fredrik Markström <fredrik.markstrom@gmail.com>
>> >>
>> >> is_skb_forwardable() currently checks if the packet size is <= mtu of
>> >> the receiving interface. This is not consistent with most of the hardware
>> >> ethernet drivers that happily receive packets larger than the MTU.
>> >
>> > Wrong.
>>
>> What is "Wrong"? I was initially skeptical about implementing this
>> patch, since it feels odd to have different MTUs set on the two sides
>> of a link. After consulting some IP people and the RFCs I kind of
>> changed my mind and thought I'd give it a shot. In the RFCs I couldn't
>> find anything that defines when a received packet should or should not
>> be dropped.
>>
>> >
>> > Hardware interfaces are free to drop any packet greater than MTU (actually MTU + VLAN).
>> > The actual limit is a function of the hardware. Some hardware can only limit by
>> > power of 2; some can only limit frames larger than 1500; some have no limiting at all.
>>
>> Agreed. The purpose of these patches is to be able to configure a
>> veth interface to mimic these different behaviors. None of the Ethernet
>> interfaces I have access to drop packets for being larger than the
>> configured MTU, the way veth does.
>>
>> Being able to mimic real Ethernet hardware is useful when
>> consolidating hardware using containers/namespaces.
>>
>> In a reply to a comment from David Miller in my previous version of
>> the patch I attached the example below to demonstrate the case in
>> detail.
>>
>> This works with all ethernet hardware setups I have access to:
>>
>
> Why not just use an iptables rule to enforce whatever semantics you
> want?
>

I think that would be OK, but the only thing I can find is TCPMSS, and
that only applies to TCP.

/Fredrik


end of thread, other threads:[~2017-05-12 14:36 UTC | newest]

Thread overview: 10+ messages
2017-05-11 13:46 [PATCH v2 0/2] net: Set maximum receive packet size on veth interfaces Fredrik Markstrom
2017-05-11 13:46 ` [PATCH v2 1/2] net: Added mtu parameter to dev_forward_skb calls Fredrik Markstrom
2017-05-11 16:01   ` Stephen Hemminger
2017-05-11 19:10     ` Fredrik Markström
2017-05-11 19:44       ` Stephen Hemminger
2017-05-12 14:35         ` Fredrik Markström
2017-05-12  8:05       ` [Bridge] " Teco Boot
2017-05-12 12:48         ` Fredrik Markström
2017-05-11 13:46 ` [PATCH v2 2/2] veth: Added attribute to set maximum receive size on veth interfaces Fredrik Markstrom
2017-05-11 13:46 ` Support for VETH_MRU in libnl Fredrik Markstrom
