All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso'
@ 2016-07-18 11:49 Shmulik Ladkani
  2016-07-18 11:49 ` [PATCH 1/2] net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags Shmulik Ladkani
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Shmulik Ladkani @ 2016-07-18 11:49 UTC (permalink / raw)
  To: David S . Miller, netdev
  Cc: shmulik.ladkani, Eric Dumazet, shmulik.ladkani,
	Hannes Frederic Sowa, Florian Westphal

Currently IP fragmentation of GSO segments that exceed dst mtu is
considered only in the ipv4 forwarding case.

There are cases where GSO skbs that are bridged and then udp-tunneled
may have gso_size exceeding the egress device mtu.
It makes sense to fragment them, as in the non GSOed code path.

The exact cases where this behavior is needed is described and addressed
in the 2nd patch.

Shmulik Ladkani (2):
  net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags
  net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU,
    allow segmentation for local udp tunneled skbs

 include/net/ip.h          | 1 +
 net/ipv4/ip_forward.c     | 2 +-
 net/ipv4/ip_output.c      | 6 ++++--
 net/ipv4/ip_tunnel_core.c | 9 +++++++++
 net/ipv4/ipmr.c           | 2 +-
 5 files changed, 16 insertions(+), 4 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags
  2016-07-18 11:49 [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
@ 2016-07-18 11:49 ` Shmulik Ladkani
  2016-07-19  7:42   ` Hannes Frederic Sowa
  2016-07-18 11:49 ` [PATCH 2/2] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs Shmulik Ladkani
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Shmulik Ladkani @ 2016-07-18 11:49 UTC (permalink / raw)
  To: David S . Miller, netdev
  Cc: shmulik.ladkani, Eric Dumazet, shmulik.ladkani,
	Hannes Frederic Sowa, Florian Westphal

This flag indicates whether fragmentation of segments is allowed.

Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by
either ip_forward or ipmr_forward).

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 include/net/ip.h      | 1 +
 net/ipv4/ip_forward.c | 2 +-
 net/ipv4/ip_output.c  | 6 ++++--
 net/ipv4/ipmr.c       | 2 +-
 4 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 08f36cd..9742b92 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -47,6 +47,7 @@ struct inet_skb_parm {
 #define IPSKB_REROUTED		BIT(4)
 #define IPSKB_DOREDIRECT	BIT(5)
 #define IPSKB_FRAG_PMTU		BIT(6)
+#define IPSKB_FRAG_SEGS		BIT(7)
 
 	u16			frag_max_size;
 };
diff --git a/net/ipv4/ip_forward.c b/net/ipv4/ip_forward.c
index 9f0a7b9..8b4ffd2 100644
--- a/net/ipv4/ip_forward.c
+++ b/net/ipv4/ip_forward.c
@@ -117,7 +117,7 @@ int ip_forward(struct sk_buff *skb)
 	if (opt->is_strictroute && rt->rt_uses_gateway)
 		goto sr_failed;
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED;
+	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
 	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
 	if (ip_exceeds_mtu(skb, mtu)) {
 		IP_INC_STATS(net, IPSTATS_MIB_FRAGFAILS);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e23f141..dde37fb 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -223,8 +223,10 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
 	struct sk_buff *segs;
 	int ret = 0;
 
-	/* common case: locally created skb or seglen is <= mtu */
-	if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
+	/* common case: fragmentation of segments is not allowed,
+	 * or seglen is <= mtu
+	 */
+	if (((IPCB(skb)->flags & IPSKB_FRAG_SEGS) == 0) ||
 	      skb_gso_validate_mtu(skb, mtu))
 		return ip_finish_output2(net, sk, skb);
 
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index e0d76f5..eec2341 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1749,7 +1749,7 @@ static void ipmr_queue_xmit(struct net *net, struct mr_table *mrt,
 		vif->dev->stats.tx_bytes += skb->len;
 	}
 
-	IPCB(skb)->flags |= IPSKB_FORWARDED;
+	IPCB(skb)->flags |= IPSKB_FORWARDED | IPSKB_FRAG_SEGS;
 
 	/* RFC1584 teaches, that DVMRP/PIM router must deliver packets locally
 	 * not only before forwarding, but after forwarding on all output
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs
  2016-07-18 11:49 [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
  2016-07-18 11:49 ` [PATCH 1/2] net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags Shmulik Ladkani
@ 2016-07-18 11:49 ` Shmulik Ladkani
  2016-07-19  7:48   ` Hannes Frederic Sowa
  2016-07-18 12:08 ` [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
  2016-07-19 23:40 ` David Miller
  3 siblings, 1 reply; 7+ messages in thread
From: Shmulik Ladkani @ 2016-07-18 11:49 UTC (permalink / raw)
  To: David S . Miller, netdev
  Cc: shmulik.ladkani, Eric Dumazet, shmulik.ladkani,
	Hannes Frederic Sowa, Florian Westphal

Given:
 - tap0 and vxlan0 are bridged
 - vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400)

Assume GSO skbs arriving from tap0 having a gso_size as determined by
user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).

After encapsulation these skbs have skb_gso_network_seglen that exceed
eth0's ip_skb_dst_mtu.

These skbs are accidentally passed to ip_finish_output2 AS IS.
Alas, each final segment (segmented either by validate_xmit_skb or by
hardware UFO) would be larger than eth0 mtu.
As a result, those above-mtu segments get dropped on certain networks.

This behavior is not aligned with the NON-GSO case:
Assume a non-gso 1500-sized IP packet arrives from tap0. After
encapsulation, the vxlan datagram is fragmented normally at the
ip_finish_output-->ip_fragment code path.

The expected behavior for the GSO case would be segmenting the
"gso-oversized" skb first, then fragmenting each segment according to
dst mtu, and finally passing the resulting fragments to ip_finish_output2.

'ip_finish_output_gso' already supports this "Slowpath" behavior,
according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4
forwarding (not set in the bridged case).

In order to support the bridged case, we'll mark skbs arriving from an
ingress interface that get udp-encaspulated as "allowed to be fragmented",
causing their network_seglen to be validated by 'ip_finish_output_gso'
(and fragment if needed).

Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the
gso and non-gso cases), which serves users wishing to forbid
fragmentation at the udp tunnel endpoint.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 net/ipv4/ip_tunnel_core.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index afd6b59..9d847c3 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -63,6 +63,7 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	int pkt_len = skb->len - skb_inner_network_offset(skb);
 	struct net *net = dev_net(rt->dst.dev);
 	struct net_device *dev = skb->dev;
+	int skb_iif = skb->skb_iif;
 	struct iphdr *iph;
 	int err;
 
@@ -72,6 +73,14 @@ void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
 	skb_dst_set(skb, &rt->dst);
 	memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
 
+	if (skb_iif && proto == IPPROTO_UDP) {
+		/* Arrived from an ingress interface and got udp encapuslated.
+		 * The encapsulated network segment length may exceed dst mtu.
+		 * Allow IP Fragmentation of segments.
+		 */
+		IPCB(skb)->flags |= IPSKB_FRAG_SEGS;
+	}
+
 	/* Push down and install the IP header. */
 	skb_push(skb, sizeof(struct iphdr));
 	skb_reset_network_header(skb);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso'
  2016-07-18 11:49 [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
  2016-07-18 11:49 ` [PATCH 1/2] net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags Shmulik Ladkani
  2016-07-18 11:49 ` [PATCH 2/2] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs Shmulik Ladkani
@ 2016-07-18 12:08 ` Shmulik Ladkani
  2016-07-19 23:40 ` David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: Shmulik Ladkani @ 2016-07-18 12:08 UTC (permalink / raw)
  To: David S . Miller, netdev
  Cc: Eric Dumazet, shmulik.ladkani, Hannes Frederic Sowa, Florian Westphal

This is the v3 of https://patchwork.ozlabs.org/patch/648790/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags
  2016-07-18 11:49 ` [PATCH 1/2] net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags Shmulik Ladkani
@ 2016-07-19  7:42   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Frederic Sowa @ 2016-07-19  7:42 UTC (permalink / raw)
  To: Shmulik Ladkani, David S . Miller, netdev
  Cc: shmulik.ladkani, Eric Dumazet, Florian Westphal

On 18.07.2016 13:49, Shmulik Ladkani wrote:
> This flag indicates whether fragmentation of segments is allowed.
> 
> Formerly this policy was hardcoded according to IPSKB_FORWARDED (set by
> either ip_forward or ipmr_forward).
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Florian Westphal <fw@strlen.de>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs
  2016-07-18 11:49 ` [PATCH 2/2] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs Shmulik Ladkani
@ 2016-07-19  7:48   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 7+ messages in thread
From: Hannes Frederic Sowa @ 2016-07-19  7:48 UTC (permalink / raw)
  To: Shmulik Ladkani, David S . Miller, netdev
  Cc: shmulik.ladkani, Eric Dumazet, Florian Westphal

On 18.07.2016 13:49, Shmulik Ladkani wrote:
> Given:
>  - tap0 and vxlan0 are bridged
>  - vxlan0 stacked on eth0, eth0 having small mtu (e.g. 1400)
> 
> Assume GSO skbs arriving from tap0 having a gso_size as determined by
> user-provided virtio_net_hdr (e.g. 1460 corresponding to VM mtu of 1500).
> 
> After encapsulation these skbs have skb_gso_network_seglen that exceed
> eth0's ip_skb_dst_mtu.
> 
> These skbs are accidentally passed to ip_finish_output2 AS IS.
> Alas, each final segment (segmented either by validate_xmit_skb or by
> hardware UFO) would be larger than eth0 mtu.
> As a result, those above-mtu segments get dropped on certain networks.
> 
> This behavior is not aligned with the NON-GSO case:
> Assume a non-gso 1500-sized IP packet arrives from tap0. After
> encapsulation, the vxlan datagram is fragmented normally at the
> ip_finish_output-->ip_fragment code path.
> 
> The expected behavior for the GSO case would be segmenting the
> "gso-oversized" skb first, then fragmenting each segment according to
> dst mtu, and finally passing the resulting fragments to ip_finish_output2.
> 
> 'ip_finish_output_gso' already supports this "Slowpath" behavior,
> according to the IPSKB_FRAG_SEGS flag, which is only set during ipv4
> forwarding (not set in the bridged case).
> 
> In order to support the bridged case, we'll mark skbs arriving from an
> ingress interface that get udp-encaspulated as "allowed to be fragmented",
> causing their network_seglen to be validated by 'ip_finish_output_gso'
> (and fragment if needed).
> 
> Note the TUNNEL_DONT_FRAGMENT tun_flag is still honoured (both in the
> gso and non-gso cases), which serves users wishing to forbid
> fragmentation at the udp tunnel endpoint.
> 
> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Cc: Florian Westphal <fw@strlen.de>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Given current vxlan behavior regarding fragmentation from non-gso frames
(which we probably can never change), this is in my opinion the best way
forward to make it symmetrical with non-gso. As don't fragment is still
honored after all I don't see any problems with this approach:

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso'
  2016-07-18 11:49 [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
                   ` (2 preceding siblings ...)
  2016-07-18 12:08 ` [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
@ 2016-07-19 23:40 ` David Miller
  3 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2016-07-19 23:40 UTC (permalink / raw)
  To: shmulik.ladkani; +Cc: netdev, shmulik.ladkani, edumazet, hannes, fw

From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Date: Mon, 18 Jul 2016 14:49:32 +0300

> Currently IP fragmentation of GSO segments that exceed dst mtu is
> considered only in the ipv4 forwarding case.
> 
> There are cases where GSO skbs that are bridged and then udp-tunneled
> may have gso_size exceeding the egress device mtu.
> It makes sense to fragment them, as in the non GSOed code path.
> 
> The exact cases where this behavior is needed is described and addressed
> in the 2nd patch.

Series applied, thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-07-19 23:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-18 11:49 [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
2016-07-18 11:49 ` [PATCH 1/2] net/ipv4: Introduce IPSKB_FRAG_SEGS bit to inet_skb_parm.flags Shmulik Ladkani
2016-07-19  7:42   ` Hannes Frederic Sowa
2016-07-18 11:49 ` [PATCH 2/2] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs Shmulik Ladkani
2016-07-19  7:48   ` Hannes Frederic Sowa
2016-07-18 12:08 ` [PATCH 0/2] net: Consider fragmentation of udp tunneled skbs in 'ip_finish_output_gso' Shmulik Ladkani
2016-07-19 23:40 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.