All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] ipvs: properly declare tunnel encapsulation
@ 2014-08-01  7:36 Julian Anastasov
  2014-08-25  1:18 ` Simon Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2014-08-01  7:36 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, Alex Gartrell, kernel-team

The tunneling method should properly use tunnel encapsulation.
Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
offload is supported.

Thanks to Alex Gartrell for reporting the problem, providing
solution and for all suggestions.

Reported-by: Alex Gartrell <agartrell@fb.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Alex Gartrell <agartrell@fb.com>
---
 net/netfilter/ipvs/ip_vs_xmit.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

I'm not sure if TUN mode worked with HW csum enabled, one with
such hardware can check if the breakage happens after some kernel
version.

Here is what I found for skb->encapsulation and support in drivers

- GRO started to use CHECKSUM_PARTIAL for TCP long time ago

- the skb->encapsulation support is added in 3.8

- BNX2 started to use inner header depending on skb->encapsulation
in 3.10

- i40e appears in 3.12 and started to use inner header depending on
skb->encapsulation

- iptunnel_handle_offloads() is added in 3.13. This patch
uses this function.

- mlx4 started to use inner header depending on skb->encapsulation
in 3.14

- benet started to use inner header depending on skb->encapsulation
in 3.14

As result, I'm not sure that all devices support tunneled TCP/UDP,
I see some drivers supported csum offload (CHECKSUM_PARTIAL) only
if not tunneled. In the future if problem happens with csum
offload we should check if the driver has support for tunneled
TCP/UDP. Otherwise, user can disable the csum offload for device
or as alternative we can add sysctl var in IPVS to call
iptunnel_handle_offloads with csum_help = true.

For now I don't know which stable kernels wihout
iptunnel_handle_offloads() function may need some alternative fix.

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 73ba1cc..5371654 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -38,6 +38,7 @@
 #include <net/route.h>                  /* for ip_route_output */
 #include <net/ipv6.h>
 #include <net/ip6_route.h>
+#include <net/ip_tunnels.h>
 #include <net/addrconf.h>
 #include <linux/icmpv6.h>
 #include <linux/netfilter.h>
@@ -862,11 +863,15 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 		old_iph = ip_hdr(skb);
 	}
 
-	skb->transport_header = skb->network_header;
-
 	/* fix old IP header checksum */
 	ip_send_check(old_iph);
 
+	skb = iptunnel_handle_offloads(skb, false, SKB_GSO_IPIP);
+	if (IS_ERR(skb))
+		goto tx_error;
+
+	skb->transport_header = skb->network_header;
+
 	skb_push(skb, sizeof(struct iphdr));
 	skb_reset_network_header(skb);
 	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
@@ -900,7 +905,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 	return NF_STOLEN;
 
   tx_error:
-	kfree_skb(skb);
+	if (!IS_ERR(skb))
+		kfree_skb(skb);
 	rcu_read_unlock();
 	LeaveFunction(10);
 	return NF_STOLEN;
@@ -953,6 +959,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 		old_iph = ipv6_hdr(skb);
 	}
 
+	/* GSO: we need to provide proper SKB_GSO_ value for IPv6 */
+	skb = iptunnel_handle_offloads(skb, false, 0); /* SKB_GSO_SIT/IPV6 */
+	if (IS_ERR(skb))
+		goto tx_error;
+
 	skb->transport_header = skb->network_header;
 
 	skb_push(skb, sizeof(struct ipv6hdr));
@@ -988,7 +999,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 	return NF_STOLEN;
 
 tx_error:
-	kfree_skb(skb);
+	if (!IS_ERR(skb))
+		kfree_skb(skb);
 	rcu_read_unlock();
 	LeaveFunction(10);
 	return NF_STOLEN;
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ipvs: properly declare tunnel encapsulation
  2014-08-01  7:36 [PATCH net] ipvs: properly declare tunnel encapsulation Julian Anastasov
@ 2014-08-25  1:18 ` Simon Horman
  2014-08-25  5:13   ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Horman @ 2014-08-25  1:18 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, Alex Gartrell, kernel-team

On Fri, Aug 01, 2014 at 10:36:17AM +0300, Julian Anastasov wrote:
> The tunneling method should properly use tunnel encapsulation.
> Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
> offload is supported.
> 
> Thanks to Alex Gartrell for reporting the problem, providing
> solution and for all suggestions.

With this patch is Alex's patch "[PATCH ipvs] ipvs: invoke
skb_checksum_help prior to encapsulation in tunnel xmit " also needed?

> 
> Reported-by: Alex Gartrell <agartrell@fb.com>
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> Signed-off-by: Alex Gartrell <agartrell@fb.com>
> ---
>  net/netfilter/ipvs/ip_vs_xmit.c | 20 ++++++++++++++++----
>  1 file changed, 16 insertions(+), 4 deletions(-)
> 
> I'm not sure if TUN mode worked with HW csum enabled, one with
> such hardware can check if the breakage happens after some kernel
> version.
> 
> Here is what I found for skb->encapsulation and support in drivers
> 
> - GRO started to use CHECKSUM_PARTIAL for TCP long time ago
> 
> - the skb->encapsulation support is added in 3.8
> 
> - BNX2 started to use inner header depending on skb->encapsulation
> in 3.10
> 
> - i40e appears in 3.12 and started to use inner header depending on
> skb->encapsulation
> 
> - iptunnel_handle_offloads() is added in 3.13. This patch
> uses this function.
> 
> - mlx4 started to use inner header depending on skb->encapsulation
> in 3.14
> 
> - benet started to use inner header depending on skb->encapsulation
> in 3.14
> 
> As result, I'm not sure that all devices support tunneled TCP/UDP,
> I see some drivers supported csum offload (CHECKSUM_PARTIAL) only
> if not tunneled. In the future if problem happens with csum
> offload we should check if the driver has support for tunneled
> TCP/UDP. Otherwise, user can disable the csum offload for device
> or as alternative we can add sysctl var in IPVS to call
> iptunnel_handle_offloads with csum_help = true.
> 
> For now I don't know which stable kernels wihout
> iptunnel_handle_offloads() function may need some alternative fix.
> 
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index 73ba1cc..5371654 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -38,6 +38,7 @@
>  #include <net/route.h>                  /* for ip_route_output */
>  #include <net/ipv6.h>
>  #include <net/ip6_route.h>
> +#include <net/ip_tunnels.h>
>  #include <net/addrconf.h>
>  #include <linux/icmpv6.h>
>  #include <linux/netfilter.h>
> @@ -862,11 +863,15 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
>  		old_iph = ip_hdr(skb);
>  	}
>  
> -	skb->transport_header = skb->network_header;
> -
>  	/* fix old IP header checksum */
>  	ip_send_check(old_iph);
>  
> +	skb = iptunnel_handle_offloads(skb, false, SKB_GSO_IPIP);
> +	if (IS_ERR(skb))
> +		goto tx_error;
> +
> +	skb->transport_header = skb->network_header;
> +
>  	skb_push(skb, sizeof(struct iphdr));
>  	skb_reset_network_header(skb);
>  	memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt));
> @@ -900,7 +905,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
>  	return NF_STOLEN;
>  
>    tx_error:
> -	kfree_skb(skb);
> +	if (!IS_ERR(skb))
> +		kfree_skb(skb);
>  	rcu_read_unlock();
>  	LeaveFunction(10);
>  	return NF_STOLEN;
> @@ -953,6 +959,11 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
>  		old_iph = ipv6_hdr(skb);
>  	}
>  
> +	/* GSO: we need to provide proper SKB_GSO_ value for IPv6 */
> +	skb = iptunnel_handle_offloads(skb, false, 0); /* SKB_GSO_SIT/IPV6 */
> +	if (IS_ERR(skb))
> +		goto tx_error;
> +
>  	skb->transport_header = skb->network_header;
>  
>  	skb_push(skb, sizeof(struct ipv6hdr));
> @@ -988,7 +999,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
>  	return NF_STOLEN;
>  
>  tx_error:
> -	kfree_skb(skb);
> +	if (!IS_ERR(skb))
> +		kfree_skb(skb);
>  	rcu_read_unlock();
>  	LeaveFunction(10);
>  	return NF_STOLEN;
> -- 
> 1.9.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ipvs: properly declare tunnel encapsulation
  2014-08-25  1:18 ` Simon Horman
@ 2014-08-25  5:13   ` Julian Anastasov
  2014-08-25  6:49     ` Simon Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2014-08-25  5:13 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, Alex Gartrell, kernel-team


	Hello,

On Mon, 25 Aug 2014, Simon Horman wrote:

> On Fri, Aug 01, 2014 at 10:36:17AM +0300, Julian Anastasov wrote:
> > The tunneling method should properly use tunnel encapsulation.
> > Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
> > offload is supported.
> > 
> > Thanks to Alex Gartrell for reporting the problem, providing
> > solution and for all suggestions.
> 
> With this patch is Alex's patch "[PATCH ipvs] ipvs: invoke
> skb_checksum_help prior to encapsulation in tunnel xmit " also needed?

	Not needed, we prefer to avoid csum calculation by CPU.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ipvs: properly declare tunnel encapsulation
  2014-08-25  5:13   ` Julian Anastasov
@ 2014-08-25  6:49     ` Simon Horman
  2014-08-25  8:11       ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Horman @ 2014-08-25  6:49 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, Alex Gartrell, kernel-team

On Mon, Aug 25, 2014 at 08:13:14AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Mon, 25 Aug 2014, Simon Horman wrote:
> 
> > On Fri, Aug 01, 2014 at 10:36:17AM +0300, Julian Anastasov wrote:
> > > The tunneling method should properly use tunnel encapsulation.
> > > Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
> > > offload is supported.
> > > 
> > > Thanks to Alex Gartrell for reporting the problem, providing
> > > solution and for all suggestions.
> > 
> > With this patch is Alex's patch "[PATCH ipvs] ipvs: invoke
> > skb_checksum_help prior to encapsulation in tunnel xmit " also needed?
> 
> 	Not needed, we prefer to avoid csum calculation by CPU.

Thanks, understood.

I have applied (only) this patch to ipvs-next.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ipvs: properly declare tunnel encapsulation
  2014-08-25  6:49     ` Simon Horman
@ 2014-08-25  8:11       ` Julian Anastasov
  2014-08-27  5:33         ` Simon Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2014-08-25  8:11 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, Alex Gartrell, kernel-team


	Hello,

On Mon, 25 Aug 2014, Simon Horman wrote:

> On Mon, Aug 25, 2014 at 08:13:14AM +0300, Julian Anastasov wrote:
> > 
> > 	Hello,
> > 
> > On Mon, 25 Aug 2014, Simon Horman wrote:
> > 
> > > On Fri, Aug 01, 2014 at 10:36:17AM +0300, Julian Anastasov wrote:
> > > > The tunneling method should properly use tunnel encapsulation.
> > > > Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
> > > > offload is supported.
> > > > 
> > > > Thanks to Alex Gartrell for reporting the problem, providing
> > > > solution and for all suggestions.
> > > 
> > > With this patch is Alex's patch "[PATCH ipvs] ipvs: invoke
> > > skb_checksum_help prior to encapsulation in tunnel xmit " also needed?
> > 
> > 	Not needed, we prefer to avoid csum calculation by CPU.
> 
> Thanks, understood.
> 
> I have applied (only) this patch to ipvs-next.

	May be we should apply it as bugfix to ipvs tree.
Later stable kernels may need it. Also, new changes from
Alex Gartrell about tunneling depend on it, they are
net-next material, still in development, so I guess this
patch will appear soon on net-next via next -rc if
applied now as bugfix.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ipvs: properly declare tunnel encapsulation
  2014-08-25  8:11       ` Julian Anastasov
@ 2014-08-27  5:33         ` Simon Horman
  2014-08-27  7:04           ` Julian Anastasov
  0 siblings, 1 reply; 8+ messages in thread
From: Simon Horman @ 2014-08-27  5:33 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, Alex Gartrell, kernel-team

On Mon, Aug 25, 2014 at 11:11:38AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Mon, 25 Aug 2014, Simon Horman wrote:
> 
> > On Mon, Aug 25, 2014 at 08:13:14AM +0300, Julian Anastasov wrote:
> > > 
> > > 	Hello,
> > > 
> > > On Mon, 25 Aug 2014, Simon Horman wrote:
> > > 
> > > > On Fri, Aug 01, 2014 at 10:36:17AM +0300, Julian Anastasov wrote:
> > > > > The tunneling method should properly use tunnel encapsulation.
> > > > > Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
> > > > > offload is supported.
> > > > > 
> > > > > Thanks to Alex Gartrell for reporting the problem, providing
> > > > > solution and for all suggestions.
> > > > 
> > > > With this patch is Alex's patch "[PATCH ipvs] ipvs: invoke
> > > > skb_checksum_help prior to encapsulation in tunnel xmit " also needed?
> > > 
> > > 	Not needed, we prefer to avoid csum calculation by CPU.
> > 
> > Thanks, understood.
> > 
> > I have applied (only) this patch to ipvs-next.
> 
> 	May be we should apply it as bugfix to ipvs tree.
> Later stable kernels may need it. Also, new changes from
> Alex Gartrell about tunneling depend on it, they are
> net-next material, still in development, so I guess this
> patch will appear soon on net-next via next -rc if
> applied now as bugfix.

Sure, I'll do as you suggest.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ipvs: properly declare tunnel encapsulation
  2014-08-27  5:33         ` Simon Horman
@ 2014-08-27  7:04           ` Julian Anastasov
  2014-08-28  1:52             ` Simon Horman
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Anastasov @ 2014-08-27  7:04 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, Alex Gartrell, kernel-team


	Hello,

On Wed, 27 Aug 2014, Simon Horman wrote:

> On Mon, Aug 25, 2014 at 11:11:38AM +0300, Julian Anastasov wrote:
> > 
> > 	Hello,
> > 
> > On Mon, 25 Aug 2014, Simon Horman wrote:
> > 
> > > On Mon, Aug 25, 2014 at 08:13:14AM +0300, Julian Anastasov wrote:
> > > > 
> > > > 	Hello,
> > > > 
> > > > On Mon, 25 Aug 2014, Simon Horman wrote:
> > > > 
> > > > > On Fri, Aug 01, 2014 at 10:36:17AM +0300, Julian Anastasov wrote:
> > > > > > The tunneling method should properly use tunnel encapsulation.
> > > > > > Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
> > > > > > offload is supported.
> > > > > > 
> > > > > > Thanks to Alex Gartrell for reporting the problem, providing
> > > > > > solution and for all suggestions.
> > > > > 
> > > > > With this patch is Alex's patch "[PATCH ipvs] ipvs: invoke
> > > > > skb_checksum_help prior to encapsulation in tunnel xmit " also needed?
> > > > 
> > > > 	Not needed, we prefer to avoid csum calculation by CPU.
> > > 
> > > Thanks, understood.
> > > 
> > > I have applied (only) this patch to ipvs-next.
> > 
> > 	May be we should apply it as bugfix to ipvs tree.
> > Later stable kernels may need it. Also, new changes from
> > Alex Gartrell about tunneling depend on it, they are
> > net-next material, still in development, so I guess this
> > patch will appear soon on net-next via next -rc if
> > applied now as bugfix.
> 
> Sure, I'll do as you suggest.

	Don't forget "ipvs: fix ipv6 hook registration for local replies"
from August 22. It is a bugfix too.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net] ipvs: properly declare tunnel encapsulation
  2014-08-27  7:04           ` Julian Anastasov
@ 2014-08-28  1:52             ` Simon Horman
  0 siblings, 0 replies; 8+ messages in thread
From: Simon Horman @ 2014-08-28  1:52 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: lvs-devel, Alex Gartrell, kernel-team

On Wed, Aug 27, 2014 at 10:04:21AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Wed, 27 Aug 2014, Simon Horman wrote:
> 
> > On Mon, Aug 25, 2014 at 11:11:38AM +0300, Julian Anastasov wrote:
> > > 
> > > 	Hello,
> > > 
> > > On Mon, 25 Aug 2014, Simon Horman wrote:
> > > 
> > > > On Mon, Aug 25, 2014 at 08:13:14AM +0300, Julian Anastasov wrote:
> > > > > 
> > > > > 	Hello,
> > > > > 
> > > > > On Mon, 25 Aug 2014, Simon Horman wrote:
> > > > > 
> > > > > > On Fri, Aug 01, 2014 at 10:36:17AM +0300, Julian Anastasov wrote:
> > > > > > > The tunneling method should properly use tunnel encapsulation.
> > > > > > > Fixes problem with CHECKSUM_PARTIAL packets when TCP/UDP csum
> > > > > > > offload is supported.
> > > > > > > 
> > > > > > > Thanks to Alex Gartrell for reporting the problem, providing
> > > > > > > solution and for all suggestions.
> > > > > > 
> > > > > > With this patch is Alex's patch "[PATCH ipvs] ipvs: invoke
> > > > > > skb_checksum_help prior to encapsulation in tunnel xmit " also needed?
> > > > > 
> > > > > 	Not needed, we prefer to avoid csum calculation by CPU.
> > > > 
> > > > Thanks, understood.
> > > > 
> > > > I have applied (only) this patch to ipvs-next.
> > > 
> > > 	May be we should apply it as bugfix to ipvs tree.
> > > Later stable kernels may need it. Also, new changes from
> > > Alex Gartrell about tunneling depend on it, they are
> > > net-next material, still in development, so I guess this
> > > patch will appear soon on net-next via next -rc if
> > > applied now as bugfix.
> > 
> > Sure, I'll do as you suggest.
> 
> 	Don't forget "ipvs: fix ipv6 hook registration for local replies"
> from August 22. It is a bugfix too.

Thanks, and sorry for letting that slip through the cracks.
I'll queue it up ASAP.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-08-28  1:52 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-01  7:36 [PATCH net] ipvs: properly declare tunnel encapsulation Julian Anastasov
2014-08-25  1:18 ` Simon Horman
2014-08-25  5:13   ` Julian Anastasov
2014-08-25  6:49     ` Simon Horman
2014-08-25  8:11       ` Julian Anastasov
2014-08-27  5:33         ` Simon Horman
2014-08-27  7:04           ` Julian Anastasov
2014-08-28  1:52             ` Simon Horman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.