netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net v2 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation
@ 2015-10-27 15:02 Hannes Frederic Sowa
  2015-10-27 15:02 ` [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 15:02 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, vyasevich, bcodding, tom, Hannes Frederic Sowa

This series fixes wrong checksums on the wire for IPv4 and IPv6. Large
send buffers and especially NFS lead to wrong checksums in both IPv4
and IPv6.

CHECKSUM_PARTIAL skbs should not receive the respective fragmentations
functions, so we add WARN_ON_ONCE to those functions to fix up those as
soon as they get reported.

Thanks!

Hannes Frederic Sowa (4):
  ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment

 net/ipv4/ip_output.c  |  9 ++++--
 net/ipv6/ip6_output.c | 78 ++++++++++++++++++++++++---------------------------
 2 files changed, 43 insertions(+), 44 deletions(-)

-- 
2.5.0

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 15:02 [PATCH net v2 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation Hannes Frederic Sowa
@ 2015-10-27 15:02 ` Hannes Frederic Sowa
  2015-10-27 16:04   ` Tom Herbert
  2015-10-27 15:02 ` [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 15:02 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, vyasevich, bcodding, tom, Hannes Frederic Sowa

We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/ip_output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 50e2973..0b02417 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -911,6 +911,7 @@ static int __ip_append_data(struct sock *sk,
 	if (transhdrlen &&
 	    length + fragheaderlen <= mtu &&
 	    rt->dst.dev->features & NETIF_F_V4_CSUM &&
+	    !(flags & MSG_MORE) &&
 	    !exthdrlen)
 		csummode = CHECKSUM_PARTIAL;
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  2015-10-27 15:02 [PATCH net v2 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation Hannes Frederic Sowa
  2015-10-27 15:02 ` [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
@ 2015-10-27 15:02 ` Hannes Frederic Sowa
  2015-10-27 16:06   ` Tom Herbert
                     ` (2 more replies)
  2015-10-27 15:02 ` [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
  2015-10-27 15:02 ` [PATCH net v2 4/4] ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
  3 siblings, 3 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 15:02 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, vyasevich, bcodding, tom, Hannes Frederic Sowa

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/ip_output.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 0b02417..3f94a3b 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -533,6 +533,11 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 
 	dev = rt->dst.dev;
 
+	/* for offloaded checksums cleanup checksum before fragmentation */
+	if (WARN_ON_ONCE(skb->ip_summed == CHECKSUM_PARTIAL) &&
+	    (err = skb_checksum_help(skb)))
+		goto fail;
+
 	/*
 	 *	Point into the IP datagram header.
 	 */
@@ -657,9 +662,6 @@ slow_path_clean:
 	}
 
 slow_path:
-	/* for offloaded checksums cleanup checksum before fragmentation */
-	if ((skb->ip_summed == CHECKSUM_PARTIAL) && skb_checksum_help(skb))
-		goto fail;
 	iph = ip_hdr(skb);
 
 	left = skb->len - hlen;		/* Space per frame */
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 15:02 [PATCH net v2 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation Hannes Frederic Sowa
  2015-10-27 15:02 ` [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
  2015-10-27 15:02 ` [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
@ 2015-10-27 15:02 ` Hannes Frederic Sowa
  2015-10-27 16:36   ` Tom Herbert
  2015-10-27 15:02 ` [PATCH net v2 4/4] ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
  3 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 15:02 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, vyasevich, bcodding, tom, Hannes Frederic Sowa

We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in
the existence of IPv6 extension headers, but the condition was wrong. Fix
it up, too. Also the condition to check whether the packet fits into
one fragment was wrong and has been corrected.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6_output.c | 70 ++++++++++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 37 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index c265068..9828a71 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1272,6 +1272,7 @@ static int __ip6_append_data(struct sock *sk,
 	struct rt6_info *rt = (struct rt6_info *)cork->dst;
 	struct ipv6_txoptions *opt = v6_cork->opt;
 	int csummode = CHECKSUM_NONE;
+	unsigned int maxnonfragsize, headersize;
 
 	skb = skb_peek_tail(queue);
 	if (!skb) {
@@ -1289,38 +1290,43 @@ static int __ip6_append_data(struct sock *sk,
 	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
 		     sizeof(struct frag_hdr);
 
-	if (mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) {
-		unsigned int maxnonfragsize, headersize;
-
-		headersize = sizeof(struct ipv6hdr) +
-			     (opt ? opt->opt_flen + opt->opt_nflen : 0) +
-			     (dst_allfrag(&rt->dst) ?
-			      sizeof(struct frag_hdr) : 0) +
-			     rt->rt6i_nfheader_len;
-
-		if (ip6_sk_ignore_df(sk))
-			maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
-		else
-			maxnonfragsize = mtu;
+	headersize = sizeof(struct ipv6hdr) +
+		     (opt ? opt->opt_flen + opt->opt_nflen : 0) +
+		     (dst_allfrag(&rt->dst) ?
+		      sizeof(struct frag_hdr) : 0) +
+		     rt->rt6i_nfheader_len;
+
+	if (cork->length + length > mtu - headersize && dontfrag &&
+	    (sk->sk_protocol == IPPROTO_UDP ||
+	     sk->sk_protocol == IPPROTO_RAW)) {
+		ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
+				sizeof(struct ipv6hdr));
+		goto emsgsize;
+	}
 
-		/* dontfrag active */
-		if ((cork->length + length > mtu - headersize) && dontfrag &&
-		    (sk->sk_protocol == IPPROTO_UDP ||
-		     sk->sk_protocol == IPPROTO_RAW)) {
-			ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
-						   sizeof(struct ipv6hdr));
-			goto emsgsize;
-		}
+	if (ip6_sk_ignore_df(sk))
+		maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
+	else
+		maxnonfragsize = mtu;
 
-		if (cork->length + length > maxnonfragsize - headersize) {
+	if (cork->length + length > maxnonfragsize - headersize) {
 emsgsize:
-			ipv6_local_error(sk, EMSGSIZE, fl6,
-					 mtu - headersize +
-					 sizeof(struct ipv6hdr));
-			return -EMSGSIZE;
-		}
+		ipv6_local_error(sk, EMSGSIZE, fl6,
+				 mtu - headersize +
+				 sizeof(struct ipv6hdr));
+		return -EMSGSIZE;
 	}
 
+	/* CHECKSUM_PARTIAL only with no extension headers and when
+	 * we are not going to fragment
+	 */
+	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
+	    headersize == sizeof(struct ipv6hdr) &&
+	    length < mtu - headersize &&
+	    !(flags & MSG_MORE) &&
+	    rt->dst.dev->features & NETIF_F_V6_CSUM)
+		csummode = CHECKSUM_PARTIAL;
+
 	if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_RAW) {
 		sock_tx_timestamp(sk, &tx_flags);
 		if (tx_flags & SKBTX_ANY_SW_TSTAMP &&
@@ -1328,16 +1334,6 @@ emsgsize:
 			tskey = sk->sk_tskey++;
 	}
 
-	/* If this is the first and only packet and device
-	 * supports checksum offloading, let's use it.
-	 * Use transhdrlen, same as IPv4, because partial
-	 * sums only work when transhdrlen is set.
-	 */
-	if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
-	    length + fragheaderlen < mtu &&
-	    rt->dst.dev->features & NETIF_F_V6_CSUM &&
-	    !exthdrlen)
-		csummode = CHECKSUM_PARTIAL;
 	/*
 	 * Let's try using as much space as possible.
 	 * Use MTU if total length of the message fits into the MTU.
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH net v2 4/4] ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  2015-10-27 15:02 [PATCH net v2 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation Hannes Frederic Sowa
                   ` (2 preceding siblings ...)
  2015-10-27 15:02 ` [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
@ 2015-10-27 15:02 ` Hannes Frederic Sowa
  3 siblings, 0 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 15:02 UTC (permalink / raw)
  To: netdev; +Cc: edumazet, vyasevich, bcodding, tom, Hannes Frederic Sowa

CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv6/ip6_output.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9828a71..c6c5630 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -605,6 +605,10 @@ int ip6_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
 				    &ipv6_hdr(skb)->saddr);
 
+	if (WARN_ON_ONCE(skb->ip_summed == CHECKSUM_PARTIAL) &&
+	    (err = skb_checksum_help(skb)))
+		goto fail;
+
 	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	if (skb_has_frag_list(skb)) {
 		int first_len = skb_pagelen(skb);
@@ -733,10 +737,6 @@ slow_path_clean:
 	}
 
 slow_path:
-	if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
-	    skb_checksum_help(skb))
-		goto fail;
-
 	left = skb->len - hlen;		/* Space per frame */
 	ptr = hlen;			/* Where to start from */
 
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 15:02 ` [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
@ 2015-10-27 16:04   ` Tom Herbert
  2015-10-27 16:34     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 16:04 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> We cannot reliable calculate packet size on MSG_MORE corked sockets
> and thus cannot decide if they are going to be fragmented later on,
> so better not use CHECKSUM_PARTIAL in the first place.
>
MSG_MORE should be independent of checksum offload. If packet is
fragmented the fix in ip_output will ensure that skb_checksum_help is
properly called.

> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  net/ipv4/ip_output.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 50e2973..0b02417 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -911,6 +911,7 @@ static int __ip_append_data(struct sock *sk,
>         if (transhdrlen &&
>             length + fragheaderlen <= mtu &&
>             rt->dst.dev->features & NETIF_F_V4_CSUM &&
> +           !(flags & MSG_MORE) &&
>             !exthdrlen)
>                 csummode = CHECKSUM_PARTIAL;
>
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  2015-10-27 15:02 ` [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
@ 2015-10-27 16:06   ` Tom Herbert
  2015-10-27 18:30     ` Hannes Frederic Sowa
  2015-10-27 18:46   ` Tom Herbert
  2015-10-27 19:01   ` Sergei Shtylyov
  2 siblings, 1 reply; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 16:06 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
> of those warn about them once and handle them gracefully by recalculating
> the checksum.
>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  net/ipv4/ip_output.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 0b02417..3f94a3b 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -533,6 +533,11 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>
>         dev = rt->dst.dev;
>
> +       /* for offloaded checksums cleanup checksum before fragmentation */
> +       if (WARN_ON_ONCE(skb->ip_summed == CHECKSUM_PARTIAL) &&
> +           (err = skb_checksum_help(skb)))
> +               goto fail;
> +
Why the WARN_ON_ONCE? Is there a prior check somewhere that avoid this
condition?

>         /*
>          *      Point into the IP datagram header.
>          */
> @@ -657,9 +662,6 @@ slow_path_clean:
>         }
>
>  slow_path:
> -       /* for offloaded checksums cleanup checksum before fragmentation */
> -       if ((skb->ip_summed == CHECKSUM_PARTIAL) && skb_checksum_help(skb))
> -               goto fail;
>         iph = ip_hdr(skb);
>
>         left = skb->len - hlen;         /* Space per frame */
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 16:04   ` Tom Herbert
@ 2015-10-27 16:34     ` Hannes Frederic Sowa
  2015-10-27 16:41       ` Tom Herbert
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 16:34 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015, at 17:04, Tom Herbert wrote:
> On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > We cannot reliable calculate packet size on MSG_MORE corked sockets
> > and thus cannot decide if they are going to be fragmented later on,
> > so better not use CHECKSUM_PARTIAL in the first place.
> >
> MSG_MORE should be independent of checksum offload. If packet is
> fragmented the fix in ip_output will ensure that skb_checksum_help is
> properly called.

The probability is that we are going to fragment if MSG_MORE is set,
because exceeding link mtu is quite probable, see e.g. NFS use case. Why
not simply use the csum functions during copy-in in that case? It makes
much more sense to me.

I don't see a reason to test for fragment length at all, then.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 15:02 ` [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
@ 2015-10-27 16:36   ` Tom Herbert
  2015-10-27 16:44     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 16:36 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> We cannot reliable calculate packet size on MSG_MORE corked sockets
> and thus cannot decide if they are going to be fragmented later on,
> so better not use CHECKSUM_PARTIAL in the first place.
>
> The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in
> the existence of IPv6 extension headers, but the condition was wrong. Fix
> it up, too. Also the condition to check whether the packet fits into
> one fragment was wrong and has been corrected.
>
> Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets")
> See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  net/ipv6/ip6_output.c | 70 ++++++++++++++++++++++++---------------------------
>  1 file changed, 33 insertions(+), 37 deletions(-)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index c265068..9828a71 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1272,6 +1272,7 @@ static int __ip6_append_data(struct sock *sk,
>         struct rt6_info *rt = (struct rt6_info *)cork->dst;
>         struct ipv6_txoptions *opt = v6_cork->opt;
>         int csummode = CHECKSUM_NONE;
> +       unsigned int maxnonfragsize, headersize;
>
>         skb = skb_peek_tail(queue);
>         if (!skb) {
> @@ -1289,38 +1290,43 @@ static int __ip6_append_data(struct sock *sk,
>         maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
>                      sizeof(struct frag_hdr);
>
> -       if (mtu <= sizeof(struct ipv6hdr) + IPV6_MAXPLEN) {

It seems like most of the diffs in this patch are a result of
eliminating this line. Is this check not needed?

> -               unsigned int maxnonfragsize, headersize;
> -
> -               headersize = sizeof(struct ipv6hdr) +
> -                            (opt ? opt->opt_flen + opt->opt_nflen : 0) +
> -                            (dst_allfrag(&rt->dst) ?
> -                             sizeof(struct frag_hdr) : 0) +
> -                            rt->rt6i_nfheader_len;
> -
> -               if (ip6_sk_ignore_df(sk))
> -                       maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
> -               else
> -                       maxnonfragsize = mtu;
> +       headersize = sizeof(struct ipv6hdr) +
> +                    (opt ? opt->opt_flen + opt->opt_nflen : 0) +
> +                    (dst_allfrag(&rt->dst) ?
> +                     sizeof(struct frag_hdr) : 0) +
> +                    rt->rt6i_nfheader_len;
> +
> +       if (cork->length + length > mtu - headersize && dontfrag &&
> +           (sk->sk_protocol == IPPROTO_UDP ||
> +            sk->sk_protocol == IPPROTO_RAW)) {
> +               ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
> +                               sizeof(struct ipv6hdr));
> +               goto emsgsize;
> +       }
>
> -               /* dontfrag active */
> -               if ((cork->length + length > mtu - headersize) && dontfrag &&
> -                   (sk->sk_protocol == IPPROTO_UDP ||
> -                    sk->sk_protocol == IPPROTO_RAW)) {
> -                       ipv6_local_rxpmtu(sk, fl6, mtu - headersize +
> -                                                  sizeof(struct ipv6hdr));
> -                       goto emsgsize;
> -               }
> +       if (ip6_sk_ignore_df(sk))
> +               maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
> +       else
> +               maxnonfragsize = mtu;
>
> -               if (cork->length + length > maxnonfragsize - headersize) {
> +       if (cork->length + length > maxnonfragsize - headersize) {
>  emsgsize:
> -                       ipv6_local_error(sk, EMSGSIZE, fl6,
> -                                        mtu - headersize +
> -                                        sizeof(struct ipv6hdr));
> -                       return -EMSGSIZE;
> -               }
> +               ipv6_local_error(sk, EMSGSIZE, fl6,
> +                                mtu - headersize +
> +                                sizeof(struct ipv6hdr));
> +               return -EMSGSIZE;
>         }
>
> +       /* CHECKSUM_PARTIAL only with no extension headers and when

No, please don't do this. CHECKSUM_PARTIAL should work with extension
headers as defined, so this is just disabling otherwise valid and
useful functionality. If (some) drivers have problems with this they
need to be identified and fixed.

> +        * we are not going to fragment
> +        */
> +       if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
> +           headersize == sizeof(struct ipv6hdr) &&
> +           length < mtu - headersize &&
> +           !(flags & MSG_MORE) &&
> +           rt->dst.dev->features & NETIF_F_V6_CSUM)
> +               csummode = CHECKSUM_PARTIAL;
> +
>         if (sk->sk_type == SOCK_DGRAM || sk->sk_type == SOCK_RAW) {
>                 sock_tx_timestamp(sk, &tx_flags);
>                 if (tx_flags & SKBTX_ANY_SW_TSTAMP &&
> @@ -1328,16 +1334,6 @@ emsgsize:
>                         tskey = sk->sk_tskey++;
>         }
>
> -       /* If this is the first and only packet and device
> -        * supports checksum offloading, let's use it.
> -        * Use transhdrlen, same as IPv4, because partial
> -        * sums only work when transhdrlen is set.
> -        */
> -       if (transhdrlen && sk->sk_protocol == IPPROTO_UDP &&
> -           length + fragheaderlen < mtu &&
> -           rt->dst.dev->features & NETIF_F_V6_CSUM &&
> -           !exthdrlen)
> -               csummode = CHECKSUM_PARTIAL;
>         /*
>          * Let's try using as much space as possible.
>          * Use MTU if total length of the message fits into the MTU.
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 16:34     ` Hannes Frederic Sowa
@ 2015-10-27 16:41       ` Tom Herbert
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 16:41 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 9:34 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Oct 27, 2015, at 17:04, Tom Herbert wrote:
>> On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > We cannot reliable calculate packet size on MSG_MORE corked sockets
>> > and thus cannot decide if they are going to be fragmented later on,
>> > so better not use CHECKSUM_PARTIAL in the first place.
>> >
>> MSG_MORE should be independent of checksum offload. If packet is
>> fragmented the fix in ip_output will ensure that skb_checksum_help is
>> properly called.
>
> The probability is that we are going to fragment if MSG_MORE is set,
> because exceeding link mtu is quite probable, see e.g. NFS use case. Why
> not simply use the csum functions during copy-in in that case? It makes
> much more sense to me.
>
For datagram sockets MSG_MORE means that more datagrams will be sent,
it's not used to incrementally add data to a datagram already queued
(SEQPACKET with EOR is for that).

> I don't see a reason to test for fragment length at all, then.
>
> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 16:36   ` Tom Herbert
@ 2015-10-27 16:44     ` Hannes Frederic Sowa
  2015-10-27 17:32       ` Tom Herbert
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 16:44 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington



On Tue, Oct 27, 2015, at 17:36, Tom Herbert wrote:> > -               if
(cork->length + length > maxnonfragsize - headersize) {
> > +       if (cork->length + length > maxnonfragsize - headersize) {
> >  emsgsize:
> > -                       ipv6_local_error(sk, EMSGSIZE, fl6,
> > -                                        mtu - headersize +
> > -                                        sizeof(struct ipv6hdr));
> > -                       return -EMSGSIZE;
> > -               }
> > +               ipv6_local_error(sk, EMSGSIZE, fl6,
> > +                                mtu - headersize +
> > +                                sizeof(struct ipv6hdr));
> > +               return -EMSGSIZE;
> >         }
> >
> > +       /* CHECKSUM_PARTIAL only with no extension headers and when
> 
> No, please don't do this. CHECKSUM_PARTIAL should work with extension
> headers as defined, so this is just disabling otherwise valid and
> useful functionality. If (some) drivers have problems with this they
> need to be identified and fixed.

I don't understand. The old code already didn't allow the use of
opt_flen with CHECKSUM_PARTIAL.

The MSG_MORE check has nothing to do with that but only with corking.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 16:44     ` Hannes Frederic Sowa
@ 2015-10-27 17:32       ` Tom Herbert
  2015-10-27 18:29         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 17:32 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 9:44 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
>
>
> On Tue, Oct 27, 2015, at 17:36, Tom Herbert wrote:> > -               if
> (cork->length + length > maxnonfragsize - headersize) {
>> > +       if (cork->length + length > maxnonfragsize - headersize) {
>> >  emsgsize:
>> > -                       ipv6_local_error(sk, EMSGSIZE, fl6,
>> > -                                        mtu - headersize +
>> > -                                        sizeof(struct ipv6hdr));
>> > -                       return -EMSGSIZE;
>> > -               }
>> > +               ipv6_local_error(sk, EMSGSIZE, fl6,
>> > +                                mtu - headersize +
>> > +                                sizeof(struct ipv6hdr));
>> > +               return -EMSGSIZE;
>> >         }
>> >
>> > +       /* CHECKSUM_PARTIAL only with no extension headers and when
>>
>> No, please don't do this. CHECKSUM_PARTIAL should work with extension
>> headers as defined, so this is just disabling otherwise valid and
>> useful functionality. If (some) drivers have problems with this they
>> need to be identified and fixed.
>
> I don't understand. The old code already didn't allow the use of
> opt_flen with CHECKSUM_PARTIAL.
>
Then that's a problem with the old code :-). Is there any other reason
that we can't use CHECKSUM_PARTIAL with extension headers other than
lack of correct driver support?

> The MSG_MORE check has nothing to do with that but only with corking.
>
> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 17:32       ` Tom Herbert
@ 2015-10-27 18:29         ` Hannes Frederic Sowa
  2015-10-27 18:37           ` Tom Herbert
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 18:29 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015, at 18:32, Tom Herbert wrote:
> On Tue, Oct 27, 2015 at 9:44 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> >
> >
> > On Tue, Oct 27, 2015, at 17:36, Tom Herbert wrote:> > -               if
> > (cork->length + length > maxnonfragsize - headersize) {
> >> > +       if (cork->length + length > maxnonfragsize - headersize) {
> >> >  emsgsize:
> >> > -                       ipv6_local_error(sk, EMSGSIZE, fl6,
> >> > -                                        mtu - headersize +
> >> > -                                        sizeof(struct ipv6hdr));
> >> > -                       return -EMSGSIZE;
> >> > -               }
> >> > +               ipv6_local_error(sk, EMSGSIZE, fl6,
> >> > +                                mtu - headersize +
> >> > +                                sizeof(struct ipv6hdr));
> >> > +               return -EMSGSIZE;
> >> >         }
> >> >
> >> > +       /* CHECKSUM_PARTIAL only with no extension headers and when
> >>
> >> No, please don't do this. CHECKSUM_PARTIAL should work with extension
> >> headers as defined, so this is just disabling otherwise valid and
> >> useful functionality. If (some) drivers have problems with this they
> >> need to be identified and fixed.
> >
> > I don't understand. The old code already didn't allow the use of
> > opt_flen with CHECKSUM_PARTIAL.
> >
> Then that's a problem with the old code :-). Is there any other reason
> that we can't use CHECKSUM_PARTIAL with extension headers other than
> lack of correct driver support?

The lack of correct driver support is a big bumper, but as I wrote, I
don't see a reason to not lift this restriction in net-next. I proposed
a new feature flag, or by looking at your series, we could probably use
the extension header okay field for that.

I would be conservative in net though.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  2015-10-27 16:06   ` Tom Herbert
@ 2015-10-27 18:30     ` Hannes Frederic Sowa
  0 siblings, 0 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 18:30 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015, at 17:06, Tom Herbert wrote:
> On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
> > of those warn about them once and handle them gracefully by recalculating
> > the checksum.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > Cc: Benjamin Coddington <bcodding@redhat.com>
> > Cc: Tom Herbert <tom@herbertland.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> >  net/ipv4/ip_output.c | 8 +++++---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 0b02417..3f94a3b 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -533,6 +533,11 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
> >
> >         dev = rt->dst.dev;
> >
> > +       /* for offloaded checksums cleanup checksum before fragmentation */
> > +       if (WARN_ON_ONCE(skb->ip_summed == CHECKSUM_PARTIAL) &&
> > +           (err = skb_checksum_help(skb)))
> > +               goto fail;
> > +
> Why the WARN_ON_ONCE? Is there a prior check somewhere that avoid this
> condition?

While I am pretty sure we should not hit the condition in IPv6 anymore,
I think this could frighten people in IPv4 land. I will repost without
the WARN_ON_ONCE. Maybe it makes sense to use the IFF_DEBUG interface
flags again? :)

I will repost without those WARN_ON_ONCEs.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 18:29         ` Hannes Frederic Sowa
@ 2015-10-27 18:37           ` Tom Herbert
  2015-10-27 19:19             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 18:37 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 11:29 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Oct 27, 2015, at 18:32, Tom Herbert wrote:
>> On Tue, Oct 27, 2015 at 9:44 AM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> >
>> >
>> > On Tue, Oct 27, 2015, at 17:36, Tom Herbert wrote:> > -               if
>> > (cork->length + length > maxnonfragsize - headersize) {
>> >> > +       if (cork->length + length > maxnonfragsize - headersize) {
>> >> >  emsgsize:
>> >> > -                       ipv6_local_error(sk, EMSGSIZE, fl6,
>> >> > -                                        mtu - headersize +
>> >> > -                                        sizeof(struct ipv6hdr));
>> >> > -                       return -EMSGSIZE;
>> >> > -               }
>> >> > +               ipv6_local_error(sk, EMSGSIZE, fl6,
>> >> > +                                mtu - headersize +
>> >> > +                                sizeof(struct ipv6hdr));
>> >> > +               return -EMSGSIZE;
>> >> >         }
>> >> >
>> >> > +       /* CHECKSUM_PARTIAL only with no extension headers and when
>> >>
>> >> No, please don't do this. CHECKSUM_PARTIAL should work with extension
>> >> headers as defined, so this is just disabling otherwise valid and
>> >> useful functionality. If (some) drivers have problems with this they
>> >> need to be identified and fixed.
>> >
>> > I don't understand. The old code already didn't allow the use of
>> > opt_flen with CHECKSUM_PARTIAL.
>> >
>> Then that's a problem with the old code :-). Is there any other reason
>> that we can't use CHECKSUM_PARTIAL with extension headers other than
>> lack of correct driver support?
>
> The lack of correct driver support is a big bumper, but as I wrote, I
> don't see a reason to not lift this restriction in net-next. I proposed
> a new feature flag, or by looking at your series, we could probably use
> the extension header okay field for that.
>
Okay, but why bother doing this for net? This problem has obviously
existed for a while, and even if the restriction is maintained here
there are still other paths that don't go through ip_append_data that
could trip the bug. Also, drivers are welcome to fix their issues in
net I believe.

> I would be conservative in net though.
>
> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  2015-10-27 15:02 ` [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
  2015-10-27 16:06   ` Tom Herbert
@ 2015-10-27 18:46   ` Tom Herbert
  2015-10-27 19:01   ` Sergei Shtylyov
  2 siblings, 0 replies; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 18:46 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 8:02 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
> of those warn about them once and handle them gracefully by recalculating
> the checksum.
>
I believe a UDP sender within the kernel (like an encapsulation) that
happens to send using a frag list that exceeds MTU is quite possible
and would be a problem with current code.

> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>  net/ipv4/ip_output.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 0b02417..3f94a3b 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -533,6 +533,11 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>
>         dev = rt->dst.dev;
>
> +       /* for offloaded checksums cleanup checksum before fragmentation */
> +       if (WARN_ON_ONCE(skb->ip_summed == CHECKSUM_PARTIAL) &&
> +           (err = skb_checksum_help(skb)))
> +               goto fail;
> +
>         /*
>          *      Point into the IP datagram header.
>          */
> @@ -657,9 +662,6 @@ slow_path_clean:
>         }
>
>  slow_path:
> -       /* for offloaded checksums cleanup checksum before fragmentation */
> -       if ((skb->ip_summed == CHECKSUM_PARTIAL) && skb_checksum_help(skb))
> -               goto fail;
>         iph = ip_hdr(skb);
>
>         left = skb->len - hlen;         /* Space per frame */
> --
> 2.5.0
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  2015-10-27 15:02 ` [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
  2015-10-27 16:06   ` Tom Herbert
  2015-10-27 18:46   ` Tom Herbert
@ 2015-10-27 19:01   ` Sergei Shtylyov
  2015-10-27 19:15     ` Hannes Frederic Sowa
  2 siblings, 1 reply; 23+ messages in thread
From: Sergei Shtylyov @ 2015-10-27 19:01 UTC (permalink / raw)
  To: Hannes Frederic Sowa, netdev; +Cc: edumazet, vyasevich, bcodding, tom

Hello.

On 10/27/2015 06:02 PM, Hannes Frederic Sowa wrote:

> CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
> of those warn about them once and handle them gracefully by recalculating
> the checksum.
>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Benjamin Coddington <bcodding@redhat.com>
> Cc: Tom Herbert <tom@herbertland.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
>   net/ipv4/ip_output.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 0b02417..3f94a3b 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -533,6 +533,11 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
>
>   	dev = rt->dst.dev;
>
> +	/* for offloaded checksums cleanup checksum before fragmentation */
> +	if (WARN_ON_ONCE(skb->ip_summed == CHECKSUM_PARTIAL) &&
> +	    (err = skb_checksum_help(skb)))

    scripts/checkpatch.pl shou;d have complained about using = in the *if* 
expression.

> +		goto fail;
> +
>   	/*
>   	 *	Point into the IP datagram header.
>   	 */
[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
  2015-10-27 19:01   ` Sergei Shtylyov
@ 2015-10-27 19:15     ` Hannes Frederic Sowa
  0 siblings, 0 replies; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 19:15 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: edumazet, vyasevich, bcodding, tom

Hi Sergei,

On Tue, Oct 27, 2015, at 20:01, Sergei Shtylyov wrote:
> On 10/27/2015 06:02 PM, Hannes Frederic Sowa wrote:
> 
> > CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
> > of those warn about them once and handle them gracefully by recalculating
> > the checksum.
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Vlad Yasevich <vyasevich@gmail.com>
> > Cc: Benjamin Coddington <bcodding@redhat.com>
> > Cc: Tom Herbert <tom@herbertland.com>
> > Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> > ---
> >   net/ipv4/ip_output.c | 8 +++++---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> > index 0b02417..3f94a3b 100644
> > --- a/net/ipv4/ip_output.c
> > +++ b/net/ipv4/ip_output.c
> > @@ -533,6 +533,11 @@ int ip_do_fragment(struct net *net, struct sock *sk, struct sk_buff *skb,
> >
> >   	dev = rt->dst.dev;
> >
> > +	/* for offloaded checksums cleanup checksum before fragmentation */
> > +	if (WARN_ON_ONCE(skb->ip_summed == CHECKSUM_PARTIAL) &&
> > +	    (err = skb_checksum_help(skb)))
> 
>     scripts/checkpatch.pl shou;d have complained about using = in the
>     *if* 
> expression.

I know and I ignored it deliberately because I found it nicer this way.
I made sure gcc does not complain by using extra braces around the
assignment.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 18:37           ` Tom Herbert
@ 2015-10-27 19:19             ` Hannes Frederic Sowa
  2015-10-27 21:42               ` Hannes Frederic Sowa
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 19:19 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015, at 19:37, Tom Herbert wrote:
> On Tue, Oct 27, 2015 at 11:29 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > On Tue, Oct 27, 2015, at 18:32, Tom Herbert wrote:
> >> On Tue, Oct 27, 2015 at 9:44 AM, Hannes Frederic Sowa
> >> <hannes@stressinduktion.org> wrote:
> >> >
> >> >
> >> > On Tue, Oct 27, 2015, at 17:36, Tom Herbert wrote:> > -               if
> >> > (cork->length + length > maxnonfragsize - headersize) {
> >> >> > +       if (cork->length + length > maxnonfragsize - headersize) {
> >> >> >  emsgsize:
> >> >> > -                       ipv6_local_error(sk, EMSGSIZE, fl6,
> >> >> > -                                        mtu - headersize +
> >> >> > -                                        sizeof(struct ipv6hdr));
> >> >> > -                       return -EMSGSIZE;
> >> >> > -               }
> >> >> > +               ipv6_local_error(sk, EMSGSIZE, fl6,
> >> >> > +                                mtu - headersize +
> >> >> > +                                sizeof(struct ipv6hdr));
> >> >> > +               return -EMSGSIZE;
> >> >> >         }
> >> >> >
> >> >> > +       /* CHECKSUM_PARTIAL only with no extension headers and when
> >> >>
> >> >> No, please don't do this. CHECKSUM_PARTIAL should work with extension
> >> >> headers as defined, so this is just disabling otherwise valid and
> >> >> useful functionality. If (some) drivers have problems with this they
> >> >> need to be identified and fixed.
> >> >
> >> > I don't understand. The old code already didn't allow the use of
> >> > opt_flen with CHECKSUM_PARTIAL.
> >> >
> >> Then that's a problem with the old code :-). Is there any other reason
> >> that we can't use CHECKSUM_PARTIAL with extension headers other than
> >> lack of correct driver support?
> >
> > The lack of correct driver support is a big bumper, but as I wrote, I
> > don't see a reason to not lift this restriction in net-next. I proposed
> > a new feature flag, or by looking at your series, we could probably use
> > the extension header okay field for that.
> >
> Okay, but why bother doing this for net? This problem has obviously
> existed for a while, and even if the restriction is maintained here
> there are still other paths that don't go through ip_append_data that
> could trip the bug. Also, drivers are welcome to fix their issues in
> net I believe.

I even don't know if it could be a hardware issue. Also I don't want to
break people's communication with a patch.
IMHO without the WARN_ON_ONCEs, which I agreed to remove, I currently
don't see any problem for net.

You don't agree on a netdev-feature flag, indicating the driver is okay
with hardware checksumming and extension headers? We could add this to
net-next pretty fast, I think. It does not require people to revert this
patch in case their driver misbehaves and we don't get a fix for it,
soon. Also what should we do if the driver simply does not support
extension headers + checksum offloading? Completely kill checksum
offloading for IPv6?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 19:19             ` Hannes Frederic Sowa
@ 2015-10-27 21:42               ` Hannes Frederic Sowa
  2015-10-27 22:03                 ` Tom Herbert
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-27 21:42 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

Hi Tom,

On Tue, Oct 27, 2015, at 20:19, Hannes Frederic Sowa wrote:
> On Tue, Oct 27, 2015, at 19:37, Tom Herbert wrote:
> > On Tue, Oct 27, 2015 at 11:29 AM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:
> > > On Tue, Oct 27, 2015, at 18:32, Tom Herbert wrote:
> > >> On Tue, Oct 27, 2015 at 9:44 AM, Hannes Frederic Sowa
> > >> <hannes@stressinduktion.org> wrote:
> > >> >
> > >> >
> > >> > On Tue, Oct 27, 2015, at 17:36, Tom Herbert wrote:> > -               if
> > >> > (cork->length + length > maxnonfragsize - headersize) {
> > >> >> > +       if (cork->length + length > maxnonfragsize - headersize) {
> > >> >> >  emsgsize:
> > >> >> > -                       ipv6_local_error(sk, EMSGSIZE, fl6,
> > >> >> > -                                        mtu - headersize +
> > >> >> > -                                        sizeof(struct ipv6hdr));
> > >> >> > -                       return -EMSGSIZE;
> > >> >> > -               }
> > >> >> > +               ipv6_local_error(sk, EMSGSIZE, fl6,
> > >> >> > +                                mtu - headersize +
> > >> >> > +                                sizeof(struct ipv6hdr));
> > >> >> > +               return -EMSGSIZE;
> > >> >> >         }
> > >> >> >
> > >> >> > +       /* CHECKSUM_PARTIAL only with no extension headers and when
> > >> >>
> > >> >> No, please don't do this. CHECKSUM_PARTIAL should work with extension
> > >> >> headers as defined, so this is just disabling otherwise valid and
> > >> >> useful functionality. If (some) drivers have problems with this they
> > >> >> need to be identified and fixed.
> > >> >
> > >> > I don't understand. The old code already didn't allow the use of
> > >> > opt_flen with CHECKSUM_PARTIAL.
> > >> >
> > >> Then that's a problem with the old code :-). Is there any other reason
> > >> that we can't use CHECKSUM_PARTIAL with extension headers other than
> > >> lack of correct driver support?
> > >
> > > The lack of correct driver support is a big bumper, but as I wrote, I
> > > don't see a reason to not lift this restriction in net-next. I proposed
> > > a new feature flag, or by looking at your series, we could probably use
> > > the extension header okay field for that.
> > >
> > Okay, but why bother doing this for net? This problem has obviously
> > existed for a while, and even if the restriction is maintained here
> > there are still other paths that don't go through ip_append_data that
> > could trip the bug. Also, drivers are welcome to fix their issues in
> > net I believe.
> 
> I even don't know if it could be a hardware issue. Also I don't want to
> break people's communication with a patch.
> IMHO without the WARN_ON_ONCEs, which I agreed to remove, I currently
> don't see any problem for net.
> 
> You don't agree on a netdev-feature flag, indicating the driver is okay
> with hardware checksumming and extension headers? We could add this to
> net-next pretty fast, I think. It does not require people to revert this
> patch in case their driver misbehaves and we don't get a fix for it,
> soon. Also what should we do if the driver simply does not support
> extension headers + checksum offloading? Completely kill checksum
> offloading for IPv6?

I posted v3 just now. I would like to let David consider it for net
inclusion. We can work on how to lift this limitation then in net-next,
okay? I am currently in favor of a new netdev-feature. What do you
think? Your RFC series could help here, too.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 21:42               ` Hannes Frederic Sowa
@ 2015-10-27 22:03                 ` Tom Herbert
  2015-10-28  0:12                   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 23+ messages in thread
From: Tom Herbert @ 2015-10-27 22:03 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 2:42 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Tom,
>
> On Tue, Oct 27, 2015, at 20:19, Hannes Frederic Sowa wrote:
>> On Tue, Oct 27, 2015, at 19:37, Tom Herbert wrote:
>> > On Tue, Oct 27, 2015 at 11:29 AM, Hannes Frederic Sowa
>> > <hannes@stressinduktion.org> wrote:
>> > > On Tue, Oct 27, 2015, at 18:32, Tom Herbert wrote:
>> > >> On Tue, Oct 27, 2015 at 9:44 AM, Hannes Frederic Sowa
>> > >> <hannes@stressinduktion.org> wrote:
>> > >> >
>> > >> >
>> > >> > On Tue, Oct 27, 2015, at 17:36, Tom Herbert wrote:> > -               if
>> > >> > (cork->length + length > maxnonfragsize - headersize) {
>> > >> >> > +       if (cork->length + length > maxnonfragsize - headersize) {
>> > >> >> >  emsgsize:
>> > >> >> > -                       ipv6_local_error(sk, EMSGSIZE, fl6,
>> > >> >> > -                                        mtu - headersize +
>> > >> >> > -                                        sizeof(struct ipv6hdr));
>> > >> >> > -                       return -EMSGSIZE;
>> > >> >> > -               }
>> > >> >> > +               ipv6_local_error(sk, EMSGSIZE, fl6,
>> > >> >> > +                                mtu - headersize +
>> > >> >> > +                                sizeof(struct ipv6hdr));
>> > >> >> > +               return -EMSGSIZE;
>> > >> >> >         }
>> > >> >> >
>> > >> >> > +       /* CHECKSUM_PARTIAL only with no extension headers and when
>> > >> >>
>> > >> >> No, please don't do this. CHECKSUM_PARTIAL should work with extension
>> > >> >> headers as defined, so this is just disabling otherwise valid and
>> > >> >> useful functionality. If (some) drivers have problems with this they
>> > >> >> need to be identified and fixed.
>> > >> >
>> > >> > I don't understand. The old code already didn't allow the use of
>> > >> > opt_flen with CHECKSUM_PARTIAL.
>> > >> >
>> > >> Then that's a problem with the old code :-). Is there any other reason
>> > >> that we can't use CHECKSUM_PARTIAL with extension headers other than
>> > >> lack of correct driver support?
>> > >
>> > > The lack of correct driver support is a big bumper, but as I wrote, I
>> > > don't see a reason to not lift this restriction in net-next. I proposed
>> > > a new feature flag, or by looking at your series, we could probably use
>> > > the extension header okay field for that.
>> > >
>> > Okay, but why bother doing this for net? This problem has obviously
>> > existed for a while, and even if the restriction is maintained here
>> > there are still other paths that don't go through ip_append_data that
>> > could trip the bug. Also, drivers are welcome to fix their issues in
>> > net I believe.
>>
>> I even don't know if it could be a hardware issue. Also I don't want to
>> break people's communication with a patch.
>> IMHO without the WARN_ON_ONCEs, which I agreed to remove, I currently
>> don't see any problem for net.
>>
>> You don't agree on a netdev-feature flag, indicating the driver is okay
>> with hardware checksumming and extension headers? We could add this to
>> net-next pretty fast, I think. It does not require people to revert this
>> patch in case their driver misbehaves and we don't get a fix for it,
>> soon. Also what should we do if the driver simply does not support
>> extension headers + checksum offloading? Completely kill checksum
>> offloading for IPv6?
>
> I posted v3 just now. I would like to let David consider it for net
> inclusion. We can work on how to lift this limitation then in net-next,
> okay? I am currently in favor of a new netdev-feature. What do you
> think? Your RFC series could help here, too.
>
I really do not like the feature flag, it's just a bandaid over the
real problem-- in fact my goal is to eliminate NETIF_F_IP{V6}_CSUM and
just have NETIF_F_HW_CSUM. I will repost the helper patches, but we
really do need to start fixing this stuff in the drivers instead of
more hacking in the stack.

Tom

> Thanks,
> Hannes
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-27 22:03                 ` Tom Herbert
@ 2015-10-28  0:12                   ` Hannes Frederic Sowa
  2015-10-28  0:31                     ` Tom Herbert
  0 siblings, 1 reply; 23+ messages in thread
From: Hannes Frederic Sowa @ 2015-10-28  0:12 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015, at 23:03, Tom Herbert wrote:
> On Tue, Oct 27, 2015 at 2:42 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > I posted v3 just now. I would like to let David consider it for net
> > inclusion. We can work on how to lift this limitation then in net-next,
> > okay? I am currently in favor of a new netdev-feature. What do you
> > think? Your RFC series could help here, too.
> >
> I really do not like the feature flag, it's just a bandaid over the
> real problem-- in fact my goal is to eliminate NETIF_F_IP{V6}_CSUM and
> just have NETIF_F_HW_CSUM. I will repost the helper patches, but we
> really do need to start fixing this stuff in the drivers instead of
> more hacking in the stack.

It would be great if this is doable but I doubt so. There might be a lot
of unresponsive driver maintainers and I don't see that we should simply
eliminate IPv4 csum offloading for those drivers, too. Sometimes it is
hard to patch drivers without documentation.

I am against lifting restrictions which will have unforeseeable 
consequences for some people (as in partial communication errors) or
having huge performance drawbacks (as in disabling ipv4 csum offloading,
too).

I could even imagine this needs to be more configurable as in how many
extension headers some hardware can process, I fear. One extension
header might be okay (jumping over a fragmentation header), but two... I
simply don't know, yet. Maybe there is no problem with hardware at all.

I don't really see this series as a hack. ;)

Unluckily it seems we don't get feedback from the hardware about not
being able to construct a proper checksum, so we cannot even close the
loop and add code which warns us about misbehaving drivers.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
  2015-10-28  0:12                   ` Hannes Frederic Sowa
@ 2015-10-28  0:31                     ` Tom Herbert
  0 siblings, 0 replies; 23+ messages in thread
From: Tom Herbert @ 2015-10-28  0:31 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Linux Kernel Network Developers, Eric Dumazet,
	Vladislav Yasevich, Benjamin Coddington

On Tue, Oct 27, 2015 at 5:12 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On Tue, Oct 27, 2015, at 23:03, Tom Herbert wrote:
>> On Tue, Oct 27, 2015 at 2:42 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>> > I posted v3 just now. I would like to let David consider it for net
>> > inclusion. We can work on how to lift this limitation then in net-next,
>> > okay? I am currently in favor of a new netdev-feature. What do you
>> > think? Your RFC series could help here, too.
>> >
>> I really do not like the feature flag, it's just a bandaid over the
>> real problem-- in fact my goal is to eliminate NETIF_F_IP{V6}_CSUM and
>> just have NETIF_F_HW_CSUM. I will repost the helper patches, but we
>> really do need to start fixing this stuff in the drivers instead of
>> more hacking in the stack.
>
> It would be great if this is doable but I doubt so. There might be a lot
> of unresponsive driver maintainers and I don't see that we should simply
> eliminate IPv4 csum offloading for those drivers, too. Sometimes it is
> hard to patch drivers without documentation.
>
> I am against lifting restrictions which will have unforeseeable
> consequences for some people (as in partial communication errors) or
> having huge performance drawbacks (as in disabling ipv4 csum offloading,
> too).
>
> I could even imagine this needs to be more configurable as in how many
> extension headers some hardware can process, I fear. One extension
> header might be okay (jumping over a fragmentation header), but two... I
> simply don't know, yet. Maybe there is no problem with hardware at all.
>
Hardware that implement NETIF_F_HW_CSUM (ie. calculate csum based on
start and offset) should have no problem with extension headers. The
plea in skbuff.h for HW vendors to implement that generic algorithm
has been around a long time, but unfortunately a lot of new HW is
still do protocol specific algorithms. Realistically, I can't deploy
extension headers at scale without checksum offload anyway, so this
just degenerates into another instance where poor HW design decisions
limit the protocols and features we're able to deploy in data center.
Oh well...

> I don't really see this series as a hack. ;)
>
> Unluckily it seems we don't get feedback from the hardware about not
> being able to construct a proper checksum, so we cannot even close the
> loop and add code which warns us about misbehaving drivers.
>
> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-10-28  0:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-27 15:02 [PATCH net v2 0/4] net: clean up interactions of CHECKSUM_PARTIAL and fragmentation Hannes Frederic Sowa
2015-10-27 15:02 ` [PATCH net v2 1/4] ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
2015-10-27 16:04   ` Tom Herbert
2015-10-27 16:34     ` Hannes Frederic Sowa
2015-10-27 16:41       ` Tom Herbert
2015-10-27 15:02 ` [PATCH net v2 2/4] ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa
2015-10-27 16:06   ` Tom Herbert
2015-10-27 18:30     ` Hannes Frederic Sowa
2015-10-27 18:46   ` Tom Herbert
2015-10-27 19:01   ` Sergei Shtylyov
2015-10-27 19:15     ` Hannes Frederic Sowa
2015-10-27 15:02 ` [PATCH net v2 3/4] ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets Hannes Frederic Sowa
2015-10-27 16:36   ` Tom Herbert
2015-10-27 16:44     ` Hannes Frederic Sowa
2015-10-27 17:32       ` Tom Herbert
2015-10-27 18:29         ` Hannes Frederic Sowa
2015-10-27 18:37           ` Tom Herbert
2015-10-27 19:19             ` Hannes Frederic Sowa
2015-10-27 21:42               ` Hannes Frederic Sowa
2015-10-27 22:03                 ` Tom Herbert
2015-10-28  0:12                   ` Hannes Frederic Sowa
2015-10-28  0:31                     ` Tom Herbert
2015-10-27 15:02 ` [PATCH net v2 4/4] ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment Hannes Frederic Sowa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).