netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
@ 2013-09-21  4:27 Hannes Frederic Sowa
  2013-09-23  0:43 ` Hannes Frederic Sowa
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-09-21  4:27 UTC (permalink / raw)
  To: netdev; +Cc: yoshfuji, davem

In the following scenario the socket is corked:
If the first UDP packet is larger then the mtu we try to append it to the
write queue via ip6_ufo_append_data. A following packet, which is smaller
than the mtu would be appended to the already queued up gso-skb via
plain ip6_append_data. This causes random memory corruptions.

In ip6_ufo_append_data we also have to be careful to not queue up the
same skb multiple times. So setup the gso frame only when no first skb
is available.

This also fixes a shortcoming where we add the current packet's length to
cork->length but return early because of a packet > mtu with dontfrag set
(instead of sutracting it again).

Found with trinity.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---

I could only test this with virtualized UFO enabled network cards. Could
someone test this on real hardware?

 net/ipv6/ip6_output.c | 53 +++++++++++++++++++++------------------------------
 1 file changed, 22 insertions(+), 31 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3a692d5..a54c45c 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1015,6 +1015,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 	 * udp datagram
 	 */
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
+		struct frag_hdr fhdr;
+
 		skb = sock_alloc_send_skb(sk,
 			hh_len + fragheaderlen + transhdrlen + 20,
 			(flags & MSG_DONTWAIT), &err);
@@ -1036,12 +1038,6 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->protocol = htons(ETH_P_IPV6);
 		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum = 0;
-	}
-
-	err = skb_append_datato_frags(sk,skb, getfrag, from,
-				      (length - transhdrlen));
-	if (!err) {
-		struct frag_hdr fhdr;
 
 		/* Specify the length of each IPv6 datagram fragment.
 		 * It has to be a multiple of 8.
@@ -1052,15 +1048,10 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		ipv6_select_ident(&fhdr, rt);
 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 		__skb_queue_tail(&sk->sk_write_queue, skb);
-
-		return 0;
 	}
-	/* There is not enough support do UPD LSO,
-	 * so follow normal path
-	 */
-	kfree_skb(skb);
 
-	return err;
+	return skb_append_datato_frags(sk, skb, getfrag, from,
+				       (length - transhdrlen));
 }
 
 static inline struct ipv6_opt_hdr *ip6_opt_dup(struct ipv6_opt_hdr *src,
@@ -1227,27 +1218,27 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	 * --yoshfuji
 	 */
 
-	cork->length += length;
-	if (length > mtu) {
-		int proto = sk->sk_protocol;
-		if (dontfrag && (proto == IPPROTO_UDP || proto == IPPROTO_RAW)){
-			ipv6_local_rxpmtu(sk, fl6, mtu-exthdrlen);
-			return -EMSGSIZE;
-		}
-
-		if (proto == IPPROTO_UDP &&
-		    (rt->dst.dev->features & NETIF_F_UFO)) {
+	if ((length > mtu) && dontfrag && (sk->sk_protocol == IPPROTO_UDP ||
+					   sk->sk_protocol == IPPROTO_RAW)) {
+		ipv6_local_rxpmtu(sk, fl6, mtu-exthdrlen);
+		return -EMSGSIZE;
+	}
 
-			err = ip6_ufo_append_data(sk, getfrag, from, length,
-						  hh_len, fragheaderlen,
-						  transhdrlen, mtu, flags, rt);
-			if (err)
-				goto error;
-			return 0;
-		}
+	skb = skb_peek_tail(&sk->sk_write_queue);
+	cork->length += length;
+	if (((length > mtu) ||
+	     (skb && skb_is_gso(skb))) &&
+	    (sk->sk_protocol == IPPROTO_UDP) &&
+	    (rt->dst.dev->features & NETIF_F_UFO)) {
+		err = ip6_ufo_append_data(sk, getfrag, from, length,
+					  hh_len, fragheaderlen,
+					  transhdrlen, mtu, flags, rt);
+		if (err)
+			goto error;
+		return 0;
 	}
 
-	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL)
+	if (!skb)
 		goto alloc_new_skb;
 
 	while (length > 0) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-09-21  4:27 [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Hannes Frederic Sowa
@ 2013-09-23  0:43 ` Hannes Frederic Sowa
  2013-09-24 15:43 ` David Miller
  2013-09-30 11:43 ` Jiri Pirko
  2 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-09-23  0:43 UTC (permalink / raw)
  To: netdev, yoshfuji, davem, dvyukov; +Cc: pjt, andreyknvl, kcc, therbert

On Sat, Sep 21, 2013 at 06:27:00AM +0200, Hannes Frederic Sowa wrote:
> In the following scenario the socket is corked:
> If the first UDP packet is larger then the mtu we try to append it to the
> write queue via ip6_ufo_append_data. A following packet, which is smaller
> than the mtu would be appended to the already queued up gso-skb via
> plain ip6_append_data. This causes random memory corruptions.
> 
> In ip6_ufo_append_data we also have to be careful to not queue up the
> same skb multiple times. So setup the gso frame only when no first skb
> is available.
> 
> This also fixes a shortcoming where we add the current packet's length to
> cork->length but return early because of a packet > mtu with dontfrag set
> (instead of sutracting it again).
> 
> Found with trinity.
> 
> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Further analysis showed it is very probable that this fixes the bug Dmitry
reported. So I want to give proper credits:

Reported-by: Dmitry Vyukov <dvyukov@google.com>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-09-21  4:27 [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Hannes Frederic Sowa
  2013-09-23  0:43 ` Hannes Frederic Sowa
@ 2013-09-24 15:43 ` David Miller
  2013-09-30 11:43 ` Jiri Pirko
  2 siblings, 0 replies; 31+ messages in thread
From: David Miller @ 2013-09-24 15:43 UTC (permalink / raw)
  To: hannes; +Cc: netdev, yoshfuji

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Sat, 21 Sep 2013 06:27:00 +0200

> In the following scenario the socket is corked:
> If the first UDP packet is larger then the mtu we try to append it to the
> write queue via ip6_ufo_append_data. A following packet, which is smaller
> than the mtu would be appended to the already queued up gso-skb via
> plain ip6_append_data. This causes random memory corruptions.
> 
> In ip6_ufo_append_data we also have to be careful to not queue up the
> same skb multiple times. So setup the gso frame only when no first skb
> is available.
> 
> This also fixes a shortcoming where we add the current packet's length to
> cork->length but return early because of a packet > mtu with dontfrag set
> (instead of sutracting it again).
> 
> Found with trinity.
> 
> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied and queued up for -stable, thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-09-21  4:27 [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Hannes Frederic Sowa
  2013-09-23  0:43 ` Hannes Frederic Sowa
  2013-09-24 15:43 ` David Miller
@ 2013-09-30 11:43 ` Jiri Pirko
  2013-09-30 17:23   ` Hannes Frederic Sowa
  2 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2013-09-30 11:43 UTC (permalink / raw)
  To: hannes; +Cc: netdev, yoshfuji, davem

Sat, Sep 21, 2013 at 06:27:00AM CEST, hannes@stressinduktion.org wrote:
>In the following scenario the socket is corked:
>If the first UDP packet is larger then the mtu we try to append it to the
>write queue via ip6_ufo_append_data. A following packet, which is smaller
>than the mtu would be appended to the already queued up gso-skb via
>plain ip6_append_data. This causes random memory corruptions.
>
>In ip6_ufo_append_data we also have to be careful to not queue up the
>same skb multiple times. So setup the gso frame only when no first skb
>is available.
>
>This also fixes a shortcoming where we add the current packet's length to
>cork->length but return early because of a packet > mtu with dontfrag set
>(instead of sutracting it again).
>
>Found with trinity.
>
>Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
>Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
>---
>
>I could only test this with virtualized UFO enabled network cards. Could
>someone test this on real hardware?
>
> net/ipv6/ip6_output.c | 53 +++++++++++++++++++++------------------------------
> 1 file changed, 22 insertions(+), 31 deletions(-)
>
>diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>index 3a692d5..a54c45c 100644
>--- a/net/ipv6/ip6_output.c
>+++ b/net/ipv6/ip6_output.c
>@@ -1015,6 +1015,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> 	 * udp datagram
> 	 */
> 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
>+		struct frag_hdr fhdr;
>+
> 		skb = sock_alloc_send_skb(sk,
> 			hh_len + fragheaderlen + transhdrlen + 20,
> 			(flags & MSG_DONTWAIT), &err);
>@@ -1036,12 +1038,6 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> 		skb->protocol = htons(ETH_P_IPV6);
> 		skb->ip_summed = CHECKSUM_PARTIAL;
> 		skb->csum = 0;
>-	}
>-
>-	err = skb_append_datato_frags(sk,skb, getfrag, from,
>-				      (length - transhdrlen));
>-	if (!err) {
>-		struct frag_hdr fhdr;
> 
> 		/* Specify the length of each IPv6 datagram fragment.
> 		 * It has to be a multiple of 8.
>@@ -1052,15 +1048,10 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> 		ipv6_select_ident(&fhdr, rt);
> 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> 		__skb_queue_tail(&sk->sk_write_queue, skb);
>-
>-		return 0;
> 	}
>-	/* There is not enough support do UPD LSO,
>-	 * so follow normal path
>-	 */
>-	kfree_skb(skb);
> 
>-	return err;
>+	return skb_append_datato_frags(sk, skb, getfrag, from,
>+				       (length - transhdrlen));
> }
> 

What if non-ufo-path-created skb is peeked?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-09-30 11:43 ` Jiri Pirko
@ 2013-09-30 17:23   ` Hannes Frederic Sowa
  2013-10-01 10:58     ` Jiri Pirko
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-09-30 17:23 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, yoshfuji, davem

On Mon, Sep 30, 2013 at 01:43:43PM +0200, Jiri Pirko wrote:
> Sat, Sep 21, 2013 at 06:27:00AM CEST, hannes@stressinduktion.org wrote:
> >In the following scenario the socket is corked:
> >If the first UDP packet is larger then the mtu we try to append it to the
> >write queue via ip6_ufo_append_data. A following packet, which is smaller
> >than the mtu would be appended to the already queued up gso-skb via
> >plain ip6_append_data. This causes random memory corruptions.
> >
> >In ip6_ufo_append_data we also have to be careful to not queue up the
> >same skb multiple times. So setup the gso frame only when no first skb
> >is available.
> >
> >This also fixes a shortcoming where we add the current packet's length to
> >cork->length but return early because of a packet > mtu with dontfrag set
> >(instead of sutracting it again).
> >
> >Found with trinity.
> >
> >Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> >Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> >---
> >
> >I could only test this with virtualized UFO enabled network cards. Could
> >someone test this on real hardware?
> >
> > net/ipv6/ip6_output.c | 53 +++++++++++++++++++++------------------------------
> > 1 file changed, 22 insertions(+), 31 deletions(-)
> >
> >diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> >index 3a692d5..a54c45c 100644
> >--- a/net/ipv6/ip6_output.c
> >+++ b/net/ipv6/ip6_output.c
> >@@ -1015,6 +1015,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > 	 * udp datagram
> > 	 */
> > 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
> >+		struct frag_hdr fhdr;
> >+
> > 		skb = sock_alloc_send_skb(sk,
> > 			hh_len + fragheaderlen + transhdrlen + 20,
> > 			(flags & MSG_DONTWAIT), &err);
> >@@ -1036,12 +1038,6 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > 		skb->protocol = htons(ETH_P_IPV6);
> > 		skb->ip_summed = CHECKSUM_PARTIAL;
> > 		skb->csum = 0;
> >-	}
> >-
> >-	err = skb_append_datato_frags(sk,skb, getfrag, from,
> >-				      (length - transhdrlen));
> >-	if (!err) {
> >-		struct frag_hdr fhdr;
> > 
> > 		/* Specify the length of each IPv6 datagram fragment.
> > 		 * It has to be a multiple of 8.
> >@@ -1052,15 +1048,10 @@ static inline int ip6_ufo_append_data(struct sock *sk,
> > 		ipv6_select_ident(&fhdr, rt);
> > 		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
> > 		__skb_queue_tail(&sk->sk_write_queue, skb);
> >-
> >-		return 0;
> > 	}
> >-	/* There is not enough support do UPD LSO,
> >-	 * so follow normal path
> >-	 */
> >-	kfree_skb(skb);
> > 
> >-	return err;
> >+	return skb_append_datato_frags(sk, skb, getfrag, from,
> >+				       (length - transhdrlen));
> > }
> > 
> 
> What if non-ufo-path-created skb is peeked?

You mean like:
first append a frame < mtu
second frame > mtu so it gets handles by ip6_ufo_append?

Currently I don't see a problem with it but I may be wrong. What is
your suspicion?

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-09-30 17:23   ` Hannes Frederic Sowa
@ 2013-10-01 10:58     ` Jiri Pirko
  2013-10-01 12:09       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2013-10-01 10:58 UTC (permalink / raw)
  To: netdev, yoshfuji, davem; +Cc: kuznet, jmorris, kaber

>> 
>> What if non-ufo-path-created skb is peeked?
>
>You mean like:
>first append a frame < mtu
>second frame > mtu so it gets handles by ip6_ufo_append?
>
>Currently I don't see a problem with it but I may be wrong. What is
>your suspicion?

Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
Because of that it will be not threated as ufo skb.

Following patch fixes it:

Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well

Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv6/ip6_output.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..c6cfa2f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1008,6 +1008,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 
 {
 	struct sk_buff *skb;
+	struct frag_hdr fhdr;
 	int err;
 
 	/* There is support for UDP large send offload by network
@@ -1015,8 +1016,6 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 	 * udp datagram
 	 */
 	if ((skb = skb_peek_tail(&sk->sk_write_queue)) == NULL) {
-		struct frag_hdr fhdr;
-
 		skb = sock_alloc_send_skb(sk,
 			hh_len + fragheaderlen + transhdrlen + 20,
 			(flags & MSG_DONTWAIT), &err);
@@ -1036,20 +1035,23 @@ static inline int ip6_ufo_append_data(struct sock *sk,
 		skb->transport_header = skb->network_header + fragheaderlen;
 
 		skb->protocol = htons(ETH_P_IPV6);
-		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum = 0;
 
-		/* Specify the length of each IPv6 datagram fragment.
-		 * It has to be a multiple of 8.
-		 */
-		skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
-					     sizeof(struct frag_hdr)) & ~7;
-		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
-		ipv6_select_ident(&fhdr, rt);
-		skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 		__skb_queue_tail(&sk->sk_write_queue, skb);
-	}
+	} else if (skb_is_gso(skb))
+		goto append;
+
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	/* Specify the length of each IPv6 datagram fragment.
+	 * It has to be a multiple of 8.
+	 */
+	skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
+				     sizeof(struct frag_hdr)) & ~7;
+	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+	ipv6_select_ident(&fhdr, rt);
+	skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
 
+append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,
 				       (length - transhdrlen));
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-01 10:58     ` Jiri Pirko
@ 2013-10-01 12:09       ` Hannes Frederic Sowa
  2013-10-01 12:32         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-01 12:09 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, eric.dumazet

On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> >> 
> >> What if non-ufo-path-created skb is peeked?
> >
> >You mean like:
> >first append a frame < mtu
> >second frame > mtu so it gets handles by ip6_ufo_append?
> >
> >Currently I don't see a problem with it but I may be wrong. What is
> >your suspicion?
> 
> Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> Because of that it will be not threated as ufo skb.
> 
> Following patch fixes it:
> 
> Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> 
> Now, if user application does:
> sendto len<mtu flag MSG_MORE
> sendto len>mtu flag 0
> The skb is not treated as fragmented one because it is not initialized
> that way. So move the initialization to fix this.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

My understanding is that the frame is only a valid gso frame iff the first skb
queued up is setup as an UFO frame. In other cases we only need to make sure
we append to the fragment list without updating the gso field and the skb will
get linearized (if needed) later in the output path.

This seems not to matter for virtio_net and loopback because we seem to pass
the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
neterion and we have to verify if this assumption is correct, there.

This is also the way the IPv4 stack deals with UFO.

Either way, this needs a look from Eric Dumazet and Herbert Xu, so I added
them in Cc.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-01 12:09       ` Hannes Frederic Sowa
@ 2013-10-01 12:32         ` Hannes Frederic Sowa
  2013-10-01 21:47           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-01 12:32 UTC (permalink / raw)
  To: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert, eric.dumazet

On Tue, Oct 01, 2013 at 02:09:07PM +0200, Hannes Frederic Sowa wrote:
> On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> > >> 
> > >> What if non-ufo-path-created skb is peeked?
> > >
> > >You mean like:
> > >first append a frame < mtu
> > >second frame > mtu so it gets handles by ip6_ufo_append?
> > >
> > >Currently I don't see a problem with it but I may be wrong. What is
> > >your suspicion?
> > 
> > Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> > Because of that it will be not threated as ufo skb.
> > 
> > Following patch fixes it:
> > 
> > Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> > 
> > Now, if user application does:
> > sendto len<mtu flag MSG_MORE
> > sendto len>mtu flag 0
> > The skb is not treated as fragmented one because it is not initialized
> > that way. So move the initialization to fix this.
> > 
> > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> 
> My understanding is that the frame is only a valid gso frame iff the first skb
> queued up is setup as an UFO frame. In other cases we only need to make sure
> we append to the fragment list without updating the gso field and the skb will
> get linearized (if needed) later in the output path.
> 
> This seems not to matter for virtio_net and loopback because we seem to pass
> the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
> neterion and we have to verify if this assumption is correct, there.

Hm, ip_append_page seems to violate my assumption. I need to read up on the
code later today.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-01 12:32         ` Hannes Frederic Sowa
@ 2013-10-01 21:47           ` Hannes Frederic Sowa
  2013-10-01 23:25             ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-01 21:47 UTC (permalink / raw)
  To: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert, eric.dumazet

Hi Jiri,

On Tue, Oct 01, 2013 at 02:32:14PM +0200, Hannes Frederic Sowa wrote:
> On Tue, Oct 01, 2013 at 02:09:07PM +0200, Hannes Frederic Sowa wrote:
> > On Tue, Oct 01, 2013 at 12:58:37PM +0200, Jiri Pirko wrote:
> > > >> 
> > > >> What if non-ufo-path-created skb is peeked?
> > > >
> > > >You mean like:
> > > >first append a frame < mtu
> > > >second frame > mtu so it gets handles by ip6_ufo_append?
> > > >
> > > >Currently I don't see a problem with it but I may be wrong. What is
> > > >your suspicion?
> > > 
> > > Well the skb will have gso_size==0 and ip_summed==CHECKSUM_NONE
> > > Because of that it will be not threated as ufo skb.
> > > 
> > > Following patch fixes it:
> > > 
> > > Subject: [patch net] ip6_output: do skb ufo init for peeked non ufo skb as well
> > > 
> > > Now, if user application does:
> > > sendto len<mtu flag MSG_MORE
> > > sendto len>mtu flag 0
> > > The skb is not treated as fragmented one because it is not initialized
> > > that way. So move the initialization to fix this.
> > > 
> > > Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> > 
> > My understanding is that the frame is only a valid gso frame iff the first skb
> > queued up is setup as an UFO frame. In other cases we only need to make sure
> > we append to the fragment list without updating the gso field and the skb will
> > get linearized (if needed) later in the output path.
> > 
> > This seems not to matter for virtio_net and loopback because we seem to pass
> > the skb as is. But the remaining ethernet card which sets NETIF_F_UFO is
> > neterion and we have to verify if this assumption is correct, there.
> 
> Hm, ip_append_page seems to violate my assumption. I need to read up on the
> code later today.

I still guess my assumption about gso is correct and I will have to review how
the ip_append_page codepath works in detail.

I currently have a tree where I have applied my UFO patch and your dontfrag
patch. When I have an application which does:

socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0	// UFO is enabled currently
setsockopt(3, SOL_UDP, 1, [1], 4)       = 0		// this sets UDP_CORK active
setsockopt(3, SOL_IPV6, IPV6_MTU, [1280], 4) = 0	// minimal IPV6 mtu
write(3, " ", 1)                        = 1		// generates non-gso head
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 3701) = 3701	// generates gso segment in frags but skb_is_gso(skb) == false
write(3, " ", 1						// and boom: null pointer deref and slab corruption

Without your patch this would not been possible and I am not sure your
patch breaks assumptions for hardware UFO. We should wait with that
patch until we sorted out the details.

My current guess is that we need to replace the skb_is_gso() with
something that checks for already appended frags and continue to use
sbk_append_datato_frags in that case.

The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
get an oops.

IPv4 seems to work without problems, too.

Maybe I miss something? :|

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-01 21:47           ` Hannes Frederic Sowa
@ 2013-10-01 23:25             ` Hannes Frederic Sowa
  2013-10-02  8:58               ` Jiri Pirko
                                 ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-01 23:25 UTC (permalink / raw)
  To: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert, eric.dumazet

On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
> get an oops.

This is incorrect, it just depends on the size of the writes and on the
interface mtu.

> IPv4 seems to work without problems, too.

I also get kernel oopses from IPv4 now, too.


So, skb_is_gso is not accurate enough in the output and we have to check if we
already started to append to skb frags. The following diff does resolve this
issue in both the IPv4 and IPv6 non-page-append output path but I am not
confident if it is correct:

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6d56840..3565450 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1308,6 +1308,11 @@ static inline int skb_pagelen(const struct sk_buff *skb)
 	return len + skb_headlen(skb);
 }
 
+static inline bool skb_has_frags(const struct sk_buff *skb)
+{
+	return skb_shinfo(skb)->nr_frags;
+}
+
 /**
  * __skb_fill_page_desc - initialise a paged fragment in an skb
  * @skb: buffer containing fragment to be initialised
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 7d8357b..8dc3d8d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -836,7 +836,7 @@ static int __ip_append_data(struct sock *sk,
 		csummode = CHECKSUM_PARTIAL;
 
 	cork->length += length;
-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
 	    (rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
 		err = ip_ufo_append_data(sk, queue, getfrag, from, length,
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a54c45c..ded4f6f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1227,7 +1227,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
 	skb = skb_peek_tail(&sk->sk_write_queue);
 	cork->length += length;
 	if (((length > mtu) ||
-	     (skb && skb_is_gso(skb))) &&
+	     (skb && skb_has_frags(skb))) &&
 	    (sk->sk_protocol == IPPROTO_UDP) &&
 	    (rt->dst.dev->features & NETIF_F_UFO)) {
 		err = ip6_ufo_append_data(sk, getfrag, from, length,

Greetings,

  Hannes

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-01 23:25             ` Hannes Frederic Sowa
@ 2013-10-02  8:58               ` Jiri Pirko
  2013-10-02 10:41                 ` Eric Dumazet
  2013-10-02 10:33               ` [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Jiri Pirko
  2013-10-02 11:20               ` Jiri Pirko
  2 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2013-10-02  8:58 UTC (permalink / raw)
  To: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, eric.dumazet

Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
>> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
>> get an oops.
>
>This is incorrect, it just depends on the size of the writes and on the
>interface mtu.
>
>> IPv4 seems to work without problems, too.
>
>I also get kernel oopses from IPv4 now, too.
>
>
>So, skb_is_gso is not accurate enough in the output and we have to check if we
>already started to append to skb frags. The following diff does resolve this
>issue in both the IPv4 and IPv6 non-page-append output path but I am not
>confident if it is correct:
>
>diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>index 6d56840..3565450 100644
>--- a/include/linux/skbuff.h
>+++ b/include/linux/skbuff.h
>@@ -1308,6 +1308,11 @@ static inline int skb_pagelen(const struct sk_buff *skb)
> 	return len + skb_headlen(skb);
> }
> 
>+static inline bool skb_has_frags(const struct sk_buff *skb)
>+{
>+	return skb_shinfo(skb)->nr_frags;
>+}
>+
> /**
>  * __skb_fill_page_desc - initialise a paged fragment in an skb
>  * @skb: buffer containing fragment to be initialised
>diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>index 7d8357b..8dc3d8d 100644
>--- a/net/ipv4/ip_output.c
>+++ b/net/ipv4/ip_output.c
>@@ -836,7 +836,7 @@ static int __ip_append_data(struct sock *sk,
> 		csummode = CHECKSUM_PARTIAL;
> 
> 	cork->length += length;
>-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
>+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> 	    (sk->sk_protocol == IPPROTO_UDP) &&
> 	    (rt->dst.dev->features & NETIF_F_UFO) && !rt->dst.header_len) {
> 		err = ip_ufo_append_data(sk, queue, getfrag, from, length,
>diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>index a54c45c..ded4f6f 100644
>--- a/net/ipv6/ip6_output.c
>+++ b/net/ipv6/ip6_output.c
>@@ -1227,7 +1227,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,
> 	skb = skb_peek_tail(&sk->sk_write_queue);
> 	cork->length += length;
> 	if (((length > mtu) ||
>-	     (skb && skb_is_gso(skb))) &&
>+	     (skb && skb_has_frags(skb))) &&
> 	    (sk->sk_protocol == IPPROTO_UDP) &&
> 	    (rt->dst.dev->features & NETIF_F_UFO)) {
> 		err = ip6_ufo_append_data(sk, getfrag, from, length,
>
>Greetings,
>
>  Hannes
>


This seems correct to me. sk_is_gso would work as well is you apply my
patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
well" which does the setting of gso_size.

Jiri

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-01 23:25             ` Hannes Frederic Sowa
  2013-10-02  8:58               ` Jiri Pirko
@ 2013-10-02 10:33               ` Jiri Pirko
  2013-10-02 12:01                 ` Hannes Frederic Sowa
  2013-10-02 11:20               ` Jiri Pirko
  2 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2013-10-02 10:33 UTC (permalink / raw)
  To: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, eric.dumazet

Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
>> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
>> get an oops.
>
>This is incorrect, it just depends on the size of the writes and on the
>interface mtu.
>
>> IPv4 seems to work without problems, too.
>
>I also get kernel oopses from IPv4 now, too.

I'm not able to trigger this with ipv4. Can you please send strace
output for this as well?

Thanks

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-02  8:58               ` Jiri Pirko
@ 2013-10-02 10:41                 ` Eric Dumazet
  2013-10-02 12:12                   ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2013-10-02 10:41 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert

On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&

> 
> This seems correct to me. sk_is_gso would work as well is you apply my
> patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> well" which does the setting of gso_size.

Well, skb having frags or not should not be a concern :
Thats an allocation choice (lets say to avoid high order allocations). 

Setting gso_size is probably better.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-01 23:25             ` Hannes Frederic Sowa
  2013-10-02  8:58               ` Jiri Pirko
  2013-10-02 10:33               ` [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Jiri Pirko
@ 2013-10-02 11:20               ` Jiri Pirko
  2013-10-02 11:53                 ` Eric Dumazet
  2 siblings, 1 reply; 31+ messages in thread
From: Jiri Pirko @ 2013-10-02 11:20 UTC (permalink / raw)
  To: netdev
  Cc: yoshfuji, davem, kuznet, jmorris, kaber, herbert, eric.dumazet, hannes

Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
>> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
>> get an oops.
>
>This is incorrect, it just depends on the size of the writes and on the
>interface mtu.
>
>> IPv4 seems to work without problems, too.
>
>I also get kernel oopses from IPv4 now, too.

This patch should fix this on ipv4 as well:

Subject: ip_output: do skb ufo init for peeked non ufo skb as well

Now, if user application does:
sendto len<mtu flag MSG_MORE
sendto len>mtu flag 0
The skb is not treated as fragmented one because it is not initialized
that way. So move the initialization to fix this.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
---
 net/ipv4/ip_output.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index a04d872..bd21c5d 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -772,15 +772,19 @@ static inline int ip_ufo_append_data(struct sock *sk,
 		/* initialize protocol header pointer */
 		skb->transport_header = skb->network_header + fragheaderlen;
 
-		skb->ip_summed = CHECKSUM_PARTIAL;
 		skb->csum = 0;
 
-		/* specify the length of each IP datagram fragment */
-		skb_shinfo(skb)->gso_size = maxfraglen - fragheaderlen;
-		skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
+
 		__skb_queue_tail(queue, skb);
-	}
+	} else if (skb_is_gso(skb))
+		goto append;
+
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	/* specify the length of each IP datagram fragment */
+	skb_shinfo(skb)->gso_size = maxfraglen - fragheaderlen;
+	skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
 
+append:
 	return skb_append_datato_frags(sk, skb, getfrag, from,
 				       (length - transhdrlen));
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-02 11:20               ` Jiri Pirko
@ 2013-10-02 11:53                 ` Eric Dumazet
  2013-10-02 12:10                   ` Jiri Pirko
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2013-10-02 11:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, hannes


> This patch should fix this on ipv4 as well:
> 
> Subject: ip_output: do skb ufo init for peeked non ufo skb as well
> 

Is it an official patch ? s/Subject:/[PATCH]/ 

> Now, if user application does:

Any idea when the bug was added (commit id + title) ?

> sendto len<mtu flag MSG_MORE
> sendto len>mtu flag 0
> The skb is not treated as fragmented one because it is not initialized
> that way. So move the initialization to fix this.
> 
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  net/ipv4/ip_output.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index a04d872..bd21c5d 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -772,15 +772,19 @@ static inline int ip_ufo_append_data(struct sock *sk,
>  		/* initialize protocol header pointer */
>  		skb->transport_header = skb->network_header + fragheaderlen;
>  
> -		skb->ip_summed = CHECKSUM_PARTIAL;
>  		skb->csum = 0;

Any idea why we have skb->csum = 0 here ?

Thanks !

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-02 10:33               ` [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Jiri Pirko
@ 2013-10-02 12:01                 ` Hannes Frederic Sowa
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-02 12:01 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, eric.dumazet

On Wed, Oct 02, 2013 at 12:33:33PM +0200, Jiri Pirko wrote:
> Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> >On Tue, Oct 01, 2013 at 11:47:21PM +0200, Hannes Frederic Sowa wrote:
> >> The strange thing is that if I don't do the IPV6_MTU setsockopt I don't
> >> get an oops.
> >
> >This is incorrect, it just depends on the size of the writes and on the
> >interface mtu.
> >
> >> IPv4 seems to work without problems, too.
> >
> >I also get kernel oopses from IPv4 now, too.
> 
> I'm not able to trigger this with ipv4. Can you please send strace
> output for this as well?

I used this snippet on loopback with UFO enabled and lo mtu 1280.

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>

int test(int mtu)
{
        int fd;
        const int one = 1;
        const int off = 0;
        struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
        unsigned char buffer[3701];

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));

        write(fd, " ", 1);
        write(fd, buffer, sizeof(buffer));
        write(fd, " ", 1);

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));

        close(fd);
}

int main() {
        test(1280);
}

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-02 11:53                 ` Eric Dumazet
@ 2013-10-02 12:10                   ` Jiri Pirko
  0 siblings, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2013-10-02 12:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert, hannes

Wed, Oct 02, 2013 at 01:53:55PM CEST, eric.dumazet@gmail.com wrote:
>
>> This patch should fix this on ipv4 as well:
>> 
>> Subject: ip_output: do skb ufo init for peeked non ufo skb as well
>> 
>
>Is it an official patch ? s/Subject:/[PATCH]/ 

Yes. I thought that to state "Subject:" is enough for patchwork to parse
it. Apparently not :/

>
>> Now, if user application does:
>
>Any idea when the bug was added (commit id + title) ?

This is hard to say. This code is more or less broken from the very
beginning, which is:

commit e89e9cf539a28df7d0eb1d0a545368e9920b34ac "[IPv4/IPv6]: UFO Scatter-gather approach"

>
>> sendto len<mtu flag MSG_MORE
>> sendto len>mtu flag 0
>> The skb is not treated as fragmented one because it is not initialized
>> that way. So move the initialization to fix this.
>> 
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  net/ipv4/ip_output.c | 14 +++++++++-----
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>> 
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index a04d872..bd21c5d 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -772,15 +772,19 @@ static inline int ip_ufo_append_data(struct sock *sk,
>>  		/* initialize protocol header pointer */
>>  		skb->transport_header = skb->network_header + fragheaderlen;
>>  
>> -		skb->ip_summed = CHECKSUM_PARTIAL;
>>  		skb->csum = 0;
>
>Any idea why we have skb->csum = 0 here ?

If I understand this correctly, that can be removed from here.


>
>Thanks !
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-02 10:41                 ` Eric Dumazet
@ 2013-10-02 12:12                   ` Hannes Frederic Sowa
  2013-10-02 13:03                     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-02 12:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert

Hi Eric!

On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> 
> > 
> > This seems correct to me. sk_is_gso would work as well is you apply my
> > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > well" which does the setting of gso_size.
> 
> Well, skb having frags or not should not be a concern :
> Thats an allocation choice (lets say to avoid high order allocations). 
> 
> Setting gso_size is probably better.

e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
approach") states:

"
skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
indicating that hardware has to do checksum calculation. Hardware should
compute the UDP checksum of complete datagram and also ip header checksum of
each fragmented IP packet.
"

This is the reason why I tried not to update the gso_size. If it is ok, I am
fine with that.

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-02 12:12                   ` Hannes Frederic Sowa
@ 2013-10-02 13:03                     ` Hannes Frederic Sowa
  2013-10-02 15:14                       ` Eric Dumazet
  0 siblings, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-02 13:03 UTC (permalink / raw)
  To: Eric Dumazet, Jiri Pirko, netdev, yoshfuji, davem, kuznet,
	jmorris, kaber, herbert

On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> Hi Eric!
> 
> On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > 
> > > 
> > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > well" which does the setting of gso_size.
> > 
> > Well, skb having frags or not should not be a concern :
> > Thats an allocation choice (lets say to avoid high order allocations). 
> > 
> > Setting gso_size is probably better.
> 
> e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> approach") states:
> 
> "
> skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> indicating that hardware has to do checksum calculation. Hardware should
> compute the UDP checksum of complete datagram and also ip header checksum of
> each fragmented IP packet.
> "
> 
> This is the reason why I tried not to update the gso_size. If it is ok, I am
> fine with that.

Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
mapping (skb->data with skb_headlen, which is fine) is used as the inband
header:

        if (offload_type == SKB_GSO_UDP)
                frg_cnt++; /* as Txd0 was used for inband header */

That is my only other hint that we maybe should not update gso_size and
gso_type. I guess software fallback does not have this problem, but I won't
have time to check until this evening.

I am really not sure if just setting gso_size does not break neterion UFO
offloading. :/

Greetings,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO
  2013-10-02 13:03                     ` Hannes Frederic Sowa
@ 2013-10-02 15:14                       ` Eric Dumazet
  2013-10-02 16:27                         ` Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO] Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Eric Dumazet @ 2013-10-02 15:14 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber, herbert

On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > Hi Eric!
> > 
> > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > 
> > > > 
> > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > well" which does the setting of gso_size.
> > > 
> > > Well, skb having frags or not should not be a concern :
> > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > 
> > > Setting gso_size is probably better.
> > 
> > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > approach") states:
> > 
> > "
> > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > indicating that hardware has to do checksum calculation. Hardware should
> > compute the UDP checksum of complete datagram and also ip header checksum of
> > each fragmented IP packet.
> > "
> > 
> > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > fine with that.
> 
> Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> mapping (skb->data with skb_headlen, which is fine) is used as the inband
> header:
> 
>         if (offload_type == SKB_GSO_UDP)
>                 frg_cnt++; /* as Txd0 was used for inband header */
> 
> That is my only other hint that we maybe should not update gso_size and
> gso_type. I guess software fallback does not have this problem, but I won't
> have time to check until this evening.
> 
> I am really not sure if just setting gso_size does not break neterion UFO
> offloading. :/

Well, just ask Jon Mason to double check ;)

I think the commit intent was to set gso_size :

   skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
    fragment going out of the adapter after IP fragmentation by hardware.

The fact that it states "skb->data will contain MAC/IP/UDP header and
skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.

If Neterion driver mandates that skb->head *only* contains the
MAC/IP/UDP header, that should be handled in the driver itself.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-02 15:14                       ` Eric Dumazet
@ 2013-10-02 16:27                         ` Hannes Frederic Sowa
  2013-10-07 16:53                           ` Hannes Frederic Sowa
  2013-10-08  8:07                           ` Jon Mason
  0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-02 16:27 UTC (permalink / raw)
  To: jdmason
  Cc: Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris, kaber,
	herbert, eric.dumazet

Hi!

I have a question regarding UFO and the neterion driver, which as the only one
advertises hardware UFO support:

The patch discusses in this thread
http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
some semantics how packets are constructed before submitted to the driver.

We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
payload is attached in the skb's frags. With the changes discussed in this
thread it is possible that we also append to skb->data some amount of data
which is not targeted for the header. From reading the driver sources it seems
the hardware interprets the skb->data to skb_headlen as the header, so we
could include some data in the fragments more than once.

Do you think this change is safe? Otherwise I would suggest that the UFO
capability is switched off until the driver signals the hardware the start and
end of the headers correctly?

I left the mail below intact which points to the specific place in s2io.c
where I think the problem is.

On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > Hi Eric!
> > > 
> > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > 
> > > > > 
> > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > well" which does the setting of gso_size.
> > > > 
> > > > Well, skb having frags or not should not be a concern :
> > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > 
> > > > Setting gso_size is probably better.
> > > 
> > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > approach") states:
> > > 
> > > "
> > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > indicating that hardware has to do checksum calculation. Hardware should
> > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > each fragmented IP packet.
> > > "
> > > 
> > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > fine with that.
> > 
> > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > header:
> > 
> >         if (offload_type == SKB_GSO_UDP)
> >                 frg_cnt++; /* as Txd0 was used for inband header */
> > 
> > That is my only other hint that we maybe should not update gso_size and
> > gso_type. I guess software fallback does not have this problem, but I won't
> > have time to check until this evening.
> > 
> > I am really not sure if just setting gso_size does not break neterion UFO
> > offloading. :/
> 
> Well, just ask Jon Mason to double check ;)
> 
> I think the commit intent was to set gso_size :
> 
>    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
>     fragment going out of the adapter after IP fragmentation by hardware.
> 
> The fact that it states "skb->data will contain MAC/IP/UDP header and
> skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> 
> If Neterion driver mandates that skb->head *only* contains the
> MAC/IP/UDP header, that should be handled in the driver itself.

Thanks Eric for clearing this up.

I really thought it would be the common pattern for UFO to have only headers
in skb->data, so I didn't bother to ask in the first place.

Thanks,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-02 16:27                         ` Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO] Hannes Frederic Sowa
@ 2013-10-07 16:53                           ` Hannes Frederic Sowa
  2013-10-07 17:19                             ` Jon Mason
  2013-10-08  8:07                           ` Jon Mason
  1 sibling, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-07 16:53 UTC (permalink / raw)
  To: jon.mason
  Cc: jdmason, Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris,
	kaber, herbert, eric.dumazet

Hi Jon!

Maybe I got the wrong email address for the neterion driver from the
maintainers filer? If you are (still) affiliated with the neterion driver,
maybe you could have a short look at the quoted mail below?

Thanks,

  Hannes

On Wed, Oct 02, 2013 at 06:27:30PM +0200, Hannes Frederic Sowa wrote:
> Hi!
> 
> I have a question regarding UFO and the neterion driver, which as the only one
> advertises hardware UFO support:
> 
> The patch discusses in this thread
> http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> some semantics how packets are constructed before submitted to the driver.
> 
> We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> payload is attached in the skb's frags. With the changes discussed in this
> thread it is possible that we also append to skb->data some amount of data
> which is not targeted for the header. From reading the driver sources it seems
> the hardware interprets the skb->data to skb_headlen as the header, so we
> could include some data in the fragments more than once.
> 
> Do you think this change is safe? Otherwise I would suggest that the UFO
> capability is switched off until the driver signals the hardware the start and
> end of the headers correctly?
> 
> I left the mail below intact which points to the specific place in s2io.c
> where I think the problem is.
> 
> On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> > On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > > Hi Eric!
> > > > 
> > > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > > 
> > > > > > 
> > > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > > well" which does the setting of gso_size.
> > > > > 
> > > > > Well, skb having frags or not should not be a concern :
> > > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > > 
> > > > > Setting gso_size is probably better.
> > > > 
> > > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > > approach") states:
> > > > 
> > > > "
> > > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > > indicating that hardware has to do checksum calculation. Hardware should
> > > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > > each fragmented IP packet.
> > > > "
> > > > 
> > > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > > fine with that.
> > > 
> > > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > > header:
> > > 
> > >         if (offload_type == SKB_GSO_UDP)
> > >                 frg_cnt++; /* as Txd0 was used for inband header */
> > > 
> > > That is my only other hint that we maybe should not update gso_size and
> > > gso_type. I guess software fallback does not have this problem, but I won't
> > > have time to check until this evening.
> > > 
> > > I am really not sure if just setting gso_size does not break neterion UFO
> > > offloading. :/
> > 
> > Well, just ask Jon Mason to double check ;)
> > 
> > I think the commit intent was to set gso_size :
> > 
> >    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
> >     fragment going out of the adapter after IP fragmentation by hardware.
> > 
> > The fact that it states "skb->data will contain MAC/IP/UDP header and
> > skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> > 
> > If Neterion driver mandates that skb->head *only* contains the
> > MAC/IP/UDP header, that should be handled in the driver itself.
> 
> Thanks Eric for clearing this up.
> 
> I really thought it would be the common pattern for UFO to have only headers
> in skb->data, so I didn't bother to ask in the first place.
> 
> Thanks,
> 
>   Hannes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-07 16:53                           ` Hannes Frederic Sowa
@ 2013-10-07 17:19                             ` Jon Mason
  2013-10-07 17:27                               ` Hannes Frederic Sowa
  0 siblings, 1 reply; 31+ messages in thread
From: Jon Mason @ 2013-10-07 17:19 UTC (permalink / raw)
  To: jdmason, Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris,
	kaber, herbert, eric.dumazet

On Mon, Oct 07, 2013 at 06:53:43PM +0200, Hannes Frederic Sowa wrote:
> Hi Jon!
> 
> Maybe I got the wrong email address for the neterion driver from the
> maintainers filer? If you are (still) affiliated with the neterion driver,
> maybe you could have a short look at the quoted mail below?

Both are valid email addresses, but I prefer to address non-Intel
issues with my kudzu.us email account.

I apologize for not addressing your question yet.  What you are saying
makes sense, but I want to dig through the documentation and verify.
However, I haven't had the time.  I'll brew up a pot of coffee when I
get home and I'll get an answer to you before I go to bed tonight :)

Thanks,
Jon

> 
> Thanks,
> 
>   Hannes
> 
> On Wed, Oct 02, 2013 at 06:27:30PM +0200, Hannes Frederic Sowa wrote:
> > Hi!
> > 
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> > 
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> > 
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> > 
> > Do you think this change is safe? Otherwise I would suggest that the UFO
> > capability is switched off until the driver signals the hardware the start and
> > end of the headers correctly?
> > 
> > I left the mail below intact which points to the specific place in s2io.c
> > where I think the problem is.
> > 
> > On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
> > > On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
> > > > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
> > > > > Hi Eric!
> > > > > 
> > > > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
> > > > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
> > > > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
> > > > > > > >-	if (((length > mtu) || (skb && skb_is_gso(skb))) &&
> > > > > > > >+	if (((length > mtu) || (skb && skb_has_frags(skb))) &&
> > > > > > 
> > > > > > > 
> > > > > > > This seems correct to me. sk_is_gso would work as well is you apply my
> > > > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
> > > > > > > well" which does the setting of gso_size.
> > > > > > 
> > > > > > Well, skb having frags or not should not be a concern :
> > > > > > Thats an allocation choice (lets say to avoid high order allocations). 
> > > > > > 
> > > > > > Setting gso_size is probably better.
> > > > > 
> > > > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
> > > > > approach") states:
> > > > > 
> > > > > "
> > > > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
> > > > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
> > > > > indicating that hardware has to do checksum calculation. Hardware should
> > > > > compute the UDP checksum of complete datagram and also ip header checksum of
> > > > > each fragmented IP packet.
> > > > > "
> > > > > 
> > > > > This is the reason why I tried not to update the gso_size. If it is ok, I am
> > > > > fine with that.
> > > > 
> > > > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
> > > > mapping (skb->data with skb_headlen, which is fine) is used as the inband
> > > > header:
> > > > 
> > > >         if (offload_type == SKB_GSO_UDP)
> > > >                 frg_cnt++; /* as Txd0 was used for inband header */
> > > > 
> > > > That is my only other hint that we maybe should not update gso_size and
> > > > gso_type. I guess software fallback does not have this problem, but I won't
> > > > have time to check until this evening.
> > > > 
> > > > I am really not sure if just setting gso_size does not break neterion UFO
> > > > offloading. :/
> > > 
> > > Well, just ask Jon Mason to double check ;)
> > > 
> > > I think the commit intent was to set gso_size :
> > > 
> > >    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
> > >     fragment going out of the adapter after IP fragmentation by hardware.
> > > 
> > > The fact that it states "skb->data will contain MAC/IP/UDP header and
> > > skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
> > > 
> > > If Neterion driver mandates that skb->head *only* contains the
> > > MAC/IP/UDP header, that should be handled in the driver itself.
> > 
> > Thanks Eric for clearing this up.
> > 
> > I really thought it would be the common pattern for UFO to have only headers
> > in skb->data, so I didn't bother to ask in the first place.
> > 
> > Thanks,
> > 
> >   Hannes
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-07 17:19                             ` Jon Mason
@ 2013-10-07 17:27                               ` Hannes Frederic Sowa
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-07 17:27 UTC (permalink / raw)
  To: Jon Mason
  Cc: jdmason, Jiri Pirko, netdev, yoshfuji, davem, kuznet, jmorris,
	kaber, herbert, eric.dumazet

On Mon, Oct 07, 2013 at 10:19:41AM -0700, Jon Mason wrote:
> Both are valid email addresses, but I prefer to address non-Intel
> issues with my kudzu.us email account.

Ok.

> I apologize for not addressing your question yet.  What you are saying
> makes sense, but I want to dig through the documentation and verify.
> However, I haven't had the time.  I'll brew up a pot of coffee when I
> get home and I'll get an answer to you before I go to bed tonight :)

No need to hurry, I do not want to induce stress. ;)

I just wanted to make sure this does not get forgotten so we can apply Jiri's
patches anytime soon.

Enjoy your coffee,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-02 16:27                         ` Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO] Hannes Frederic Sowa
  2013-10-07 16:53                           ` Hannes Frederic Sowa
@ 2013-10-08  8:07                           ` Jon Mason
  2013-10-08 13:00                             ` Eric Dumazet
  2013-10-08 14:53                             ` Hannes Frederic Sowa
  1 sibling, 2 replies; 31+ messages in thread
From: Jon Mason @ 2013-10-08  8:07 UTC (permalink / raw)
  To: Jon Mason, Jiri Pirko, netdev, yoshfuji, David Miller, kuznet,
	jmorris, kaber, herbert, Eric Dumazet

On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi!
>
> I have a question regarding UFO and the neterion driver, which as the only one
> advertises hardware UFO support:
>
> The patch discusses in this thread
> http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> some semantics how packets are constructed before submitted to the driver.
>
> We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> payload is attached in the skb's frags. With the changes discussed in this
> thread it is possible that we also append to skb->data some amount of data
> which is not targeted for the header. From reading the driver sources it seems
> the hardware interprets the skb->data to skb_headlen as the header, so we
> could include some data in the fragments more than once.

>From my reading of the HW Spec and a quick look at the driver, it
appears that the driver is using one entry in the TX ring for the
header and another for the body of the packet to be fragmented (which
is what the hardware wants).  I don't understand what you are saying,
but if you are asking if simply appending a new header & data to the
end of skb->data will get it out on the wire correct, I don't believe
it will.

I do have hardware that I can try the patch on, if you can walk me
through the use case (unless it is as easy as setup an IPv6 connection
and ping).

Time for sleep now....

Thanks,
Jon

>
> Do you think this change is safe? Otherwise I would suggest that the UFO
> capability is switched off until the driver signals the hardware the start and
> end of the headers correctly?
>
> I left the mail below intact which points to the specific place in s2io.c
> where I think the problem is.
>
> On Wed, Oct 02, 2013 at 08:14:27AM -0700, Eric Dumazet wrote:
>> On Wed, 2013-10-02 at 15:03 +0200, Hannes Frederic Sowa wrote:
>> > On Wed, Oct 02, 2013 at 02:12:07PM +0200, Hannes Frederic Sowa wrote:
>> > > Hi Eric!
>> > >
>> > > On Wed, Oct 02, 2013 at 03:41:28AM -0700, Eric Dumazet wrote:
>> > > > On Wed, 2013-10-02 at 10:58 +0200, Jiri Pirko wrote:
>> > > > > Wed, Oct 02, 2013 at 01:25:34AM CEST, hannes@stressinduktion.org wrote:
>> > > > > >-    if (((length > mtu) || (skb && skb_is_gso(skb))) &&
>> > > > > >+    if (((length > mtu) || (skb && skb_has_frags(skb))) &&
>> > > >
>> > > > >
>> > > > > This seems correct to me. sk_is_gso would work as well is you apply my
>> > > > > patch "[patch net] ip6_output: do skb ufo init for peeked non ufo skb as
>> > > > > well" which does the setting of gso_size.
>> > > >
>> > > > Well, skb having frags or not should not be a concern :
>> > > > Thats an allocation choice (lets say to avoid high order allocations).
>> > > >
>> > > > Setting gso_size is probably better.
>> > >
>> > > e89e9cf539a28df7d0eb1d0a545368e9920b34ac ("[IPv4/IPv6]: UFO Scatter-gather
>> > > approach") states:
>> > >
>> > > "
>> > > skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[]
>> > > contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW
>> > > indicating that hardware has to do checksum calculation. Hardware should
>> > > compute the UDP checksum of complete datagram and also ip header checksum of
>> > > each fragmented IP packet.
>> > > "
>> > >
>> > > This is the reason why I tried not to update the gso_size. If it is ok, I am
>> > > fine with that.
>> >
>> > Especially, drivers/net/ethernet/neterion/s2io.c states that the first dma
>> > mapping (skb->data with skb_headlen, which is fine) is used as the inband
>> > header:
>> >
>> >         if (offload_type == SKB_GSO_UDP)
>> >                 frg_cnt++; /* as Txd0 was used for inband header */
>> >
>> > That is my only other hint that we maybe should not update gso_size and
>> > gso_type. I guess software fallback does not have this problem, but I won't
>> > have time to check until this evening.
>> >
>> > I am really not sure if just setting gso_size does not break neterion UFO
>> > offloading. :/
>>
>> Well, just ask Jon Mason to double check ;)
>>
>> I think the commit intent was to set gso_size :
>>
>>    skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
>>     fragment going out of the adapter after IP fragmentation by hardware.
>>
>> The fact that it states "skb->data will contain MAC/IP/UDP header and
>> skb_shinfo(skb)->frags[] contains the data payload." seems irrelevant.
>>
>> If Neterion driver mandates that skb->head *only* contains the
>> MAC/IP/UDP header, that should be handled in the driver itself.
>
> Thanks Eric for clearing this up.
>
> I really thought it would be the common pattern for UFO to have only headers
> in skb->data, so I didn't bother to ask in the first place.
>
> Thanks,
>
>   Hannes
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-08  8:07                           ` Jon Mason
@ 2013-10-08 13:00                             ` Eric Dumazet
  2013-10-08 14:53                             ` Hannes Frederic Sowa
  1 sibling, 0 replies; 31+ messages in thread
From: Eric Dumazet @ 2013-10-08 13:00 UTC (permalink / raw)
  To: Jon Mason
  Cc: Jiri Pirko, netdev, yoshfuji, David Miller, kuznet, jmorris,
	kaber, herbert

On Tue, 2013-10-08 at 01:07 -0700, Jon Mason wrote:
> On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> >
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> >
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> 
> From my reading of the HW Spec and a quick look at the driver, it
> appears that the driver is using one entry in the TX ring for the
> header and another for the body of the packet to be fragmented (which
> is what the hardware wants).  I don't understand what you are saying,
> but if you are asking if simply appending a new header & data to the
> end of skb->data will get it out on the wire correct, I don't believe
> it will.
> 
> I do have hardware that I can try the patch on, if you can walk me
> through the use case (unless it is as easy as setup an IPv6 connection
> and ping).

I think this behavior is quite common. Driver should certainly already
do the right thing, as TCP frames can have the same property.

bnx2x for example splits skb->head if it contains payload after headers.

drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c

                if (unlikely(skb_headlen(skb) > hlen)) {
                        nbd++;
                        bd_prod = bnx2x_tx_split(bp, txdata, tx_buf,
                                                 &tx_start_bd, hlen,
                                                 bd_prod);
                }

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-08  8:07                           ` Jon Mason
  2013-10-08 13:00                             ` Eric Dumazet
@ 2013-10-08 14:53                             ` Hannes Frederic Sowa
  2013-10-17  4:45                               ` Hannes Frederic Sowa
  1 sibling, 1 reply; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-08 14:53 UTC (permalink / raw)
  To: Jon Mason
  Cc: Jiri Pirko, netdev, yoshfuji, David Miller, kuznet, jmorris,
	kaber, herbert, Eric Dumazet

On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
> On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi!
> >
> > I have a question regarding UFO and the neterion driver, which as the only one
> > advertises hardware UFO support:
> >
> > The patch discusses in this thread
> > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > some semantics how packets are constructed before submitted to the driver.
> >
> > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > payload is attached in the skb's frags. With the changes discussed in this
> > thread it is possible that we also append to skb->data some amount of data
> > which is not targeted for the header. From reading the driver sources it seems
> > the hardware interprets the skb->data to skb_headlen as the header, so we
> > could include some data in the fragments more than once.
> 
> From my reading of the HW Spec and a quick look at the driver, it
> appears that the driver is using one entry in the TX ring for the
> header and another for the body of the packet to be fragmented (which
> is what the hardware wants).  I don't understand what you are saying,
> but if you are asking if simply appending a new header & data to the
> end of skb->data will get it out on the wire correct, I don't believe
> it will.

No this is not what I tried to say. I'll try to be more clear this
time. ;)

We start with an UDP socket which is corked. As soon as we write the
first few bytes (smaller than the mtu) onto this socket we put the
header in place and the rest of the data is just appended behind the
header directly in skb->data via plain ip_append_data.

Now a second write with a length > mtu happens: The ip(6)_append_data
will branch to ufo_append. This will fetch the first skb and append
to skb->frags.  gso_type and gso_size will be updated on this skb (this
currently does not happen but will with the patches discussed in this
thread).

If this packet is transmitted down to the device driver we have the udp
header in skb->data *and* also the payload from the first write. The
payload from the second write is appended as a frag and gso_type and
gso_size are set. This header+payload seem to be mapped just after the
ufo_in_band_v descriptor as the header in the first tx descriptor:

   4174         txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
   4175                                               frg_len, PCI_DMA_TODEVICE);

frg_len is set to skb_headlen(skb). This happens right after setting up
the descriptor for the in-band ufo data.

My guess is that this data isn't split currently by the neterion driver
(at least I could not find it in the driver as Eric showed it for bnx2x)
so it might reappear in the packets when the hardware fragments the
packet and places the first tx ring in front of every packet.

Before these changes we never updated the gso_type and gso_size even when
we did append via UFO. So we never had payload in an UFO marked skb->data,
only the headers. Now we could also end up with a some payload in the
first TX ring, which you said is only for the header.

> I do have hardware that I can try the patch on, if you can walk me
> through the use case (unless it is as easy as setup an IPv6 connection
> and ping).

Ok, testing this should not be that complicated:

We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/

--- >8 ---
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/udp.h>
#include <stdio.h>

int test(int mtu)
{
        int fd;
        const int one = 1;
        const int off = 0;
        struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
        unsigned char buffer[3701];

        inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        connect(fd, (struct sockaddr *) &addr, sizeof(addr));

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));

        write(fd, "    ", 4);
        write(fd, buffer, sizeof(buffer));
        write(fd, " ", 1);

        setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));

        close(fd);
}

int main() {
        test(1280);
}
--- >8 ---

I left out error handling so it is better observed with strace if
something went wrong.

You should change the port number and ip address to something reasonable
for your network. My guess would be that the spaces (0x20) of the first
write is now placed between UDP header and payload of every packet
fragmented by the hardware. Would be nice to hear that I am wrong. ;)

Be aware that the above program can cause memory corruption in the kernel
if you did not apply Jiri's patch.

Thanks for helping!

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-08 14:53                             ` Hannes Frederic Sowa
@ 2013-10-17  4:45                               ` Hannes Frederic Sowa
  2013-10-18  7:52                                 ` Jiri Pirko
  2013-10-23 16:35                                 ` Jon Mason
  0 siblings, 2 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-17  4:45 UTC (permalink / raw)
  To: Jon Mason, Jiri Pirko, netdev, yoshfuji, David Miller, kuznet,
	jmorris, kaber, herbert, Eric Dumazet

Hi Jon and Jiri!

Just wanted to remind you if you could have a look at this?

If you don't have time to test this may I know your assessment of the
situation? I could send a compile-time tested patch to disable UFO or if you
say so we could leave this as is.

Jiri, I would suggest you resend your patches then.

Thanks,

  Hannes

[top-posted by intention]

On Tue, Oct 08, 2013 at 04:53:31PM +0200, Hannes Frederic Sowa wrote:
> On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
> > On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:
> > > Hi!
> > >
> > > I have a question regarding UFO and the neterion driver, which as the only one
> > > advertises hardware UFO support:
> > >
> > > The patch discusses in this thread
> > > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
> > > some semantics how packets are constructed before submitted to the driver.
> > >
> > > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
> > > payload is attached in the skb's frags. With the changes discussed in this
> > > thread it is possible that we also append to skb->data some amount of data
> > > which is not targeted for the header. From reading the driver sources it seems
> > > the hardware interprets the skb->data to skb_headlen as the header, so we
> > > could include some data in the fragments more than once.
> > 
> > From my reading of the HW Spec and a quick look at the driver, it
> > appears that the driver is using one entry in the TX ring for the
> > header and another for the body of the packet to be fragmented (which
> > is what the hardware wants).  I don't understand what you are saying,
> > but if you are asking if simply appending a new header & data to the
> > end of skb->data will get it out on the wire correct, I don't believe
> > it will.
> 
> No this is not what I tried to say. I'll try to be more clear this
> time. ;)
> 
> We start with an UDP socket which is corked. As soon as we write the
> first few bytes (smaller than the mtu) onto this socket we put the
> header in place and the rest of the data is just appended behind the
> header directly in skb->data via plain ip_append_data.
> 
> Now a second write with a length > mtu happens: The ip(6)_append_data
> will branch to ufo_append. This will fetch the first skb and append
> to skb->frags.  gso_type and gso_size will be updated on this skb (this
> currently does not happen but will with the patches discussed in this
> thread).
> 
> If this packet is transmitted down to the device driver we have the udp
> header in skb->data *and* also the payload from the first write. The
> payload from the second write is appended as a frag and gso_type and
> gso_size are set. This header+payload seem to be mapped just after the
> ufo_in_band_v descriptor as the header in the first tx descriptor:
> 
>    4174         txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
>    4175                                               frg_len, PCI_DMA_TODEVICE);
> 
> frg_len is set to skb_headlen(skb). This happens right after setting up
> the descriptor for the in-band ufo data.
> 
> My guess is that this data isn't split currently by the neterion driver
> (at least I could not find it in the driver as Eric showed it for bnx2x)
> so it might reappear in the packets when the hardware fragments the
> packet and places the first tx ring in front of every packet.
> 
> Before these changes we never updated the gso_type and gso_size even when
> we did append via UFO. So we never had payload in an UFO marked skb->data,
> only the headers. Now we could also end up with a some payload in the
> first TX ring, which you said is only for the header.
> 
> > I do have hardware that I can try the patch on, if you can walk me
> > through the use case (unless it is as easy as setup an IPv6 connection
> > and ping).
> 
> Ok, testing this should not be that complicated:
> 
> We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
> with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/
> 
> --- >8 ---
> #include <sys/types.h>
> #include <sys/socket.h>
> #include <netinet/in.h>
> #include <arpa/inet.h>
> #include <linux/udp.h>
> #include <stdio.h>
> 
> int test(int mtu)
> {
>         int fd;
>         const int one = 1;
>         const int off = 0;
>         struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
>         unsigned char buffer[3701];
> 
>         inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
> 
>         fd = socket(AF_INET, SOCK_DGRAM, 0);
>         connect(fd, (struct sockaddr *) &addr, sizeof(addr));
> 
>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));
> 
>         write(fd, "    ", 4);
>         write(fd, buffer, sizeof(buffer));
>         write(fd, " ", 1);
> 
>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));
> 
>         close(fd);
> }
> 
> int main() {
>         test(1280);
> }
> --- >8 ---
> 
> I left out error handling so it is better observed with strace if
> something went wrong.
> 
> You should change the port number and ip address to something reasonable
> for your network. My guess would be that the spaces (0x20) of the first
> write is now placed between UDP header and payload of every packet
> fragmented by the hardware. Would be nice to hear that I am wrong. ;)
> 
> Be aware that the above program can cause memory corruption in the kernel
> if you did not apply Jiri's patch.
> 
> Thanks for helping!
> 
>   Hannes
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
gruss,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-17  4:45                               ` Hannes Frederic Sowa
@ 2013-10-18  7:52                                 ` Jiri Pirko
  2013-10-23 16:35                                 ` Jon Mason
  1 sibling, 0 replies; 31+ messages in thread
From: Jiri Pirko @ 2013-10-18  7:52 UTC (permalink / raw)
  To: Jon Mason, netdev, yoshfuji, David Miller, kuznet, jmorris,
	kaber, herbert, Eric Dumazet

Thu, Oct 17, 2013 at 06:45:52AM CEST, hannes@stressinduktion.org wrote:
>Hi Jon and Jiri!
>
>Just wanted to remind you if you could have a look at this?
>
>If you don't have time to test this may I know your assessment of the
>situation? I could send a compile-time tested patch to disable UFO or if you
>say so we could leave this as is.
>
>Jiri, I would suggest you resend your patches then.


Okay. I will.

>
>Thanks,
>
>  Hannes
>
>[top-posted by intention]
>
>On Tue, Oct 08, 2013 at 04:53:31PM +0200, Hannes Frederic Sowa wrote:
>> On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
>> > On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
>> > <hannes@stressinduktion.org> wrote:
>> > > Hi!
>> > >
>> > > I have a question regarding UFO and the neterion driver, which as the only one
>> > > advertises hardware UFO support:
>> > >
>> > > The patch discusses in this thread
>> > > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
>> > > some semantics how packets are constructed before submitted to the driver.
>> > >
>> > > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
>> > > payload is attached in the skb's frags. With the changes discussed in this
>> > > thread it is possible that we also append to skb->data some amount of data
>> > > which is not targeted for the header. From reading the driver sources it seems
>> > > the hardware interprets the skb->data to skb_headlen as the header, so we
>> > > could include some data in the fragments more than once.
>> > 
>> > From my reading of the HW Spec and a quick look at the driver, it
>> > appears that the driver is using one entry in the TX ring for the
>> > header and another for the body of the packet to be fragmented (which
>> > is what the hardware wants).  I don't understand what you are saying,
>> > but if you are asking if simply appending a new header & data to the
>> > end of skb->data will get it out on the wire correct, I don't believe
>> > it will.
>> 
>> No this is not what I tried to say. I'll try to be more clear this
>> time. ;)
>> 
>> We start with an UDP socket which is corked. As soon as we write the
>> first few bytes (smaller than the mtu) onto this socket we put the
>> header in place and the rest of the data is just appended behind the
>> header directly in skb->data via plain ip_append_data.
>> 
>> Now a second write with a length > mtu happens: The ip(6)_append_data
>> will branch to ufo_append. This will fetch the first skb and append
>> to skb->frags.  gso_type and gso_size will be updated on this skb (this
>> currently does not happen but will with the patches discussed in this
>> thread).
>> 
>> If this packet is transmitted down to the device driver we have the udp
>> header in skb->data *and* also the payload from the first write. The
>> payload from the second write is appended as a frag and gso_type and
>> gso_size are set. This header+payload seem to be mapped just after the
>> ufo_in_band_v descriptor as the header in the first tx descriptor:
>> 
>>    4174         txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
>>    4175                                               frg_len, PCI_DMA_TODEVICE);
>> 
>> frg_len is set to skb_headlen(skb). This happens right after setting up
>> the descriptor for the in-band ufo data.
>> 
>> My guess is that this data isn't split currently by the neterion driver
>> (at least I could not find it in the driver as Eric showed it for bnx2x)
>> so it might reappear in the packets when the hardware fragments the
>> packet and places the first tx ring in front of every packet.
>> 
>> Before these changes we never updated the gso_type and gso_size even when
>> we did append via UFO. So we never had payload in an UFO marked skb->data,
>> only the headers. Now we could also end up with a some payload in the
>> first TX ring, which you said is only for the header.
>> 
>> > I do have hardware that I can try the patch on, if you can walk me
>> > through the use case (unless it is as easy as setup an IPv6 connection
>> > and ping).
>> 
>> Ok, testing this should not be that complicated:
>> 
>> We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
>> with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/
>> 
>> --- >8 ---
>> #include <sys/types.h>
>> #include <sys/socket.h>
>> #include <netinet/in.h>
>> #include <arpa/inet.h>
>> #include <linux/udp.h>
>> #include <stdio.h>
>> 
>> int test(int mtu)
>> {
>>         int fd;
>>         const int one = 1;
>>         const int off = 0;
>>         struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
>>         unsigned char buffer[3701];
>> 
>>         inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
>> 
>>         fd = socket(AF_INET, SOCK_DGRAM, 0);
>>         connect(fd, (struct sockaddr *) &addr, sizeof(addr));
>> 
>>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));
>> 
>>         write(fd, "    ", 4);
>>         write(fd, buffer, sizeof(buffer));
>>         write(fd, " ", 1);
>> 
>>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));
>> 
>>         close(fd);
>> }
>> 
>> int main() {
>>         test(1280);
>> }
>> --- >8 ---
>> 
>> I left out error handling so it is better observed with strace if
>> something went wrong.
>> 
>> You should change the port number and ip address to something reasonable
>> for your network. My guess would be that the spaces (0x20) of the first
>> write is now placed between UDP header and payload of every packet
>> fragmented by the hardware. Would be nice to hear that I am wrong. ;)
>> 
>> Be aware that the above program can cause memory corruption in the kernel
>> if you did not apply Jiri's patch.
>> 
>> Thanks for helping!
>> 
>>   Hannes
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>
>-- 
>gruss,
>
>  Hannes
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-17  4:45                               ` Hannes Frederic Sowa
  2013-10-18  7:52                                 ` Jiri Pirko
@ 2013-10-23 16:35                                 ` Jon Mason
  2013-10-23 18:15                                   ` Hannes Frederic Sowa
  1 sibling, 1 reply; 31+ messages in thread
From: Jon Mason @ 2013-10-23 16:35 UTC (permalink / raw)
  To: Jon Mason, Jiri Pirko, netdev, yoshfuji, David Miller,
	Alexey Kuznetsov, jmorris, Patrick McHardy, Herbert Xu,
	Eric Dumazet

On Wed, Oct 16, 2013 at 9:45 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hi Jon and Jiri!
>
> Just wanted to remind you if you could have a look at this?
>
> If you don't have time to test this may I know your assessment of the
> situation? I could send a compile-time tested patch to disable UFO or if you
> say so we could leave this as is.

So, bad news.  My Xframe 2 adapter (the only variety that does UFO
offload) won't fit in a standard PCI(32) slot.  Since my PCI-X system
at home is faulty (I'm trying to fix it , but it won't be in the time
frame you want), there is no way for me to test it on the hardware.
Terribly sorry.

I am fine with this patch going out, since UFO is off by default.
I'll handle any issues once they are discovered.  Alternatively, we
could just kill UFO and make everyone's lives easier.

Thanks,
Jon

> Jiri, I would suggest you resend your patches then.
>
> Thanks,
>
>   Hannes
>
> [top-posted by intention]
>
> On Tue, Oct 08, 2013 at 04:53:31PM +0200, Hannes Frederic Sowa wrote:
>> On Tue, Oct 08, 2013 at 01:07:29AM -0700, Jon Mason wrote:
>> > On Wed, Oct 2, 2013 at 9:27 AM, Hannes Frederic Sowa
>> > <hannes@stressinduktion.org> wrote:
>> > > Hi!
>> > >
>> > > I have a question regarding UFO and the neterion driver, which as the only one
>> > > advertises hardware UFO support:
>> > >
>> > > The patch discusses in this thread
>> > > http://thread.gmane.org/gmane.linux.network/284348/focus=285405 could change
>> > > some semantics how packets are constructed before submitted to the driver.
>> > >
>> > > We currently guarantee that we have the MAC/IP/UDP header in skb->data and the
>> > > payload is attached in the skb's frags. With the changes discussed in this
>> > > thread it is possible that we also append to skb->data some amount of data
>> > > which is not targeted for the header. From reading the driver sources it seems
>> > > the hardware interprets the skb->data to skb_headlen as the header, so we
>> > > could include some data in the fragments more than once.
>> >
>> > From my reading of the HW Spec and a quick look at the driver, it
>> > appears that the driver is using one entry in the TX ring for the
>> > header and another for the body of the packet to be fragmented (which
>> > is what the hardware wants).  I don't understand what you are saying,
>> > but if you are asking if simply appending a new header & data to the
>> > end of skb->data will get it out on the wire correct, I don't believe
>> > it will.
>>
>> No this is not what I tried to say. I'll try to be more clear this
>> time. ;)
>>
>> We start with an UDP socket which is corked. As soon as we write the
>> first few bytes (smaller than the mtu) onto this socket we put the
>> header in place and the rest of the data is just appended behind the
>> header directly in skb->data via plain ip_append_data.
>>
>> Now a second write with a length > mtu happens: The ip(6)_append_data
>> will branch to ufo_append. This will fetch the first skb and append
>> to skb->frags.  gso_type and gso_size will be updated on this skb (this
>> currently does not happen but will with the patches discussed in this
>> thread).
>>
>> If this packet is transmitted down to the device driver we have the udp
>> header in skb->data *and* also the payload from the first write. The
>> payload from the second write is appended as a frag and gso_type and
>> gso_size are set. This header+payload seem to be mapped just after the
>> ufo_in_band_v descriptor as the header in the first tx descriptor:
>>
>>    4174         txdp->Buffer_Pointer = pci_map_single(sp->pdev, skb->data,
>>    4175                                               frg_len, PCI_DMA_TODEVICE);
>>
>> frg_len is set to skb_headlen(skb). This happens right after setting up
>> the descriptor for the in-band ufo data.
>>
>> My guess is that this data isn't split currently by the neterion driver
>> (at least I could not find it in the driver as Eric showed it for bnx2x)
>> so it might reappear in the packets when the hardware fragments the
>> packet and places the first tx ring in front of every packet.
>>
>> Before these changes we never updated the gso_type and gso_size even when
>> we did append via UFO. So we never had payload in an UFO marked skb->data,
>> only the headers. Now we could also end up with a some payload in the
>> first TX ring, which you said is only for the header.
>>
>> > I do have hardware that I can try the patch on, if you can walk me
>> > through the use case (unless it is as easy as setup an IPv6 connection
>> > and ping).
>>
>> Ok, testing this should not be that complicated:
>>
>> We can test this with plain IPv4/UDP sockets. I would suggest a net-next kernel
>> with this patch from Jiri applied: http://patchwork.ozlabs.org/patch/279691/
>>
>> --- >8 ---
>> #include <sys/types.h>
>> #include <sys/socket.h>
>> #include <netinet/in.h>
>> #include <arpa/inet.h>
>> #include <linux/udp.h>
>> #include <stdio.h>
>>
>> int test(int mtu)
>> {
>>         int fd;
>>         const int one = 1;
>>         const int off = 0;
>>         struct sockaddr_in addr = {.sin_family = AF_INET, .sin_port = htons(53) };
>>         unsigned char buffer[3701];
>>
>>         inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
>>
>>         fd = socket(AF_INET, SOCK_DGRAM, 0);
>>         connect(fd, (struct sockaddr *) &addr, sizeof(addr));
>>
>>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &one, sizeof(one));
>>
>>         write(fd, "    ", 4);
>>         write(fd, buffer, sizeof(buffer));
>>         write(fd, " ", 1);
>>
>>         setsockopt(fd, IPPROTO_UDP, UDP_CORK, &off, sizeof(off));
>>
>>         close(fd);
>> }
>>
>> int main() {
>>         test(1280);
>> }
>> --- >8 ---
>>
>> I left out error handling so it is better observed with strace if
>> something went wrong.
>>
>> You should change the port number and ip address to something reasonable
>> for your network. My guess would be that the spaces (0x20) of the first
>> write is now placed between UDP header and payload of every packet
>> fragmented by the hardware. Would be nice to hear that I am wrong. ;)
>>
>> Be aware that the above program can cause memory corruption in the kernel
>> if you did not apply Jiri's patch.
>>
>> Thanks for helping!
>>
>>   Hannes
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
> --
> gruss,
>
>   Hannes
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO]
  2013-10-23 16:35                                 ` Jon Mason
@ 2013-10-23 18:15                                   ` Hannes Frederic Sowa
  0 siblings, 0 replies; 31+ messages in thread
From: Hannes Frederic Sowa @ 2013-10-23 18:15 UTC (permalink / raw)
  To: Jon Mason
  Cc: Jiri Pirko, netdev, yoshfuji, David Miller, Alexey Kuznetsov,
	jmorris, Patrick McHardy, Herbert Xu, Eric Dumazet

On Wed, Oct 23, 2013 at 09:35:43AM -0700, Jon Mason wrote:
> On Wed, Oct 16, 2013 at 9:45 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
> > Hi Jon and Jiri!
> >
> > Just wanted to remind you if you could have a look at this?
> >
> > If you don't have time to test this may I know your assessment of the
> > situation? I could send a compile-time tested patch to disable UFO or if you
> > say so we could leave this as is.
> 
> So, bad news.  My Xframe 2 adapter (the only variety that does UFO
> offload) won't fit in a standard PCI(32) slot.  Since my PCI-X system
> at home is faulty (I'm trying to fix it , but it won't be in the time
> frame you want), there is no way for me to test it on the hardware.
> Terribly sorry.
> 
> I am fine with this patch going out, since UFO is off by default.
> I'll handle any issues once they are discovered.  Alternatively, we
> could just kill UFO and make everyone's lives easier.

Oh, I missed that UFO is off by default.

If it turns out that UFO is causing broken frames it should be either
killed or (if you have the time for that) fixed. Because there shouldn't
be regressions in stable kernels I am fine with this resolution. Maybe
you can have a look at this problem once your hardware is fixed.

Thank you,

  Hannes

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2013-10-23 18:15 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-21  4:27 [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Hannes Frederic Sowa
2013-09-23  0:43 ` Hannes Frederic Sowa
2013-09-24 15:43 ` David Miller
2013-09-30 11:43 ` Jiri Pirko
2013-09-30 17:23   ` Hannes Frederic Sowa
2013-10-01 10:58     ` Jiri Pirko
2013-10-01 12:09       ` Hannes Frederic Sowa
2013-10-01 12:32         ` Hannes Frederic Sowa
2013-10-01 21:47           ` Hannes Frederic Sowa
2013-10-01 23:25             ` Hannes Frederic Sowa
2013-10-02  8:58               ` Jiri Pirko
2013-10-02 10:41                 ` Eric Dumazet
2013-10-02 12:12                   ` Hannes Frederic Sowa
2013-10-02 13:03                     ` Hannes Frederic Sowa
2013-10-02 15:14                       ` Eric Dumazet
2013-10-02 16:27                         ` Neterion and UFO handling [was: Re: [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO] Hannes Frederic Sowa
2013-10-07 16:53                           ` Hannes Frederic Sowa
2013-10-07 17:19                             ` Jon Mason
2013-10-07 17:27                               ` Hannes Frederic Sowa
2013-10-08  8:07                           ` Jon Mason
2013-10-08 13:00                             ` Eric Dumazet
2013-10-08 14:53                             ` Hannes Frederic Sowa
2013-10-17  4:45                               ` Hannes Frederic Sowa
2013-10-18  7:52                                 ` Jiri Pirko
2013-10-23 16:35                                 ` Jon Mason
2013-10-23 18:15                                   ` Hannes Frederic Sowa
2013-10-02 10:33               ` [PATCH] ipv6: udp packets following an UFO enqueued packet need also be handled by UFO Jiri Pirko
2013-10-02 12:01                 ` Hannes Frederic Sowa
2013-10-02 11:20               ` Jiri Pirko
2013-10-02 11:53                 ` Eric Dumazet
2013-10-02 12:10                   ` Jiri Pirko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).