All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tcp: make urg+gso work for real this time
@ 2008-12-03 11:22 Ilpo Järvinen
  2008-12-03 14:37 ` Petr Tesarik
  2008-12-17  1:18 ` Herbert Xu
  0 siblings, 2 replies; 47+ messages in thread
From: Ilpo Järvinen @ 2008-12-03 11:22 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev, Petr Tesarik

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4184 bytes --]


I should have noticed this earlier... :-) The previous solution
to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
patterns, or even worse, a steep downward slope into packet
counting because each skb pcount would be truncated to pcount
of 2 and then the following fragments of the later portion would
restore the window again.

Basically this reverts "tcp: Do not use TSO/GSO when there is
urgent data" (33cf71cee1). It also removes some unnecessary code
from tcp_current_mss that didn't work as intented either (could
be that something was changed down the road, or it might have
been broken since the dawn of time) because it only works once
urg is already written while this bug shows up starting from
~64k before the urg point.

The retransmissions already are split to mss sized chunks, so
only new data sending paths need splitting in case they have
a segment otherwise suitable for gso/tso. The actually check
can be improved to be more narrow but since this is late -rc
already, I'll postpone thinking the more fine-grained things.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Petr Tesarik <ptesarik@suse.cz>
---
 net/ipv4/tcp_output.c |   22 ++++++++++------------
 1 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 76f8409..0744b9f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -722,8 +722,7 @@ static void tcp_queue_skb(struct sock *sk, struct sk_buff *skb)
 static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb,
 				 unsigned int mss_now)
 {
-	if (skb->len <= mss_now || !sk_can_gso(sk) ||
-	    tcp_urg_mode(tcp_sk(sk))) {
+	if (skb->len <= mss_now || !sk_can_gso(sk)) {
 		/* Avoid the costly divide in the normal
 		 * non-TSO case.
 		 */
@@ -1029,10 +1028,6 @@ unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu)
 
 /* Compute the current effective MSS, taking SACKs and IP options,
  * and even PMTU discovery events into account.
- *
- * LARGESEND note: !tcp_urg_mode is overkill, only frames up to snd_up
- * cannot be large. However, taking into account rare use of URG, this
- * is not a big flaw.
  */
 unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
 {
@@ -1047,7 +1042,7 @@ unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
 
 	mss_now = tp->mss_cache;
 
-	if (large_allowed && sk_can_gso(sk) && !tcp_urg_mode(tp))
+	if (large_allowed && sk_can_gso(sk))
 		doing_tso = 1;
 
 	if (dst) {
@@ -1164,9 +1159,7 @@ static int tcp_init_tso_segs(struct sock *sk, struct sk_buff *skb,
 {
 	int tso_segs = tcp_skb_pcount(skb);
 
-	if (!tso_segs ||
-	    (tso_segs > 1 && (tcp_skb_mss(skb) != mss_now ||
-			      tcp_urg_mode(tcp_sk(sk))))) {
+	if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
 		tcp_set_skb_tso_segs(sk, skb, mss_now);
 		tso_segs = tcp_skb_pcount(skb);
 	}
@@ -1519,6 +1512,10 @@ static int tcp_mtu_probe(struct sock *sk)
  * send_head.  This happens as incoming acks open up the remote
  * window for us.
  *
+ * LARGESEND note: !tcp_urg_mode is overkill, only frames between
+ * snd_up-64k-mss .. snd_up cannot be large. However, taking into
+ * account rare use of URG, this is not a big flaw.
+ *
  * Returns 1, if no segments are in flight and we have queued segments, but
  * cannot send anything now because of SWS or another problem.
  */
@@ -1570,7 +1567,7 @@ static int tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle)
 		}
 
 		limit = mss_now;
-		if (tso_segs > 1)
+		if (tso_segs > 1 && !tcp_urg_mode(tp))
 			limit = tcp_mss_split_point(sk, skb, mss_now,
 						    cwnd_quota);
 
@@ -1619,6 +1616,7 @@ void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
  */
 void tcp_push_one(struct sock *sk, unsigned int mss_now)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb = tcp_send_head(sk);
 	unsigned int tso_segs, cwnd_quota;
 
@@ -1633,7 +1631,7 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
 		BUG_ON(!tso_segs);
 
 		limit = mss_now;
-		if (tso_segs > 1)
+		if (tso_segs > 1 && !tcp_urg_mode(tp))
 			limit = tcp_mss_split_point(sk, skb, mss_now,
 						    cwnd_quota);
 
-- 
1.5.2.2

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-03 11:22 [PATCH] tcp: make urg+gso work for real this time Ilpo Järvinen
@ 2008-12-03 14:37 ` Petr Tesarik
  2008-12-03 18:31   ` Ilpo Järvinen
  2008-12-17  1:18 ` Herbert Xu
  1 sibling, 1 reply; 47+ messages in thread
From: Petr Tesarik @ 2008-12-03 14:37 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev

Ilpo Järvinen píše v St 03. 12. 2008 v 13:22 +0200:
> I should have noticed this earlier... :-) The previous solution
> to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
> patterns, or even worse, a steep downward slope into packet
> counting because each skb pcount would be truncated to pcount
> of 2 and then the following fragments of the later portion would
> restore the window again.

Oops, not nice. :(

> Basically this reverts "tcp: Do not use TSO/GSO when there is
> urgent data" (33cf71cee1). It also removes some unnecessary code
> from tcp_current_mss that didn't work as intented either (could
> be that something was changed down the road, or it might have
> been broken since the dawn of time) because it only works once
> urg is already written while this bug shows up starting from
> ~64k before the urg point.
> 
> The retransmissions already are split to mss sized chunks, so
> only new data sending paths need splitting in case they have
> a segment otherwise suitable for gso/tso. The actually check
> can be improved to be more narrow but since this is late -rc
> already, I'll postpone thinking the more fine-grained things.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Cc: Petr Tesarik <ptesarik@suse.cz>

Yes, re-fragmenting the packet here is probably the best way to go. But
repeating the same condition on multiple places is not so nice. What
about this little improvement?

(I didn't roll the revert of 33cf71cee1 into it, as it makes the actual
change less obvious. I think there are better ways of doing a revert in
git.)

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

--
 tcp_output.c |   13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ba85d88..a45c62d 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1028,10 +1028,6 @@ unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu)
 
 /* Compute the current effective MSS, taking SACKs and IP options,
  * and even PMTU discovery events into account.
- *
- * LARGESEND note: !tcp_urg_mode is overkill, only frames up to snd_up
- * cannot be large. However, taking into account rare use of URG, this
- * is not a big flaw.
  */
 unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
 {
@@ -1046,7 +1042,7 @@ unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
 
 	mss_now = tp->mss_cache;
 
-	if (large_allowed && sk_can_gso(sk) && !tcp_urg_mode(tp))
+	if (large_allowed && sk_can_gso(sk))
 		doing_tso = 1;
 
 	if (dst) {
@@ -1113,6 +1109,10 @@ static void tcp_cwnd_validate(struct sock *sk)
  * return whenever allowed by the other factors). Basically we need the
  * modulo only when the receiver window alone is the limiting factor or
  * when we would be allowed to send the split-due-to-Nagle skb fully.
+ *
+ * LARGESEND note: !tcp_urg_mode is overkill, only frames between
+ * snd_up-64k-mss .. snd_up cannot be large. However, taking into
+ * account rare use of URG, this is not a big flaw.
  */
 static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb,
 					unsigned int mss_now, unsigned int cwnd)
@@ -1120,6 +1120,9 @@ static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb,
 	struct tcp_sock *tp = tcp_sk(sk);
 	u32 needed, window, cwnd_len;
 
+	if (unlikely(tcp_urg_mode(tp)))
+		return mss_now;
+
 	window = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
 	cwnd_len = mss_now * cwnd;
 



^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-03 14:37 ` Petr Tesarik
@ 2008-12-03 18:31   ` Ilpo Järvinen
  2008-12-04  5:25     ` David Miller
  0 siblings, 1 reply; 47+ messages in thread
From: Ilpo Järvinen @ 2008-12-03 18:31 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: David Miller, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4247 bytes --]

On Wed, 3 Dec 2008, Petr Tesarik wrote:

> Ilpo Järvinen píse v St 03. 12. 2008 v 13:22 +0200:
> > I should have noticed this earlier... :-) The previous solution
> > to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
> > patterns, or even worse, a steep downward slope into packet
> > counting because each skb pcount would be truncated to pcount
> > of 2 and then the following fragments of the later portion would
> > restore the window again.
> 
> Oops, not nice. :(
> 
> > Basically this reverts "tcp: Do not use TSO/GSO when there is
> > urgent data" (33cf71cee1). It also removes some unnecessary code
> > from tcp_current_mss that didn't work as intented either (could
> > be that something was changed down the road, or it might have
> > been broken since the dawn of time) because it only works once
> > urg is already written while this bug shows up starting from
> > ~64k before the urg point.
> > 
> > The retransmissions already are split to mss sized chunks, so
> > only new data sending paths need splitting in case they have
> > a segment otherwise suitable for gso/tso. The actually check
> > can be improved to be more narrow but since this is late -rc
> > already, I'll postpone thinking the more fine-grained things.
> > 
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> > Cc: Petr Tesarik <ptesarik@suse.cz>
> 
> Yes, re-fragmenting the packet here is probably the best way to go. But
> repeating the same condition on multiple places is not so nice. What
> about this little improvement?

Yeah, this too will work... Though not of too much significance since
I've already most of unification of those two functions in a local
queue awaiting testing, polish up and submit to net-next.

When sending an updated version davem prefers to have full changelog in it 
btw, it saves some work for him.

> (I didn't roll the revert of 33cf71cee1 into it, as it makes the actual
> change less obvious. I think there are better ways of doing a revert in
> git.)

Whatever, I've no strong opinion either way though it's just broken with 
and without the revert anyway, the full thing is needed to actually fix 
it.

> Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
>
> --
>  tcp_output.c |   13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ba85d88..a45c62d 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1028,10 +1028,6 @@ unsigned int tcp_sync_mss(struct sock *sk, u32 pmtu)
>  
>  /* Compute the current effective MSS, taking SACKs and IP options,
>   * and even PMTU discovery events into account.
> - *
> - * LARGESEND note: !tcp_urg_mode is overkill, only frames up to snd_up
> - * cannot be large. However, taking into account rare use of URG, this
> - * is not a big flaw.
>   */
>  unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
>  {
> @@ -1046,7 +1042,7 @@ unsigned int tcp_current_mss(struct sock *sk, int large_allowed)
>  
>  	mss_now = tp->mss_cache;
>  
> -	if (large_allowed && sk_can_gso(sk) && !tcp_urg_mode(tp))
> +	if (large_allowed && sk_can_gso(sk))
>  		doing_tso = 1;
>  
>  	if (dst) {
> @@ -1113,6 +1109,10 @@ static void tcp_cwnd_validate(struct sock *sk)
>   * return whenever allowed by the other factors). Basically we need the
>   * modulo only when the receiver window alone is the limiting factor or
>   * when we would be allowed to send the split-due-to-Nagle skb fully.
> + *
> + * LARGESEND note: !tcp_urg_mode is overkill, only frames between
> + * snd_up-64k-mss .. snd_up cannot be large. However, taking into
> + * account rare use of URG, this is not a big flaw.
>   */
>  static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb,
>  					unsigned int mss_now, unsigned int cwnd)
> @@ -1120,6 +1120,9 @@ static unsigned int tcp_mss_split_point(struct sock *sk, struct sk_buff *skb,
>  	struct tcp_sock *tp = tcp_sk(sk);
>  	u32 needed, window, cwnd_len;
>  
> +	if (unlikely(tcp_urg_mode(tp)))
> +		return mss_now;
> +
>  	window = tcp_wnd_end(tp) - TCP_SKB_CB(skb)->seq;
>  	cwnd_len = mss_now * cwnd;

Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

-- 
 i.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-03 18:31   ` Ilpo Järvinen
@ 2008-12-04  5:25     ` David Miller
  0 siblings, 0 replies; 47+ messages in thread
From: David Miller @ 2008-12-04  5:25 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: ptesarik, netdev

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 3 Dec 2008 20:31:09 +0200 (EET)

> On Wed, 3 Dec 2008, Petr Tesarik wrote:
> 
> > Ilpo Järvinen píse v St 03. 12. 2008 v 13:22 +0200:
> > Yes, re-fragmenting the packet here is probably the best way to go. But
> > repeating the same condition on multiple places is not so nice. What
> > about this little improvement?
> 
> Yeah, this too will work... Though not of too much significance since
> I've already most of unification of those two functions in a local
> queue awaiting testing, polish up and submit to net-next.
> 
> When sending an updated version davem prefers to have full changelog in it 
> btw, it saves some work for him.
> 
> > (I didn't roll the revert of 33cf71cee1 into it, as it makes the actual
> > change less obvious. I think there are better ways of doing a revert in
> > git.)
> 
> Whatever, I've no strong opinion either way though it's just broken with 
> and without the revert anyway, the full thing is needed to actually fix 
> it.

I've applied Ilpo's version of the fix patch, thanks everyone.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-03 11:22 [PATCH] tcp: make urg+gso work for real this time Ilpo Järvinen
  2008-12-03 14:37 ` Petr Tesarik
@ 2008-12-17  1:18 ` Herbert Xu
  2008-12-17  8:15   ` Ilpo Järvinen
  2008-12-17  8:18   ` Petr Tesarik
  1 sibling, 2 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17  1:18 UTC (permalink / raw)
  To: Ilpo J??rvinen; +Cc: davem, netdev, ptesarik

Ilpo J??rvinen <ilpo.jarvinen@helsinki.fi> wrote:
> 
> I should have noticed this earlier... :-) The previous solution
> to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
> patterns, or even worse, a steep downward slope into packet
> counting because each skb pcount would be truncated to pcount
> of 2 and then the following fragments of the later portion would
> restore the window again.

How about just handling it, URG isn't that hard.  If anyone is
bored you can do this for GRO as well.

tcp: Add GSO support for urgent data

When segmenting packets with urgent data we need to clear the
urgent flag/pointer once we move past the end of the urgent data.
This was not handled correctly in GSO or (presumably) in existing
hardware.

This patch adds a new GSO flag for TCP urgent data and also fixes
its handling in GSO.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e26f549..4508c48 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -532,9 +532,11 @@ struct net_device
 #define NETIF_F_GSO_ROBUST	(SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)
 #define NETIF_F_TSO_ECN		(SKB_GSO_TCP_ECN << NETIF_F_GSO_SHIFT)
 #define NETIF_F_TSO6		(SKB_GSO_TCPV6 << NETIF_F_GSO_SHIFT)
+#define NETIF_F_TSO_URG		(SKB_GSO_TCP_URG << NETIF_F_GSO_SHIFT)
 
 	/* List of features with software fallbacks. */
-#define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
+#define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
+				 NETIF_F_TSO6 | NETIF_F_TSO_URG)
 
 
 #define NETIF_F_GEN_CSUM	(NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2725f4e..56e9179 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -188,6 +188,9 @@ enum {
 	SKB_GSO_TCP_ECN = 1 << 3,
 
 	SKB_GSO_TCPV6 = 1 << 4,
+
+	/* This indicates the tcp segment has URG set. */
+	SKB_GSO_TCP_URG = 1 << 5,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 1aa2dc9..33305f6 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1202,6 +1202,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features)
 		       SKB_GSO_UDP |
 		       SKB_GSO_DODGY |
 		       SKB_GSO_TCP_ECN |
+		       SKB_GSO_TCP_URG |
 		       0)))
 		goto out;
 
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c5aca0b..e6796a8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2383,6 +2383,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
 	__be32 delta;
 	unsigned int oldlen;
 	unsigned int len;
+	unsigned int urg_ptr;
 
 	if (!pskb_may_pull(skb, sizeof(*th)))
 		goto out;
@@ -2408,6 +2409,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
 			       SKB_GSO_DODGY |
 			       SKB_GSO_TCP_ECN |
 			       SKB_GSO_TCPV6 |
+			       SKB_GSO_TCP_URG |
 			       0) ||
 			     !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))))
 			goto out;
@@ -2429,6 +2431,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
 	skb = segs;
 	th = tcp_hdr(skb);
 	seq = ntohl(th->seq);
+	urg_ptr = th->urg ? ntohs(th->urg_ptr) : 0;
 
 	do {
 		th->fin = th->psh = 0;
@@ -2446,6 +2449,14 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
 
 		th->seq = htonl(seq);
 		th->cwr = 0;
+
+		if (urg_ptr <= len)
+			urg_ptr = 0;
+		else
+			urg_ptr -= len;
+
+		th->urg = !!urg_ptr;
+		th->urg_ptr = htons(urg_ptr);
 	} while (skb->next);
 
 	delta = htonl(oldlen + (skb->tail - skb->transport_header) +
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fe3b4bd..d4dc0fa 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -667,6 +667,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 		     between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {
 		th->urg_ptr		= htons(tp->snd_up - tcb->seq);
 		th->urg			= 1;
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_URG;
 	}
 
 	tcp_options_write((__be32 *)(th + 1), tp, &opts, &md5_hash_location);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 01edac8..c11d3d9 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -746,6 +746,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, int features)
 		       SKB_GSO_DODGY |
 		       SKB_GSO_TCP_ECN |
 		       SKB_GSO_TCPV6 |
+		       SKB_GSO_TCP_URG |
 		       0)))
 		goto out;
 

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17  1:18 ` Herbert Xu
@ 2008-12-17  8:15   ` Ilpo Järvinen
  2008-12-17 10:13     ` Herbert Xu
  2008-12-17  8:18   ` Petr Tesarik
  1 sibling, 1 reply; 47+ messages in thread
From: Ilpo Järvinen @ 2008-12-17  8:15 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, Netdev, ptesarik

On Wed, 17 Dec 2008, Herbert Xu wrote:

> Ilpo J??rvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > 
> > I should have noticed this earlier... :-) The previous solution
> > to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
> > patterns, or even worse, a steep downward slope into packet
> > counting because each skb pcount would be truncated to pcount
> > of 2 and then the following fragments of the later portion would
> > restore the window again.
> 
> How about just handling it, URG isn't that hard.  If anyone is
> bored you can do this for GRO as well.
>
> tcp: Add GSO support for urgent data
> 
> When segmenting packets with urgent data we need to clear the
> urgent flag/pointer once we move past the end of the urgent data.
> This was not handled correctly in GSO or (presumably) in existing
> hardware.
>
> This patch adds a new GSO flag for TCP urgent data and also fixes
> its handling in GSO.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

This is certainly nice and possible but as mention earlier, the harder 
problem is that the urg_ptr is only 16-bits so there will be still be 
problem if the urgent seq-64k mark is within a super-skb as tail segments 
need urg while the head ones won't. I think we would need out-of-band 
means to communicate the extra bit (17th) from tcp if we want to solve 
that one. It would be rather easy to change TCP to only do splitting for 
that particular skb though and keep everything else intact.

...The same consideration applies to gro as well.


-- 
 i.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17  1:18 ` Herbert Xu
  2008-12-17  8:15   ` Ilpo Järvinen
@ 2008-12-17  8:18   ` Petr Tesarik
  2008-12-17  8:42     ` Ilpo Järvinen
  2008-12-17 10:17     ` Herbert Xu
  1 sibling, 2 replies; 47+ messages in thread
From: Petr Tesarik @ 2008-12-17  8:18 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Ilpo Järvinen, davem, netdev

Herbert Xu píše v St 17. 12. 2008 v 12:18 +1100:
> Ilpo J??rvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > 
> > I should have noticed this earlier... :-) The previous solution
> > to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
> > patterns, or even worse, a steep downward slope into packet
> > counting because each skb pcount would be truncated to pcount
> > of 2 and then the following fragments of the later portion would
> > restore the window again.
> 
> How about just handling it, URG isn't that hard.  If anyone is
> bored you can do this for GRO as well.
> 
> tcp: Add GSO support for urgent data
> 
> When segmenting packets with urgent data we need to clear the
> urgent flag/pointer once we move past the end of the urgent data.
> This was not handled correctly in GSO or (presumably) in existing
> hardware.
> 
> This patch adds a new GSO flag for TCP urgent data and also fixes
> its handling in GSO.

This would work, but:

  1. Currently this code path will never get run, because large
     offloads are avoided both for hardware TSO and software GSO.
     You'd also have to modify the conditionals that guard calls
     to tcp_mss_split_point().

  2. Is it worth the trouble?
     This slows down the software GSO slightly. I mean, urgent
     data should be quite rare, so it makes sense to treat
     non-urgent data as "fast path" and urgent data as "slow
     path". While Ilpo's approach only slows down the slow path,
     your patch slows down both.

     No, I don't have any numbers, no. ;)

Anyway, we have to keep the code that avoids large offloads with URG,
because many NICs are broken in this regard (at least all Intel NICs
I've seen so far), so why treat software GSO specially?

Petr Tesarik

> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e26f549..4508c48 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -532,9 +532,11 @@ struct net_device
>  #define NETIF_F_GSO_ROBUST	(SKB_GSO_DODGY << NETIF_F_GSO_SHIFT)
>  #define NETIF_F_TSO_ECN		(SKB_GSO_TCP_ECN << NETIF_F_GSO_SHIFT)
>  #define NETIF_F_TSO6		(SKB_GSO_TCPV6 << NETIF_F_GSO_SHIFT)
> +#define NETIF_F_TSO_URG		(SKB_GSO_TCP_URG << NETIF_F_GSO_SHIFT)
>  
>  	/* List of features with software fallbacks. */
> -#define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6)
> +#define NETIF_F_GSO_SOFTWARE	(NETIF_F_TSO | NETIF_F_TSO_ECN | \
> +				 NETIF_F_TSO6 | NETIF_F_TSO_URG)
>  
> 
>  #define NETIF_F_GEN_CSUM	(NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2725f4e..56e9179 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -188,6 +188,9 @@ enum {
>  	SKB_GSO_TCP_ECN = 1 << 3,
>  
>  	SKB_GSO_TCPV6 = 1 << 4,
> +
> +	/* This indicates the tcp segment has URG set. */
> +	SKB_GSO_TCP_URG = 1 << 5,
>  };
>  
>  #if BITS_PER_LONG > 32
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 1aa2dc9..33305f6 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1202,6 +1202,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb, int features)
>  		       SKB_GSO_UDP |
>  		       SKB_GSO_DODGY |
>  		       SKB_GSO_TCP_ECN |
> +		       SKB_GSO_TCP_URG |
>  		       0)))
>  		goto out;
>  
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index c5aca0b..e6796a8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2383,6 +2383,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>  	__be32 delta;
>  	unsigned int oldlen;
>  	unsigned int len;
> +	unsigned int urg_ptr;
>  
>  	if (!pskb_may_pull(skb, sizeof(*th)))
>  		goto out;
> @@ -2408,6 +2409,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>  			       SKB_GSO_DODGY |
>  			       SKB_GSO_TCP_ECN |
>  			       SKB_GSO_TCPV6 |
> +			       SKB_GSO_TCP_URG |
>  			       0) ||
>  			     !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))))
>  			goto out;
> @@ -2429,6 +2431,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>  	skb = segs;
>  	th = tcp_hdr(skb);
>  	seq = ntohl(th->seq);
> +	urg_ptr = th->urg ? ntohs(th->urg_ptr) : 0;
>  
>  	do {
>  		th->fin = th->psh = 0;
> @@ -2446,6 +2449,14 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb, int features)
>  
>  		th->seq = htonl(seq);
>  		th->cwr = 0;
> +
> +		if (urg_ptr <= len)
> +			urg_ptr = 0;
> +		else
> +			urg_ptr -= len;
> +
> +		th->urg = !!urg_ptr;
> +		th->urg_ptr = htons(urg_ptr);
>  	} while (skb->next);
>  
>  	delta = htonl(oldlen + (skb->tail - skb->transport_header) +
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index fe3b4bd..d4dc0fa 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -667,6 +667,7 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
>  		     between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {
>  		th->urg_ptr		= htons(tp->snd_up - tcb->seq);
>  		th->urg			= 1;
> +		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_URG;
>  	}
>  
>  	tcp_options_write((__be32 *)(th + 1), tp, &opts, &md5_hash_location);
> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> index 01edac8..c11d3d9 100644
> --- a/net/ipv6/af_inet6.c
> +++ b/net/ipv6/af_inet6.c
> @@ -746,6 +746,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb, int features)
>  		       SKB_GSO_DODGY |
>  		       SKB_GSO_TCP_ECN |
>  		       SKB_GSO_TCPV6 |
> +		       SKB_GSO_TCP_URG |
>  		       0)))
>  		goto out;
>  


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17  8:18   ` Petr Tesarik
@ 2008-12-17  8:42     ` Ilpo Järvinen
  2008-12-17 10:17     ` Herbert Xu
  1 sibling, 0 replies; 47+ messages in thread
From: Ilpo Järvinen @ 2008-12-17  8:42 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Herbert Xu, David Miller, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2628 bytes --]

On Wed, 17 Dec 2008, Petr Tesarik wrote:

> Herbert Xu píse v St 17. 12. 2008 v 12:18 +1100:
> > Ilpo J??rvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > > 
> > > I should have noticed this earlier... :-) The previous solution
> > > to URG+GSO/TSO will cause SACK block tcp_fragment to do zig-zig
> > > patterns, or even worse, a steep downward slope into packet
> > > counting because each skb pcount would be truncated to pcount
> > > of 2 and then the following fragments of the later portion would
> > > restore the window again.
> > 
> > How about just handling it, URG isn't that hard.  If anyone is
> > bored you can do this for GRO as well.
> > 
> > tcp: Add GSO support for urgent data
> > 
> > When segmenting packets with urgent data we need to clear the
> > urgent flag/pointer once we move past the end of the urgent data.
> > This was not handled correctly in GSO or (presumably) in existing
> > hardware.
> > 
> > This patch adds a new GSO flag for TCP urgent data and also fixes
> > its handling in GSO.
> 
> This would work, but:
> 
>   1. Currently this code path will never get run, because large
>      offloads are avoided both for hardware TSO and software GSO.
>      You'd also have to modify the conditionals that guard calls
>      to tcp_mss_split_point().
> 
>   2. Is it worth the trouble?
>      This slows down the software GSO slightly. I mean, urgent
>      data should be quite rare, so it makes sense to treat
>      non-urgent data as "fast path" and urgent data as "slow
>      path". While Ilpo's approach only slows down the slow path,
>      your patch slows down both.

My intention is to make it slightly more fine-grained on tcp side still, 
so that affected portion only includes those segments that really reside 
within 64k from the urg seq and rest will be unaffected.

>      No, I don't have any numbers, no. ;)

To put this in contrast (an example for elsewhere)... Once we got any
SACK blocks, we were doing tcp_fragment for almost every incoming segment, 
in practice for the whole window (I made it a lot better in net-next
though still some tcp_fragment()s will be needed), and that send-used 
fragment calls for tso_fragment which is less heavy-weight than sack-used 
tcp_fragment.

> Anyway, we have to keep the code that avoids large offloads with URG,
> because many NICs are broken in this regard (at least all Intel NICs
> I've seen so far), so why treat software GSO specially?

All TSO implementation will be subject to the other problem I mentioned in 
my response to Herbert. There just isn't enough bits for them to do 
things right w.r.t. 64k-before point.

-- 
 i.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17  8:15   ` Ilpo Järvinen
@ 2008-12-17 10:13     ` Herbert Xu
  2008-12-17 10:31       ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 10:13 UTC (permalink / raw)
  To: Ilpo J??rvinen; +Cc: herbert, davem, netdev, ptesarik

Ilpo J??rvinen <ilpo.jarvinen@helsinki.fi> wrote:
> 
> This is certainly nice and possible but as mention earlier, the harder 
> problem is that the urg_ptr is only 16-bits so there will be still be 
> problem if the urgent seq-64k mark is within a super-skb as tail segments 
> need urg while the head ones won't. I think we would need out-of-band 
> means to communicate the extra bit (17th) from tcp if we want to solve 
> that one. It would be rather easy to change TCP to only do splitting for 
> that particular skb though and keep everything else intact.

I think you've got it the wrong way around.  The test for setting
urg_ptr is:

        if (unlikely(tcp_urg_mode(tp) &&
                     between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {

So if this is true for the super-packet, then by definition it is
also true for the first packet.  In fact, it will be true for the
first n packets where

	tcb->seq + n * mss + 1 <= tp->snd_up

The only time when this will fall apart is if we had super-packets
bigger than 64K.  Fortunately or unfortunately that is a long way
off.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17  8:18   ` Petr Tesarik
  2008-12-17  8:42     ` Ilpo Järvinen
@ 2008-12-17 10:17     ` Herbert Xu
  2008-12-17 10:45       ` Petr Tesarik
  1 sibling, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 10:17 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: herbert, ilpo.jarvinen, davem, netdev

Petr Tesarik <ptesarik@suse.cz> wrote:
> 
>  1. Currently this code path will never get run, because large
>     offloads are avoided both for hardware TSO and software GSO.
>     You'd also have to modify the conditionals that guard calls
>     to tcp_mss_split_point().

Obviously one would revert the existing fixes after this is applied.

>  2. Is it worth the trouble?
>     This slows down the software GSO slightly. I mean, urgent
>     data should be quite rare, so it makes sense to treat
>     non-urgent data as "fast path" and urgent data as "slow
>     path". While Ilpo's approach only slows down the slow path,
>     your patch slows down both.

Well it might slow down the GSO path slightly, but it speeds
up the TSO generation path accordingly :) I think the cost is
negligible in either case so the benefit of TSO dominates.
 
> Anyway, we have to keep the code that avoids large offloads with URG,
> because many NICs are broken in this regard (at least all Intel NICs
> I've seen so far), so why treat software GSO specially?

That's not a problem.  Unless the NIC sets NETIF_F_TSO_URG (which
nobody does currently because it's a new flag), we'll do the
segmentation in software, but only if the packet is marked as URG.

This is identical to how we handle ECN which most NICs get wrong
as well.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 10:13     ` Herbert Xu
@ 2008-12-17 10:31       ` Herbert Xu
  2008-12-17 10:42         ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 10:31 UTC (permalink / raw)
  To: Ilpo J??rvinen; +Cc: davem, netdev, ptesarik

On Wed, Dec 17, 2008 at 09:13:22PM +1100, Herbert Xu wrote:
> Ilpo J??rvinen <ilpo.jarvinen@helsinki.fi> wrote:
> > 
> > This is certainly nice and possible but as mention earlier, the harder 
> > problem is that the urg_ptr is only 16-bits so there will be still be 
> > problem if the urgent seq-64k mark is within a super-skb as tail segments 
> > need urg while the head ones won't. I think we would need out-of-band 
> > means to communicate the extra bit (17th) from tcp if we want to solve 
> > that one. It would be rather easy to change TCP to only do splitting for 
> > that particular skb though and keep everything else intact.
> 
> I think you've got it the wrong way around.  The test for setting
> urg_ptr is:

Oh I see what you mean now.  Yes this isn't as easy as it appears.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 10:31       ` Herbert Xu
@ 2008-12-17 10:42         ` Herbert Xu
  2008-12-17 10:52           ` Petr Tesarik
  0 siblings, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 10:42 UTC (permalink / raw)
  To: Ilpo J??rvinen; +Cc: davem, netdev, ptesarik

On Wed, Dec 17, 2008 at 09:31:29PM +1100, Herbert Xu wrote:
>
> Oh I see what you mean now.  Yes this isn't as easy as it appears.

Reading RFC 793 again, it seems that what we're doing may not be
correct.  It would appear that the intention is to flag urgent mode
as soon as urgent data appears from the user, regardless of whether
it's 64K or more beyond the current sequence number.

The fact that the pointer is 16 bits long is not an issue.  We
should setting it to 0xffff.

The way we do it currently means that if there is loads of data
in the pipe then we don't signal urgent mode at all until we're
within 64K of snd.up, which defeats the whole point of urgent
mode.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 10:17     ` Herbert Xu
@ 2008-12-17 10:45       ` Petr Tesarik
  2008-12-17 10:47         ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: Petr Tesarik @ 2008-12-17 10:45 UTC (permalink / raw)
  To: Herbert Xu; +Cc: ilpo.jarvinen, davem, netdev

Herbert Xu píše v St 17. 12. 2008 v 21:17 +1100:
> Petr Tesarik <ptesarik@suse.cz> wrote:
> > 
> >  1. Currently this code path will never get run, because large
> >     offloads are avoided both for hardware TSO and software GSO.
> >     You'd also have to modify the conditionals that guard calls
> >     to tcp_mss_split_point().
> 
> Obviously one would revert the existing fixes after this is applied.
> 
> >  2. Is it worth the trouble?
> >     This slows down the software GSO slightly. I mean, urgent
> >     data should be quite rare, so it makes sense to treat
> >     non-urgent data as "fast path" and urgent data as "slow
> >     path". While Ilpo's approach only slows down the slow path,
> >     your patch slows down both.
> 
> Well it might slow down the GSO path slightly, but it speeds
> up the TSO generation path accordingly :) I think the cost is
> negligible in either case so the benefit of TSO dominates.

You're talking about the urgent-data case, right? Yes, for urgent data,
correctly implemented GSO will no doubt be faster than splitting the
skb. But GSO will also get (a bit) slower for the non-urgent-data case,
where the slowdown is not compensated by anything. Or did I miss
something obvious?

Petr Tesarik
 
> > Anyway, we have to keep the code that avoids large offloads with URG,
> > because many NICs are broken in this regard (at least all Intel NICs
> > I've seen so far), so why treat software GSO specially?
> 
> That's not a problem.  Unless the NIC sets NETIF_F_TSO_URG (which
> nobody does currently because it's a new flag), we'll do the
> segmentation in software, but only if the packet is marked as URG.
> 
> This is identical to how we handle ECN which most NICs get wrong
> as well.
> 
> Cheers,


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 10:45       ` Petr Tesarik
@ 2008-12-17 10:47         ` Herbert Xu
  0 siblings, 0 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 10:47 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: ilpo.jarvinen, davem, netdev

On Wed, Dec 17, 2008 at 11:45:13AM +0100, Petr Tesarik wrote:
>
> You're talking about the urgent-data case, right? Yes, for urgent data,
> correctly implemented GSO will no doubt be faster than splitting the
> skb. But GSO will also get (a bit) slower for the non-urgent-data case,
> where the slowdown is not compensated by anything. Or did I miss
> something obvious?

Unless you've got numbers I'm not buying this :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 10:42         ` Herbert Xu
@ 2008-12-17 10:52           ` Petr Tesarik
  2008-12-17 11:26             ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: Petr Tesarik @ 2008-12-17 10:52 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Ilpo J??rvinen, davem, netdev

Herbert Xu píše v St 17. 12. 2008 v 21:42 +1100:
> On Wed, Dec 17, 2008 at 09:31:29PM +1100, Herbert Xu wrote:
> >
> > Oh I see what you mean now.  Yes this isn't as easy as it appears.
> 
> Reading RFC 793 again, it seems that what we're doing may not be
> correct.  It would appear that the intention is to flag urgent mode
> as soon as urgent data appears from the user, regardless of whether
> it's 64K or more beyond the current sequence number.
> 
> The fact that the pointer is 16 bits long is not an issue.  We
> should setting it to 0xffff.

You might read it this way, but this would have some undesired
consequences in practise. Both Linux and *BSD interprets the offset as a
pointer to the first byte after the "out-of-band" data. So, they remove
the immediately preceding byte from the TCP stream (unless you set
MSG_OOB_INLINE, but that's not on by default).

Additionally, whenever the urgent pointer (the absolute offset in the
stream) changes, Linux (and maybe also *BSD -- not tested) sends a
SIGURG to the process.

In short, if you keep urg_ptr at 0xffff while moving forward in the
stream, the receiving side will interpret it as multiple urgent data.
Not quite what you want, I think.

Petr Tesarik

> The way we do it currently means that if there is loads of data
> in the pipe then we don't signal urgent mode at all until we're
> within 64K of snd.up, which defeats the whole point of urgent
> mode.
> 
> Cheers,


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 10:52           ` Petr Tesarik
@ 2008-12-17 11:26             ` Herbert Xu
  2008-12-17 11:41               ` Herbert Xu
  2008-12-17 23:23               ` David Miller
  0 siblings, 2 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 11:26 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Ilpo J??rvinen, davem, netdev

On Wed, Dec 17, 2008 at 11:52:23AM +0100, Petr Tesarik wrote:
> 
> In short, if you keep urg_ptr at 0xffff while moving forward in the
> stream, the receiving side will interpret it as multiple urgent data.
> Not quite what you want, I think.

I just checked NetBSD and guess what, they do exactly what I've
suggested:

	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
		u_int32_t urp = tp->snd_up - tp->snd_nxt;
		if (urp > IP_MAXPACKET)
			urp = IP_MAXPACKET;
		th->th_urp = htons((u_int16_t)urp);
		th->th_flags |= TH_URG;
	} else  

So if you think this is bad then it's been there for decades :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 11:26             ` Herbert Xu
@ 2008-12-17 11:41               ` Herbert Xu
  2008-12-17 12:55                 ` Alexey Kuznetsov
  2008-12-17 23:23               ` David Miller
  1 sibling, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 11:41 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Ilpo J??rvinen, davem, netdev, Alexey Kuznetsov

On Wed, Dec 17, 2008 at 10:26:00PM +1100, Herbert Xu wrote:
>
> I just checked NetBSD and guess what, they do exactly what I've
> suggested:

Actually the code quoted is from OpenBSD but I just checked
again and NetBSD does the same thing, while FreeBSD doesn't
check for overflows at all (Oops!).

> 	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
> 		u_int32_t urp = tp->snd_up - tp->snd_nxt;
> 		if (urp > IP_MAXPACKET)
> 			urp = IP_MAXPACKET;
> 		th->th_urp = htons((u_int16_t)urp);
> 		th->th_flags |= TH_URG;
> 	} else  
> 
> So if you think this is bad then it's been there for decades :)

In case you still think you can rely on the urgent behaviour
you should check out this document:

http://tools.ietf.org/html/draft-gont-tcpm-urgent-data-00

So I think given all of the above setting the urg_ptr to 0xffff
is a good compromise.  Yes it does have the downside that the
recipient may misbehave by taking things out-of-band but an app
that doesn't deal with this is broken anyway since that's exactly
what'll happen if you run it under BSD.  On the plus side, we'll
actually signal urgent mode as soon as possible which is about
the only sensible thing to do for urgent data.

In other words what we do currently potentially makes urgent
mode completely useless since it may delay the sending of the
urgent notification indefinitely.  While the BSD behaviour would
only break "broken" applications.

Anyway, here's the patch:

tcp: Set urgent pointer even if we're more than 64K from the end

Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from NetBSD and OpenBSD, and clearly
contradicts the purpose of urgent mode, which is to send the
notification (though not necessarily the associated data) as soon
as possible.  Our current behaviour may in fact delay the urgent
message indefinitely if the receiver window does not open up.

This patch modifies our behaviour to align with that of the BSDs.

The potential downside is that applications that does not cope
with out-of-band data correctly may be confused by originally
in-band stuff going out-of-band.  However, such applications
are seriously broken anyway because this can occur when it runs
on any BSD or if urgent notifications are merged.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fe3b4bd..2964c77 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -663,10 +663,12 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	th->urg_ptr		= 0;
 
 	/* The urg_mode check is necessary during a below snd_una win probe */
-	if (unlikely(tcp_urg_mode(tp) &&
-		     between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {
-		th->urg_ptr		= htons(tp->snd_up - tcb->seq);
+	if (unlikely(tcp_urg_mode(tp))) {
 		th->urg			= 1;
+		if (between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))
+			th->urg_ptr = htons(tp->snd_up - tcb->seq);
+		else
+			th->urg_ptr = 0xFFFF;
 	}
 
 	tcp_options_write((__be32 *)(th + 1), tp, &opts, &md5_hash_location);
 
Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 11:41               ` Herbert Xu
@ 2008-12-17 12:55                 ` Alexey Kuznetsov
  2008-12-17 13:31                   ` Petr Tesarik
                                     ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: Alexey Kuznetsov @ 2008-12-17 12:55 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

Hello!

> In other words what we do currently potentially makes urgent
> mode completely useless since it may delay the sending of the
> urgent notification indefinitely.  While the BSD behaviour would
> only break "broken" applications.

This is true.

It definitely will work with correct application (those which
use SO_OOBINLINE), and surely will kill those which treat
urg data in _default_ __BSD__("BSD behaviour", you say, huh:-))
way and withdraw octet at urgent pointer. 

Probably, netbsd/openbsd have some another quirk to work this around.
I do not see how, though.

I decided that this is a fatal conflict, which can be solved cleanly
only in a painful way like you suggested. Current approach delaying
URG is the way which does not spoil any data, breaking only in the
case of aborts.

If you are serious about this, also it would be a good idea
to make SO_OOBINLINE default option. Consider this. At least this
will allow to trap applications which will corrupt data stream
earlier.

Alexey

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 12:55                 ` Alexey Kuznetsov
@ 2008-12-17 13:31                   ` Petr Tesarik
  2008-12-17 14:21                     ` Alexey Kuznetsov
  2008-12-17 15:30                   ` Alexey Kuznetsov
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Petr Tesarik @ 2008-12-17 13:31 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Herbert Xu, Ilpo J??rvinen, davem, netdev

Alexey Kuznetsov píše v St 17. 12. 2008 v 15:55 +0300:
> Hello!
> 
> > In other words what we do currently potentially makes urgent
> > mode completely useless since it may delay the sending of the
> > urgent notification indefinitely.  While the BSD behaviour would
> > only break "broken" applications.
> 
> This is true.
> 
> It definitely will work with correct application (those which
> use SO_OOBINLINE), and surely will kill those which treat
> urg data in _default_ __BSD__("BSD behaviour", you say, huh:-))
> way and withdraw octet at urgent pointer. 
> 
> Probably, netbsd/openbsd have some another quirk to work this around.
> I do not see how, though.
> 
> I decided that this is a fatal conflict, which can be solved cleanly
> only in a painful way like you suggested. Current approach delaying
> URG is the way which does not spoil any data, breaking only in the
> case of aborts.
> 
> If you are serious about this, also it would be a good idea
> to make SO_OOBINLINE default option. Consider this. At least this
> will allow to trap applications which will corrupt data stream
> earlier.

This might turn out to be a dangerous decision. I am aware at least of
one big application which uses urgent data and doesn't use SO_OOBINLINE:
Oracle.

Yes, of course it's broken, but do we really want to join the much-hated
folks who are stuck with an only-we-do-it-correctly approach?

Just my 2 cents,
Petr Tesarik



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 13:31                   ` Petr Tesarik
@ 2008-12-17 14:21                     ` Alexey Kuznetsov
  0 siblings, 0 replies; 47+ messages in thread
From: Alexey Kuznetsov @ 2008-12-17 14:21 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Herbert Xu, Ilpo J??rvinen, davem, netdev

Hello!

> This might turn out to be a dangerous decision. I am aware at least of
> one big application which uses urgent data and doesn't use SO_OOBINLINE:
> Oracle.

Yes, this is not some "rlogin"...


> Yes, of course it's broken, but do we really want to join the much-hated
> folks who are stuck with an only-we-do-it-correctly approach?

Well, we do not want to return to that company. :-)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 12:55                 ` Alexey Kuznetsov
  2008-12-17 13:31                   ` Petr Tesarik
@ 2008-12-17 15:30                   ` Alexey Kuznetsov
  2008-12-17 16:04                     ` Petr Tesarik
  2008-12-17 21:29                     ` Herbert Xu
  2008-12-17 21:27                   ` Herbert Xu
  2008-12-17 22:16                   ` Herbert Xu
  3 siblings, 2 replies; 47+ messages in thread
From: Alexey Kuznetsov @ 2008-12-17 15:30 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

Hello!

> Probably, netbsd/openbsd have some another quirk to work this around.
> I do not see how, though.

F.e. this could be totally safe if we do this only in the case
when octet seq+0xFFFF has not yet been sent.

When we send segment with 0xFFFF inside urg_ptr is advanced and so on,
so that receiver has no chances to corrupt stream.

I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:

	after(scb->seq + 0xFFFF, tp->snd_nxt)

My brains are rusty, so take this critically. :-)



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 15:30                   ` Alexey Kuznetsov
@ 2008-12-17 16:04                     ` Petr Tesarik
  2008-12-17 21:30                       ` Herbert Xu
  2008-12-17 21:29                     ` Herbert Xu
  1 sibling, 1 reply; 47+ messages in thread
From: Petr Tesarik @ 2008-12-17 16:04 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Herbert Xu, Ilpo J??rvinen, davem, netdev

Alexey Kuznetsov píše v St 17. 12. 2008 v 18:30 +0300:
> Hello!
> 
> > Probably, netbsd/openbsd have some another quirk to work this around.
> > I do not see how, though.
> 
> F.e. this could be totally safe if we do this only in the case
> when octet seq+0xFFFF has not yet been sent.
> 
> When we send segment with 0xFFFF inside urg_ptr is advanced and so on,
> so that receiver has no chances to corrupt stream.
> 
> I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> 
> 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> 
> My brains are rusty, so take this critically. :-)

This is all nice, but it still does not solve those series of SIGURGs on
the receiving side. My suggestion is to not generate a new SIGURG until
the data for the latest one have arrived. There can be only one byte of
urgent data, so if somebody sends more than one, it cannot be handled by
the receiver, anyway, so that use case is broken and need not be taken
into account.

Oh, and since SIGURG is not a real-time signal, there is no requirement
to send as many of them as there were urgent data. Of course, this does
not help other OSes, but I'm not sure how they behave in these
situations.

Petr


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 12:55                 ` Alexey Kuznetsov
  2008-12-17 13:31                   ` Petr Tesarik
  2008-12-17 15:30                   ` Alexey Kuznetsov
@ 2008-12-17 21:27                   ` Herbert Xu
  2008-12-17 22:16                   ` Herbert Xu
  3 siblings, 0 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 21:27 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: herbert, ptesarik, ilpo.jarvinen, davem, netdev

Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> wrote:
> 
> If you are serious about this, also it would be a good idea
> to make SO_OOBINLINE default option. Consider this. At least this
> will allow to trap applications which will corrupt data stream
> earlier.

Well enabling SO_OOBINLINE would contradict POSIX.  Much as I'd
like to have it on by default, I don't think it's really worth it.

In any case such applications will break right now because URG
notifications can be merged due to out-of-order delivery, which
would cause OOB data to go in-band.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 15:30                   ` Alexey Kuznetsov
  2008-12-17 16:04                     ` Petr Tesarik
@ 2008-12-17 21:29                     ` Herbert Xu
  2008-12-17 22:52                       ` Herbert Xu
  1 sibling, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 21:29 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

On Wed, Dec 17, 2008 at 06:30:15PM +0300, Alexey Kuznetsov wrote:
> Hello!
> 
> > Probably, netbsd/openbsd have some another quirk to work this around.
> > I do not see how, though.
> 
> F.e. this could be totally safe if we do this only in the case
> when octet seq+0xFFFF has not yet been sent.
> 
> When we send segment with 0xFFFF inside urg_ptr is advanced and so on,
> so that receiver has no chances to corrupt stream.
> 
> I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> 
> 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> 
> My brains are rusty, so take this critically. :-)

Indeed that's a great idea!

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 16:04                     ` Petr Tesarik
@ 2008-12-17 21:30                       ` Herbert Xu
  2008-12-18  8:58                         ` Petr Tesarik
  0 siblings, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 21:30 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Alexey Kuznetsov, Ilpo J??rvinen, davem, netdev

On Wed, Dec 17, 2008 at 05:04:48PM +0100, Petr Tesarik wrote:
>
> This is all nice, but it still does not solve those series of SIGURGs on
> the receiving side. My suggestion is to not generate a new SIGURG until
> the data for the latest one have arrived. There can be only one byte of
> urgent data, so if somebody sends more than one, it cannot be handled by
> the receiver, anyway, so that use case is broken and need not be taken
> into account.

No we need to send a SIGURG as soon as we enter urgent mode.  The
whole point of urgent mode is to inform the receiver ASAP.  As the
urgent data itself cannot be expedited due to the in-order nature
of TCP, setting the flag and sending the signal is the best we can
do.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 12:55                 ` Alexey Kuznetsov
                                     ` (2 preceding siblings ...)
  2008-12-17 21:27                   ` Herbert Xu
@ 2008-12-17 22:16                   ` Herbert Xu
  2008-12-17 23:17                     ` David Miller
  2008-12-19 13:46                     ` Alexey Kuznetsov
  3 siblings, 2 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 22:16 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

On Wed, Dec 17, 2008 at 03:55:05PM +0300, Alexey Kuznetsov wrote:
>
> Probably, netbsd/openbsd have some another quirk to work this around.
> I do not see how, though.

Indeed they do have more quirks.  I only saw it after reading it
again, but they always use snd_nxt instead of the current sequence
number:

	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
		u_int32_t urp = tp->snd_up - tp->snd_nxt;
		if (urp > IP_MAXPACKET)
			urp = IP_MAXPACKET;
		th->th_urp = htons((u_int16_t)urp);
		th->th_flags |= TH_URG;

So on the one hand this means that for a fresh transmission we
only use 0xffff when the urgent data is not in the packet.

However, for retransmissions the urgent pointer is comletely
bogus!

Again this shows that any application that's relying on the urgent
data to be out-of-band is just broken.  The only sane way to use
urgent mode is as a notification.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 21:29                     ` Herbert Xu
@ 2008-12-17 22:52                       ` Herbert Xu
  2008-12-17 23:15                         ` Herbert Xu
  2008-12-18 19:45                         ` Ilpo Järvinen
  0 siblings, 2 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 22:52 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

On Thu, Dec 18, 2008 at 08:29:07AM +1100, Herbert Xu wrote:
>
> > I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> > 
> > 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> > 
> > My brains are rusty, so take this critically. :-)
> 
> Indeed that's a great idea!

OK, I think this is what we should do:

1) Fix fresh transmissions of urgent mode as Alexey suggested.

This is the safest way of ensuring that the urgent notification
is not delayed.  This is still not as "fast" as the BSD behaviour
but it is much safer with respect to broken^H^H^H^H^H^Hlegacy
applications.

2) Continue to use the current test for retransmissions.

3) Apply my original GSO patch and revert the existing URG fixes.

With Alexey's fix, the scenario that Ilpo identified would only
occur on retransmissions.  As we disable TSO on retransmissions,
it no longer affects my patch.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 22:52                       ` Herbert Xu
@ 2008-12-17 23:15                         ` Herbert Xu
  2008-12-17 23:21                           ` David Miller
  2008-12-26  1:13                           ` David Miller
  2008-12-18 19:45                         ` Ilpo Järvinen
  1 sibling, 2 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 23:15 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

On Thu, Dec 18, 2008 at 09:52:31AM +1100, Herbert Xu wrote:
> 
> 1) Fix fresh transmissions of urgent mode as Alexey suggested.
> 
> This is the safest way of ensuring that the urgent notification
> is not delayed.  This is still not as "fast" as the BSD behaviour
> but it is much safer with respect to broken^H^H^H^H^H^Hlegacy
> applications.

tcp: Always set urgent pointer if it's beyond snd_nxt

Our TCP stack does not set the urgent flag if the urgent pointer
does not fit in 16 bits, i.e., if it is more than 64K from the
sequence number of a packet.

This behaviour is different from the BSDs, and clearly contradicts
the purpose of urgent mode, which is to send the notification
(though not necessarily the associated data) as soon as possible.
Our current behaviour may in fact delay the urgent notification
indefinitely if the receiver window does not open up.

Simply matching BSD however may break legacy applications which
incorrectly rely on the out-of-band delivery of urgent data, and
conversely the in-band delivery of non-urgent data.

Alexey Kuznetsov suggested a safe solution of following BSD only
if the urgent pointer itself has not yet been transmitted.  This
way we guarantee that when the remote end sees the packet with
non-urgent data marked as urgent due to wrap-around we would have
advanced the urgent pointer beyond, either to the actual urgent
data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote
end may see multiple SIGURG notifications.  However, this would
occur anyway with other TCP stacks.  More importantly, the outcome
of such a duplicate notification is likely to be harmless since
the signal itself does not carry any information other than the
fact that we're in urgent mode.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fe3b4bd..e6e0319 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -663,10 +663,14 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 	th->urg_ptr		= 0;
 
 	/* The urg_mode check is necessary during a below snd_una win probe */
-	if (unlikely(tcp_urg_mode(tp) &&
-		     between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF))) {
-		th->urg_ptr		= htons(tp->snd_up - tcb->seq);
-		th->urg			= 1;
+	if (unlikely(tcp_urg_mode(tp))) {
+		if (between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF)) {
+			th->urg_ptr = htons(tp->snd_up - tcb->seq);
+			th->urg = 1;
+		} else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
+			th->urg_ptr = 0xFFFF;
+			th->urg = 1;
+		}
 	}
 
 	tcp_options_write((__be32 *)(th + 1), tp, &opts, &md5_hash_location);

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 22:16                   ` Herbert Xu
@ 2008-12-17 23:17                     ` David Miller
  2008-12-17 23:25                       ` Herbert Xu
  2008-12-18 20:07                       ` Ilpo Järvinen
  2008-12-19 13:46                     ` Alexey Kuznetsov
  1 sibling, 2 replies; 47+ messages in thread
From: David Miller @ 2008-12-17 23:17 UTC (permalink / raw)
  To: herbert; +Cc: kuznet, ptesarik, ilpo.jarvinen, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 18 Dec 2008 09:16:37 +1100

> However, for retransmissions the urgent pointer is comletely
> bogus!
> 
> Again this shows that any application that's relying on the urgent
> data to be out-of-band is just broken.  The only sane way to use
> urgent mode is as a notification.

I don't think this is a completely fair assesment, to be
honest.

Showing that the BSD guys have some half-working hacks in
this area is pretty immaterial, in the grand scheme of things.
The fact that retransmits will corrupt the URG pointer is
merely an indication of how few people use BSD and URG in
this combination of circumstances.

Nothing more.

And it also shows that our strange scheme, where we don't
signal URG until it's within the 16-bit sequence offset
limit, at least reports an accurate URG value.

Also, what if we're using IPV6 jumbo frames and we advertise this
special 0xffff URG offset turd?  That points to a real byte in the
packet, but it might not be the right one since we can only accurately
point up to 0xffff into there.

In TCP/IP Volume 2, the original BSD code is even more ignorant of
this situation, it simply sets the difference into the field as-is,
not being mindful at all of overflow:

	ti->ti_urp = htons((u_short) (tp->snd_up - tp->snd_nxt));

which probably prompted the workaround you see in the BSD code these
days.  Which, as we've seen, it buggy.

Given all of this I still think our current compromise is likely the
best one.  We never will advertise an URG pointer that is not pointing
to where the URG data will be in the sequence space.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 23:15                         ` Herbert Xu
@ 2008-12-17 23:21                           ` David Miller
  2008-12-17 23:31                             ` Herbert Xu
  2008-12-26  1:13                           ` David Miller
  1 sibling, 1 reply; 47+ messages in thread
From: David Miller @ 2008-12-17 23:21 UTC (permalink / raw)
  To: herbert; +Cc: kuznet, ptesarik, ilpo.jarvinen, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 18 Dec 2008 10:15:43 +1100

> +		if (between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF)) {
> +			th->urg_ptr = htons(tp->snd_up - tcb->seq);
> +			th->urg = 1;
> +		} else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
> +			th->urg_ptr = 0xFFFF;
> +			th->urg = 1;
> +		}

What does this make happen for a jumbo frame where the urgent pointer
is within this packet, yet is beyond the 0xffff offset limit?

I think this can't be used.


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 11:26             ` Herbert Xu
  2008-12-17 11:41               ` Herbert Xu
@ 2008-12-17 23:23               ` David Miller
  1 sibling, 0 replies; 47+ messages in thread
From: David Miller @ 2008-12-17 23:23 UTC (permalink / raw)
  To: herbert; +Cc: ptesarik, ilpo.jarvinen, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 17 Dec 2008 22:26:00 +1100

> 	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
> 		u_int32_t urp = tp->snd_up - tp->snd_nxt;
> 		if (urp > IP_MAXPACKET)
> 			urp = IP_MAXPACKET;
> 		th->th_urp = htons((u_int16_t)urp);
> 		th->th_flags |= TH_URG;
> 	} else  
> 
> So if you think this is bad then it's been there for decades :)

No, this is a hack the BSD guys added.

This code does not appear in Richard Steven's TCP/IP Illustrated
Volume 2, as I showed in my other reply.

And bad or not, you cannot compare exposure and deployment of
the BSD stack compared to say our's or Microsoft's.  Especially
in such the unusual circumstance of URG with the URG pointer
outside of the 16-bit offset limit.

Your quick discovery of the retransmit bug nearly proves this :)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 23:17                     ` David Miller
@ 2008-12-17 23:25                       ` Herbert Xu
  2008-12-17 23:34                         ` David Miller
  2008-12-18 20:07                       ` Ilpo Järvinen
  1 sibling, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 23:25 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, ptesarik, ilpo.jarvinen, netdev

On Wed, Dec 17, 2008 at 03:17:13PM -0800, David Miller wrote:
> 
> Also, what if we're using IPV6 jumbo frames and we advertise this
> special 0xffff URG offset turd?  That points to a real byte in the
> packet, but it might not be the right one since we can only accurately
> point up to 0xffff into there.

1) We don't support that right now.

2) Even if we did this would be broken either way because just
where do you put the urgent pointer if your packet is > 64K and
the urgent pointer lies in the packet but beyond 64K to the start?

In any case we can easily fix this up with an additional check
so I don't think it's central to the issue overall.

> Given all of this I still think our current compromise is likely the
> best one.  We never will advertise an URG pointer that is not pointing
> to where the URG data will be in the sequence space.

I agree to the extent that urgent mode is hopelessly broken in
so many ways so one more isn't going to make that much of a
difference.

However, I still think that the approach suggested by Alexey is
better than the status quo:

1) It's unlikely to break any existing apps (all it does is send SIGURG
multiple times).

2) For those apps that use urgent mode in the first place, they're
likely to want to see the urgent notification ASAP, rather than having
it delayed due to the receiver being busy.  So this is an improvement
over what we have.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 23:21                           ` David Miller
@ 2008-12-17 23:31                             ` Herbert Xu
  0 siblings, 0 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 23:31 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, ptesarik, ilpo.jarvinen, netdev

On Wed, Dec 17, 2008 at 03:21:18PM -0800, David Miller wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Thu, 18 Dec 2008 10:15:43 +1100
> 
> > +		if (between(tp->snd_up, tcb->seq + 1, tcb->seq + 0xFFFF)) {
> > +			th->urg_ptr = htons(tp->snd_up - tcb->seq);
> > +			th->urg = 1;
> > +		} else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
> > +			th->urg_ptr = 0xFFFF;
> > +			th->urg = 1;
> > +		}
> 
> What does this make happen for a jumbo frame where the urgent pointer
> is within this packet, yet is beyond the 0xffff offset limit?

Just how on earth can we build such a packet when our skb is designed
to carry 64K maximum?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 23:25                       ` Herbert Xu
@ 2008-12-17 23:34                         ` David Miller
  2008-12-17 23:41                           ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2008-12-17 23:34 UTC (permalink / raw)
  To: herbert; +Cc: kuznet, ptesarik, ilpo.jarvinen, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 18 Dec 2008 10:25:54 +1100

> 1) We don't support that right now.

I guess this is what the people writing RFC793 were thinking
as well :-)

> 2) Even if we did this would be broken either way because just
> where do you put the urgent pointer if your packet is > 64K and
> the urgent pointer lies in the packet but beyond 64K to the start?

You would need to chop up the packet to allow proper specification of
the URG pointer.

> > Given all of this I still think our current compromise is likely the
> > best one.  We never will advertise an URG pointer that is not pointing
> > to where the URG data will be in the sequence space.
> 
> I agree to the extent that urgent mode is hopelessly broken in
> so many ways so one more isn't going to make that much of a
> difference.
> 
> However, I still think that the approach suggested by Alexey is
> better than the status quo:
> 
> 1) It's unlikely to break any existing apps (all it does is send SIGURG
> multiple times).
> 
> 2) For those apps that use urgent mode in the first place, they're
> likely to want to see the urgent notification ASAP, rather than having
> it delayed due to the receiver being busy.  So this is an improvement
> over what we have.

I agree that prompt signalling of the URG condition is desirable.

None of the RFCs seem to give any real guidance in this area, and that
explains the plethora of ways we see various stacks handling this
situation.

I'll think about your patch some more, but no promises.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 23:34                         ` David Miller
@ 2008-12-17 23:41                           ` Herbert Xu
  0 siblings, 0 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-17 23:41 UTC (permalink / raw)
  To: David Miller; +Cc: kuznet, ptesarik, ilpo.jarvinen, netdev

On Wed, Dec 17, 2008 at 03:34:42PM -0800, David Miller wrote:
>
> > 2) Even if we did this would be broken either way because just
> > where do you put the urgent pointer if your packet is > 64K and
> > the urgent pointer lies in the packet but beyond 64K to the start?
> 
> You would need to chop up the packet to allow proper specification of
> the URG pointer.

Right, and it'd be just as easy to chop up the packet so that we
can set the urgent pointer to 0xffff :)

> I agree that prompt signalling of the URG condition is desirable.
> 
> None of the RFCs seem to give any real guidance in this area, and that
> explains the plethora of ways we see various stacks handling this
> situation.
> 
> I'll think about your patch some more, but no promises.

This is definitely something that we don't want to rush into.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 21:30                       ` Herbert Xu
@ 2008-12-18  8:58                         ` Petr Tesarik
  2008-12-18  9:06                           ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: Petr Tesarik @ 2008-12-18  8:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Alexey Kuznetsov, Ilpo Järvinen, davem, netdev

Herbert Xu píše v Čt 18. 12. 2008 v 08:30 +1100:
> On Wed, Dec 17, 2008 at 05:04:48PM +0100, Petr Tesarik wrote:
> >
> > This is all nice, but it still does not solve those series of SIGURGs on
> > the receiving side. My suggestion is to not generate a new SIGURG until
> > the data for the latest one have arrived. There can be only one byte of
> > urgent data, so if somebody sends more than one, it cannot be handled by
> > the receiver, anyway, so that use case is broken and need not be taken
> > into account.
> 
> No we need to send a SIGURG as soon as we enter urgent mode.  The
> whole point of urgent mode is to inform the receiver ASAP.  As the
> urgent data itself cannot be expedited due to the in-order nature
> of TCP, setting the flag and sending the signal is the best we can
> do.

That doesn't contradict my suggestion, which is: "Send the SIGURG as
soon as possible, but do not send _more_ SIGURGs until the urgent data
arrives."  I'm going to send a patch to make it clear.

Petr Tesarik



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-18  8:58                         ` Petr Tesarik
@ 2008-12-18  9:06                           ` Herbert Xu
  0 siblings, 0 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-18  9:06 UTC (permalink / raw)
  To: Petr Tesarik; +Cc: Alexey Kuznetsov, Ilpo Järvinen, davem, netdev

On Thu, Dec 18, 2008 at 09:58:12AM +0100, Petr Tesarik wrote:
>
> That doesn't contradict my suggestion, which is: "Send the SIGURG as
> soon as possible, but do not send _more_ SIGURGs until the urgent data
> arrives."  I'm going to send a patch to make it clear.

OK that does make sense.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 22:52                       ` Herbert Xu
  2008-12-17 23:15                         ` Herbert Xu
@ 2008-12-18 19:45                         ` Ilpo Järvinen
  2008-12-26  1:11                           ` David Miller
  1 sibling, 1 reply; 47+ messages in thread
From: Ilpo Järvinen @ 2008-12-18 19:45 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Alexey Kuznetsov, Petr Tesarik, David Miller, Netdev

On Thu, 18 Dec 2008, Herbert Xu wrote:

> On Thu, Dec 18, 2008 at 08:29:07AM +1100, Herbert Xu wrote:
> >
> > > I.e. we set URG and urg_ptr to fake scb->seq + 0xFFFF only when:
> > > 
> > > 	after(scb->seq + 0xFFFF, tp->snd_nxt)
> > > 
> > > My brains are rusty, so take this critically. :-)
> > 
> > Indeed that's a great idea!
> 
> OK, I think this is what we should do:
> 
> 1) Fix fresh transmissions of urgent mode as Alexey suggested.
> 
> This is the safest way of ensuring that the urgent notification
> is not delayed.  This is still not as "fast" as the BSD behaviour
> but it is much safer with respect to broken^H^H^H^H^H^Hlegacy
> applications.
> 
> 2) Continue to use the current test for retransmissions.
> 
> 3) Apply my original GSO patch and revert the existing URG fixes.
> 
> With Alexey's fix, the scenario that Ilpo identified would only
> occur on retransmissions.  As we disable TSO on retransmissions,
> it no longer affects my patch.

No, would you use that 0xffff hack with gso you would create even worse 
problem in case of reordering / losses. And that's just because the very 
scenario still remains fully unsolved. I'm not sure if you understood my 
scenario at all (I was hopeful that you did :-))? It's basically that 
(with the terms of this hack) some segments in a super-skb would need to 
set 0xffff while some < 0xffff. ...But you just can't send too large urg 
ptr in any segment or you're asking for trouble.

Also, I think even without gso but with reordering / enough losses will 
make it possible that not-yet-sent condition of Alexey's proposal (it's 
techically not-yet-received) will no longer hold from _the receiver's_ 
point of view and havoc breaks loose.


-- 
 i.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 23:17                     ` David Miller
  2008-12-17 23:25                       ` Herbert Xu
@ 2008-12-18 20:07                       ` Ilpo Järvinen
  2008-12-19 12:31                         ` Petr Tesarik
  1 sibling, 1 reply; 47+ messages in thread
From: Ilpo Järvinen @ 2008-12-18 20:07 UTC (permalink / raw)
  To: David Miller; +Cc: Herbert Xu, Alexey Kuznetsov, ptesarik, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1853 bytes --]

On Wed, 17 Dec 2008, David Miller wrote:

> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Thu, 18 Dec 2008 09:16:37 +1100
> 
> And it also shows that our strange scheme, where we don't
> signal URG until it's within the 16-bit sequence offset
> limit, at least reports an accurate URG value.

[...snip...]

> Given all of this I still think our current compromise is likely the
> best one.  We never will advertise an URG pointer that is not pointing
> to where the URG data will be in the sequence space.

Wholeheartedly agreed, but with added note that if somebody is fool enough 
to come up complaining about the behavior we've selected (among the palette 
of broken ways we had available) we might actually succeed in convincing 
him to come up with something that has more change in working than urg 
altogether. Sadly, urg is ABI-in-stone that must everyone must keep 
supporting in some crippled-form. ...But no worries, I've a solution below 
which should satisfy everybody... :-)


-- 
 i.


[PATCH] tcp: fix all urg troubles for now

Return to stone age with 64k is enough for everybody attitude
(or was it 640k, I keep forgetting). Urg certainly works fine
if no window scaling is not allowed ;-)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 99b7ecb..54fe815 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -72,7 +72,7 @@
 #include <net/netdma.h>
 
 int sysctl_tcp_timestamps __read_mostly = 1;
-int sysctl_tcp_window_scaling __read_mostly = 1;
+int sysctl_tcp_window_scaling __read_mostly = 0;
 int sysctl_tcp_sack __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly = 1;
 int sysctl_tcp_reordering __read_mostly = TCP_FASTRETRANS_THRESH;
-- 
1.5.2.2

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-18 20:07                       ` Ilpo Järvinen
@ 2008-12-19 12:31                         ` Petr Tesarik
  0 siblings, 0 replies; 47+ messages in thread
From: Petr Tesarik @ 2008-12-19 12:31 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Herbert Xu, Alexey Kuznetsov, Netdev

Ilpo Järvinen píše v Čt 18. 12. 2008 v 22:07 +0200:
> On Wed, 17 Dec 2008, David Miller wrote:
> 
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Date: Thu, 18 Dec 2008 09:16:37 +1100
> > 
> > And it also shows that our strange scheme, where we don't
> > signal URG until it's within the 16-bit sequence offset
> > limit, at least reports an accurate URG value.
> 
> [...snip...]
> 
> > Given all of this I still think our current compromise is likely the
> > best one.  We never will advertise an URG pointer that is not pointing
> > to where the URG data will be in the sequence space.
> 
> Wholeheartedly agreed, but with added note that if somebody is fool enough 
> to come up complaining about the behavior we've selected (among the palette 
> of broken ways we had available) we might actually succeed in convincing 
> him to come up with something that has more change in working than urg 
> altogether. Sadly, urg is ABI-in-stone that must everyone must keep 
> supporting in some crippled-form. ...But no worries, I've a solution below 
> which should satisfy everybody... :-)
> 
> 
> -- 
>  i.
> 
> 
> [PATCH] tcp: fix all urg troubles for now
> 
> Return to stone age with 64k is enough for everybody attitude
> (or was it 640k, I keep forgetting). Urg certainly works fine
> if no window scaling is not allowed ;-)
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> ---
>  net/ipv4/tcp_input.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 99b7ecb..54fe815 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -72,7 +72,7 @@
>  #include <net/netdma.h>
>  
>  int sysctl_tcp_timestamps __read_mostly = 1;
> -int sysctl_tcp_window_scaling __read_mostly = 1;
> +int sysctl_tcp_window_scaling __read_mostly = 0;

Yes, nice default, but don't you want to
remove /proc/sys/net/ipv4/sysctl_tcp_window_scaling completely? Let's
not make it possible for the admin to shoot himself in the foot by
accident... ;-))

Petr Tesarik

>  int sysctl_tcp_sack __read_mostly = 1;
>  int sysctl_tcp_fack __read_mostly = 1;
>  int sysctl_tcp_reordering __read_mostly = TCP_FASTRETRANS_THRESH;b


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 22:16                   ` Herbert Xu
  2008-12-17 23:17                     ` David Miller
@ 2008-12-19 13:46                     ` Alexey Kuznetsov
  2008-12-19 19:59                       ` Herbert Xu
  1 sibling, 1 reply; 47+ messages in thread
From: Alexey Kuznetsov @ 2008-12-19 13:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

Hello!

> Indeed they do have more quirks.  I only saw it after reading it
> again, but they always use snd_nxt instead of the current sequence
> number:

Their snd_nxt and our snd_nxt are different things.
Analog of our snd_nxt used to be called snd_max.

BSD snd_nxt points to current transmission head,
which is rewinded to snd_una when it starts to retransmit.


> 	if (SEQ_GT(tp->snd_up, tp->snd_nxt)) {
> 		u_int32_t urp = tp->snd_up - tp->snd_nxt;
> 		if (urp > IP_MAXPACKET)
> 			urp = IP_MAXPACKET;
> 		th->th_urp = htons((u_int16_t)urp);
> 		th->th_flags |= TH_URG;

Obviously, somewhere around this place they do th->th_seq = tp->snd_nxt.


BTW, that line of code which I sent was logically (not practically) wrong,
Dave spotted this instantly. :-) Logically correct way would be
to check against future snd_nxt which will be advanced
after the packet is submitted. Practically, it does not matter,
the difference is only for jumbo frames, which are inhibited
for tcp.

Alexey

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-19 13:46                     ` Alexey Kuznetsov
@ 2008-12-19 19:59                       ` Herbert Xu
  0 siblings, 0 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-19 19:59 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Petr Tesarik, Ilpo J??rvinen, davem, netdev

On Fri, Dec 19, 2008 at 04:46:44PM +0300, Alexey Kuznetsov wrote:
>
> BSD snd_nxt points to current transmission head,
> which is rewinded to snd_una when it starts to retransmit.

Right, but it would seem that it's broken for sack retransmits
at least.  This is from NetBSD but the others are similar:

	/*
	 * If we are doing retransmissions, then snd_nxt will
	 * not reflect the first unsent octet.  For ACK only
	 * packets, we do not want the sequence number of the
	 * retransmitted packet, we want the sequence number
	 * of the next unsent octet.  So, if there is no data
	 * (and no SYN or FIN), use snd_max instead of snd_nxt
	 * when filling in ti_seq.  But if we are in persist
	 * state, snd_max might reflect one byte beyond the
	 * right edge of the window, so use snd_nxt in that
	 * case, since we know we aren't doing a retransmission.
	 * (retransmit and persist are mutually exclusive...)
	 */
	if (TCP_SACK_ENABLED(tp) && sack_rxmit) {
		th->th_seq = htonl(p->rxmit);
		p->rxmit += len;
	} else {
		if (len || (flags & (TH_SYN|TH_FIN)) ||
		    TCP_TIMER_ISARMED(tp, TCPT_PERSIST))
			th->th_seq = htonl(tp->snd_nxt);
		else
			th->th_seq = htonl(tp->snd_max);
	}
 
> BTW, that line of code which I sent was logically (not practically) wrong,
> Dave spotted this instantly. :-) Logically correct way would be
> to check against future snd_nxt which will be advanced
> after the packet is submitted. Practically, it does not matter,
> the difference is only for jumbo frames, which are inhibited
> for tcp.

Actually I think the unadvanced snd_nxt makes sense too.  What we're
asking there is has the urgent data been transmitted before, i.e.,
is the current skb a retransmit (since we've already established
that the urgent pointer is forward of the current packet).

Yes this would be broken if we ever do jumbo frames, but it'd
be broken for the same reason that the current test'd be brokens,
that is, we need to split the packet in urgent mode.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-18 19:45                         ` Ilpo Järvinen
@ 2008-12-26  1:11                           ` David Miller
  2008-12-26  1:47                             ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2008-12-26  1:11 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: herbert, kuznet, ptesarik, netdev

From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 18 Dec 2008 21:45:13 +0200 (EET)

> No, would you use that 0xffff hack with gso you would create even worse 
> problem in case of reordering / losses. And that's just because the very 
> scenario still remains fully unsolved. I'm not sure if you understood my 
> scenario at all (I was hopeful that you did :-))? It's basically that 
> (with the terms of this hack) some segments in a super-skb would need to 
> set 0xffff while some < 0xffff. ...But you just can't send too large urg 
> ptr in any segment or you're asking for trouble.

Hmmm, since URG is relative it means that GSO (or in-hardware TSO)
would need to adjust the URG offset for each segment.

But if we stick this 0xffff thing there, it _can't_ do that properly
because it cannot know where the correct URG pointer actually is.

In fact, the straightforward URG offset adjust one would expect a TSO
engine to make would corrupt the URG pointer when we put this
"infinity" value there.

The URG pointer might be >= 0xffff from the initial sequence number
of the starting TSO segment, but the URG offset might be in range
of one of the non-initial MSS segmented frames.

Anyways, we skip TSO/GSO for any time URG is indicated making all of
this particular concern moot. ;-)  But if at some point we want to
allow it, we'd need to consider this problem.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-17 23:15                         ` Herbert Xu
  2008-12-17 23:21                           ` David Miller
@ 2008-12-26  1:13                           ` David Miller
  1 sibling, 0 replies; 47+ messages in thread
From: David Miller @ 2008-12-26  1:13 UTC (permalink / raw)
  To: herbert; +Cc: kuznet, ptesarik, ilpo.jarvinen, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 18 Dec 2008 10:15:43 +1100

> tcp: Always set urgent pointer if it's beyond snd_nxt

Ok, I've applied this for 2.6.29

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-26  1:11                           ` David Miller
@ 2008-12-26  1:47                             ` Herbert Xu
  2008-12-26  2:00                               ` David Miller
  0 siblings, 1 reply; 47+ messages in thread
From: Herbert Xu @ 2008-12-26  1:47 UTC (permalink / raw)
  To: David Miller; +Cc: ilpo.jarvinen, kuznet, ptesarik, netdev

On Thu, Dec 25, 2008 at 05:11:46PM -0800, David Miller wrote:
>
> Hmmm, since URG is relative it means that GSO (or in-hardware TSO)
> would need to adjust the URG offset for each segment.
> 
> But if we stick this 0xffff thing there, it _can't_ do that properly
> because it cannot know where the correct URG pointer actually is.
> 
> In fact, the straightforward URG offset adjust one would expect a TSO
> engine to make would corrupt the URG pointer when we put this
> "infinity" value there.

With Alexey's proposal 0xffff is not infinity, what it is is a
position that we have not yet transmited.  The point is that due
to the way urgent mode works, the remote side will not exit urgent
mode or elide the data until it sees the actual data pointed to
by the urgent pointer.

When we do finally transmit the data pointed to by 0xffff, it'll
be accompanied with a new urgent pointer either pointing to the real
urgent data or 0xffff (which is again untransmitted).

> The URG pointer might be >= 0xffff from the initial sequence number
> of the starting TSO segment, but the URG offset might be in range
> of one of the non-initial MSS segmented frames.

It doesn't really matter if it's in range or not, the point is that
whether we point to the 0xffff point or the real urgent data, the
data at that sequence number is yet to be transmitted.  So by the
time the remote peer actually gets to that point, it would have
received an updated urgent pointer.

If you're talking about the case where we didn't set the urgent
pointer on the TSO packet, but we should set it on the final
segment generated from it, then that can't happen because we
don't do TSO on retransmissions.

Here's an example, we have a TSO packet with 1440 * 45 = 64800 bytes
of data, the MSS is 1440, let the initial sequence number be zero
Let's assume that the urgent data is at 66240.

So the TCP stack will send the TSO packet with urgent pointer set
to 65535.  The first segmented TCP packet will also have 65535 as
the urgent pointer, the one after it will have 65535 - 1440 = 64095,
etc.  The final one will have 65535 - 1440 * 44 = 2175.

The TCP stack then transmits the next packet (let's assume it's 1440
bytes long for the sake of simplicity), which has the urgent pointer
set to 66240 - 64800 = 1440.

If all these packets are delivered in order, then what'll happen
is that RCV.UP starts out at 65535, and once the final segment is
delivered it'll be updated to 66240.

If out-of-order delivery occurs and the final segment arrives earlier,
then all that'll happen is that RCV.UP will hit 66240 earlier, which
is not an issue since all subsequent urgent pointers will be less and
hence ignored.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-26  1:47                             ` Herbert Xu
@ 2008-12-26  2:00                               ` David Miller
  2008-12-26  2:38                                 ` Herbert Xu
  0 siblings, 1 reply; 47+ messages in thread
From: David Miller @ 2008-12-26  2:00 UTC (permalink / raw)
  To: herbert; +Cc: ilpo.jarvinen, kuznet, ptesarik, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 26 Dec 2008 12:47:39 +1100

> If all these packets are delivered in order, then what'll happen
> is that RCV.UP starts out at 65535, and once the final segment is
> delivered it'll be updated to 66240.
> 
> If out-of-order delivery occurs and the final segment arrives earlier,
> then all that'll happen is that RCV.UP will hit 66240 earlier, which
> is not an issue since all subsequent urgent pointers will be less and
> hence ignored.

So it all hinges on how the receiver updates his URG pointer.  This is
what legitimizes a sender advertising a changing URG pointer.

So it works, but it isn't the nicest thing in the world :-)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH] tcp: make urg+gso work for real this time
  2008-12-26  2:00                               ` David Miller
@ 2008-12-26  2:38                                 ` Herbert Xu
  0 siblings, 0 replies; 47+ messages in thread
From: Herbert Xu @ 2008-12-26  2:38 UTC (permalink / raw)
  To: David Miller; +Cc: ilpo.jarvinen, kuznet, ptesarik, netdev

On Thu, Dec 25, 2008 at 06:00:51PM -0800, David Miller wrote:
> 
> So it all hinges on how the receiver updates his URG pointer.  This is
> what legitimizes a sender advertising a changing URG pointer.

Pretty much.

> So it works, but it isn't the nicest thing in the world :-)

It would've been nice and simple had the original implementors
strictly followed RFC793 and done the out-of-band notifications
instead of coming up with their fanciful notions of out-of-band
data :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2008-12-26  2:38 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-03 11:22 [PATCH] tcp: make urg+gso work for real this time Ilpo Järvinen
2008-12-03 14:37 ` Petr Tesarik
2008-12-03 18:31   ` Ilpo Järvinen
2008-12-04  5:25     ` David Miller
2008-12-17  1:18 ` Herbert Xu
2008-12-17  8:15   ` Ilpo Järvinen
2008-12-17 10:13     ` Herbert Xu
2008-12-17 10:31       ` Herbert Xu
2008-12-17 10:42         ` Herbert Xu
2008-12-17 10:52           ` Petr Tesarik
2008-12-17 11:26             ` Herbert Xu
2008-12-17 11:41               ` Herbert Xu
2008-12-17 12:55                 ` Alexey Kuznetsov
2008-12-17 13:31                   ` Petr Tesarik
2008-12-17 14:21                     ` Alexey Kuznetsov
2008-12-17 15:30                   ` Alexey Kuznetsov
2008-12-17 16:04                     ` Petr Tesarik
2008-12-17 21:30                       ` Herbert Xu
2008-12-18  8:58                         ` Petr Tesarik
2008-12-18  9:06                           ` Herbert Xu
2008-12-17 21:29                     ` Herbert Xu
2008-12-17 22:52                       ` Herbert Xu
2008-12-17 23:15                         ` Herbert Xu
2008-12-17 23:21                           ` David Miller
2008-12-17 23:31                             ` Herbert Xu
2008-12-26  1:13                           ` David Miller
2008-12-18 19:45                         ` Ilpo Järvinen
2008-12-26  1:11                           ` David Miller
2008-12-26  1:47                             ` Herbert Xu
2008-12-26  2:00                               ` David Miller
2008-12-26  2:38                                 ` Herbert Xu
2008-12-17 21:27                   ` Herbert Xu
2008-12-17 22:16                   ` Herbert Xu
2008-12-17 23:17                     ` David Miller
2008-12-17 23:25                       ` Herbert Xu
2008-12-17 23:34                         ` David Miller
2008-12-17 23:41                           ` Herbert Xu
2008-12-18 20:07                       ` Ilpo Järvinen
2008-12-19 12:31                         ` Petr Tesarik
2008-12-19 13:46                     ` Alexey Kuznetsov
2008-12-19 19:59                       ` Herbert Xu
2008-12-17 23:23               ` David Miller
2008-12-17  8:18   ` Petr Tesarik
2008-12-17  8:42     ` Ilpo Järvinen
2008-12-17 10:17     ` Herbert Xu
2008-12-17 10:45       ` Petr Tesarik
2008-12-17 10:47         ` Herbert Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.