* TCP_DEFER_ACCEPT brokenness?
From: dean gaudet @ 2006-12-31  2:50 UTC (permalink / raw)
  To: netdev; +Cc: mtk-manpages

hi... i'm having trouble matching up the tcp(7) man page description of
TCP_DEFER_ACCEPT versus some comments in the kernel (2.6.20-rc2) versus
how the kernel actually acts.

the man page says this:

   TCP_DEFER_ACCEPT
        Allows a listener to be awakened only when data arrives on
        the socket.  Takes an integer value (seconds), this can bound
        the maximum number of attempts TCP will make to complete the
        connection.  This option should not be used in code intended to
        be portable.

which is a bit confusing because it talks both about seconds and
"attempts".  (and doesn't mention what happens when the timeout finishes
-- i could see dropping the socket or passing it to userland anyhow as
possibilities... but in fact the socket is dropped).

the setsockopt code in tcp.c does this:

        case TCP_DEFER_ACCEPT:
                icsk->icsk_accept_queue.rskq_defer_accept = 0;
                if (val > 0) {
                        /* Translate value in seconds to number of
                         * retransmits */
                        while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                               val > ((TCP_TIMEOUT_INIT / HZ) <<
                                       icsk->icsk_accept_queue.rskq_defer_accept))
                                icsk->icsk_accept_queue.rskq_defer_accept++;
                        icsk->icsk_accept_queue.rskq_defer_accept++;
                }
                break;

so at least the comment agrees with the man page -- however the code
doesn't... the loop finds the least n such that val <= (3<<n) (and then
stores n+1)...  but these are timeouts and they're cumulative -- it would
be more appropriate to search for the least n such that

        val < (3<<0) + (3<<1) + (3<<2) + ... + (3<<n)
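
The difference between the two searches can be sketched in C. This is only an illustration, not kernel code: the function names are made up here, the 3 second initial timeout is taken from the "(3<<n)" in the message, and the cumulative variant uses <= (as the quoted loop's "val >" test implies) and returns a count of timeouts so it can be compared with the stored rskq_defer_accept.

```c
/* Initial SYN-ACK timeout in seconds; the message implies
 * TCP_TIMEOUT_INIT / HZ == 3 for this kernel (assumption). */
#define INIT_TIMEOUT_SECS 3

/* Mirrors the quoted setsockopt loop and returns the value it would
 * store in rskq_defer_accept.  long long avoids overflowing the shift. */
static int defer_accept_kernel(int val)
{
    int n = 0;

    if (val > 0) {
        while (n < 32 && val > ((long long)INIT_TIMEOUT_SECS << n))
            n++;
        n++;
    }
    return n;
}

/* Hypothetical alternative: the smallest number of timeouts whose
 * summed durations (3 + 6 + 12 + ...) reach val seconds. */
static int defer_accept_cumulative(int val)
{
    int count = 0;
    long long total = 0;

    while (count < 32 && val > total) {
        total += (long long)INIT_TIMEOUT_SECS << count;
        count++;
    }
    return count;
}
```

For val = 30 the first function gives 5 (timeouts of 3+6+12+24+48 = 93 seconds in total) while the second gives 4 (3+6+12+24 = 45 seconds).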

but that's not all that's wrong... i'm not sure why, for val == 1 it
computes n=0 correctly (verified with getsockopt) but then it defers
way more timeouts than 2.  here's a tcpdump example where the timeout
was set to 1:

1167532741.446027 IP 127.0.0.1.56733 > 127.0.0.1.53846: S 1792609127:1792609127(0) win 32792 <mss 16396,sackOK,timestamp 249615 0,nop,wscale 5>
1167532741.446899 IP 127.0.0.1.53846 > 127.0.0.1.56733: S 1785169552:1785169552(0) ack 1792609128 win 32768 <mss 16396,sackOK,timestamp 249616 249615,nop,wscale 5>
1167532741.446122 IP 127.0.0.1.56733 > 127.0.0.1.53846: . ack 1 win 1025 <nop,nop,timestamp 249616 249616>
1167532745.249902 IP 127.0.0.1.53846 > 127.0.0.1.56733: S 1785169552:1785169552(0) ack 1792609128 win 32768 <mss 16396,sackOK,timestamp 250566 249616,nop,wscale 5>
1167532745.249912 IP 127.0.0.1.56733 > 127.0.0.1.53846: . ack 1 win 1025 <nop,nop,timestamp 250566 250566,nop,nop,sack 1 {0:1}>
1167532751.648046 IP 127.0.0.1.53846 > 127.0.0.1.56733: S 1785169552:1785169552(0) ack 1792609128 win 32768 <mss 16396,sackOK,timestamp 252166 250566,nop,wscale 5>
1167532751.648058 IP 127.0.0.1.56733 > 127.0.0.1.53846: . ack 1 win 1025 <nop,nop,timestamp 252166 252166,nop,nop,sack 1 {0:1}>
1167532764.448456 IP 127.0.0.1.53846 > 127.0.0.1.56733: S 1785169552:1785169552(0) ack 1792609128 win 32768 <mss 16396,sackOK,timestamp 255366 252166,nop,wscale 5>
1167532764.448473 IP 127.0.0.1.56733 > 127.0.0.1.53846: . ack 1 win 1025 <nop,nop,timestamp 255366 255366,nop,nop,sack 1 {0:1}>
1167532788.452409 IP 127.0.0.1.53846 > 127.0.0.1.56733: S 1785169552:1785169552(0) ack 1792609128 win 32768 <mss 16396,sackOK,timestamp 261366 255366,nop,wscale 5>
1167532788.452430 IP 127.0.0.1.56733 > 127.0.0.1.53846: . ack 1 win 1025 <nop,nop,timestamp 261366 261366,nop,nop,sack 1 {0:1}>
1167532836.453520 IP 127.0.0.1.53846 > 127.0.0.1.56733: S 1785169552:1785169552(0) ack 1792609128 win 32768 <mss 16396,sackOK,timestamp 273366 261366,nop,wscale 5>
1167532836.453539 IP 127.0.0.1.56733 > 127.0.0.1.53846: . ack 1 win 1025 <nop,nop,timestamp 273366 273366,nop,nop,sack 1 {0:1}>
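
To make the retransmit spacing in this capture easier to see, here is a small C sketch that computes the gaps between the six SYN-ACK lines. The timestamps are copied from the dump above; the array and function names are made up for illustration.

```c
#include <stddef.h>

/* Send times of the SYN-ACK segments (the "S ... ack" lines), in seconds. */
static const double synack_times[] = {
    1167532741.446899, 1167532745.249902, 1167532751.648046,
    1167532764.448456, 1167532788.452409, 1167532836.453520,
};
#define N_SYNACKS (sizeof synack_times / sizeof synack_times[0])

/* Writes the n-1 gaps between consecutive send times into gaps[]. */
static void synack_gaps(const double *times, size_t n, double *gaps)
{
    for (size_t i = 1; i < n; i++)
        gaps[i - 1] = times[i] - times[i - 1];
}
```

The gaps come out at roughly 3.8, 6.4, 12.8, 24 and 48 seconds, i.e. five retransmissions with the timeout about doubling each time, even though the option was set to 1. That would be consistent with a tcp_synack_retries of 5, but this is an assumption; the sysctl value is not given in the message.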


now honestly i don't much mind whether 1s works correctly (because
apache 2.2.x is broken and sets TCP_DEFER_ACCEPT to 1 ... see
<http://issues.apache.org/bugzilla/show_bug.cgi?id=41270>).

but even if i use more reasonable timeouts like 30s it doesn't
behave as expected based on the docs.

not sure which way this should be resolved -- or how long the code has 
been like this...  perhaps the current behaviour should just become the 
documented behaviour (whatever the current behaviour is :).

-dean


* Re: TCP_DEFER_ACCEPT brokenness?
From: dean gaudet @ 2007-01-30  9:05 UTC (permalink / raw)
  To: netdev; +Cc: mtk-manpages

ping.  i received no response on this one..

thanks
-dean



* Re: TCP_DEFER_ACCEPT brokenness?
From: Julian Anastasov @ 2007-01-30 22:28 UTC (permalink / raw)
  To: dean gaudet; +Cc: netdev, mtk-manpages


	Hello,

On Tue, 30 Jan 2007, dean gaudet wrote:

> > which is a bit confusing because it talks both about seconds and
> > "attempts".  (and doesn't mention what happens when the timeout finishes
> > -- i could see dropping the socket or passing it to userland anyhow as
> > possibilities... but in fact the socket is dropped).

	My understanding about SYN-ACKs is:

- there is always one SYN+ACK and at least one retransmission (min 
3+6 secs period to accept ACK)

- TCP_SYNCNT (or tcp_synack_retries) defines the number of retransmissions,
this is a minimum that TCP_DEFER_ACCEPT cannot reduce (due to the
'req->retrans < thresh' check). It can only extend it after the
ACK is received.

- TCP_DEFER_ACCEPT defines seconds (total time) to wait for ACK
plus first data

	Hint: one option is that you can treat TCP_DEFER_ACCEPT as a flag,
set it to 1 and then tune TCP_SYNCNT to cover the max desired period to
wait for data.
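
A minimal sketch of that hint in C, assuming a Linux TCP socket. The helper name and the synack_retries parameter are illustrative, not taken from the thread, and the right retry count depends on the period you want to cover.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Use TCP_DEFER_ACCEPT as an on/off flag (value 1) and let TCP_SYNCNT
 * set the number of SYN-ACK retransmissions, as the hint suggests.
 * Returns 0 on success, -1 on failure with errno set by setsockopt().
 * Linux-specific; synack_retries is a caller-chosen illustrative value.
 */
static int enable_defer_accept(int listen_fd, int synack_retries)
{
    int on = 1;

    if (setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                   &on, sizeof on) < 0)
        return -1;
    if (setsockopt(listen_fd, IPPROTO_TCP, TCP_SYNCNT,
                   &synack_retries, sizeof synack_retries) < 0)
        return -1;
    return 0;
}
```

A caller would invoke this on the listening socket, e.g. enable_defer_accept(fd, 4), before or after listen(); how long the kernel then waits for data depends on its retransmit timers.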

Regards

--
Julian Anastasov <ja@ssi.bg>
