linux-kernel.vger.kernel.org archive mirror
* Re: question about linux tcp request queue handling
       [not found]         ` <001401c344df$ccbc63c0$6801a8c0@oemcomputer.suse.lists.linux.kernel>
@ 2003-07-07 21:48           ` Andi Kleen
  2003-07-07 22:25             ` Doug McNaught
                               ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Andi Kleen @ 2003-07-07 21:48 UTC (permalink / raw)
  To: Paul Albrecht; +Cc: niv, linux-kernel, netdev

"Paul Albrecht" <palbrecht@qwest.net> writes:

> This statement is inconsistent with the description of this scenario in
> Stevens's TCP/IP Illustrated.  Specifically, continuing the handshake in the
> TCP layer, i.e., sending a syn/ack and moving to the syn_recd state, is
> incorrect if the limit of the server's socket backlog would be exceeded.
> How do you account for this discrepancy between linux and other
> berkeley-derived implementations?

The 4.4BSD-Lite code described in Stevens is long outdated. All modern
BSDs (and probably most other Unixes too) do it in a similar way to what 
Nivedita described. The keywords are "syn flood attack" and "DoS". 

-Andi


* Re: question about linux tcp request queue handling
  2003-07-07 21:48           ` question about linux tcp request queue handling Andi Kleen
@ 2003-07-07 22:25             ` Doug McNaught
  2003-07-07 23:52               ` Andi Kleen
  2003-07-08  4:14             ` Paul Albrecht
  2003-07-08 19:23             ` Paul Albrecht
  2 siblings, 1 reply; 16+ messages in thread
From: Doug McNaught @ 2003-07-07 22:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Paul Albrecht, niv, linux-kernel, netdev

Andi Kleen <ak@suse.de> writes:

> "Paul Albrecht" <palbrecht@qwest.net> writes:
> 
> > This statement is inconsistent with the description of this scenario in
> > Stevens's TCP/IP Illustrated.  Specifically, continuing the handshake in the
> > TCP layer, i.e., sending a syn/ack and moving to the syn_recd state, is
> > incorrect if the limit of the server's socket backlog would be exceeded.
> > How do you account for this discrepancy between linux and other
> > berkeley-derived implementations?
> 
> The 4.4BSD-Lite code described in Stevens is long outdated. All modern
> BSDs (and probably most other Unixes too) do it in a similar way to what 
> Nivedita described. The keywords are "syn flood attack" and "DoS". 

And furthermore, IIRC, the current Linux networking code is not
Berkeley-derived, though an earlier version was.

-Doug


* Re: question about linux tcp request queue handling
  2003-07-07 22:25             ` Doug McNaught
@ 2003-07-07 23:52               ` Andi Kleen
  2003-07-08  0:17                 ` Doug McNaught
  0 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2003-07-07 23:52 UTC (permalink / raw)
  To: Doug McNaught; +Cc: palbrecht, niv, linux-kernel, netdev

On 07 Jul 2003 18:25:17 -0400
Doug McNaught <doug@mcnaught.org> wrote:

> And furthermore, IIRC, the current Linux networking code is not
> Berkeley-derived, though an earlier version was.

The linux network stack was never BSD derived in any way.

[there are two header files that came from net2, but they do not 
contain any code]

-Andi


* Re: question about linux tcp request queue handling
  2003-07-07 23:52               ` Andi Kleen
@ 2003-07-08  0:17                 ` Doug McNaught
  2003-07-08  0:25                   ` Andi Kleen
  2003-07-08 14:09                   ` Horst von Brand
  0 siblings, 2 replies; 16+ messages in thread
From: Doug McNaught @ 2003-07-08  0:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: palbrecht, niv, linux-kernel, netdev

Andi Kleen <ak@suse.de> writes:

> On 07 Jul 2003 18:25:17 -0400
> Doug McNaught <doug@mcnaught.org> wrote:
> 
> > And furthermore, IIRC, the current Linux networking code is not
> > Berkeley-derived, though an earlier version was.
> 
> The linux network stack was never BSD derived in any way.
> 
> [there are two header files that came from net2, but they do not 
> contain any code]

OIDNRC, thanks for the correction.   :)

Although, I distinctly remember seeing "Net-2" in one of the boot
messages in an early kernel (pre 1.0); was that just the header files'
doing?

-Doug


* Re: question about linux tcp request queue handling
  2003-07-08  0:17                 ` Doug McNaught
@ 2003-07-08  0:25                   ` Andi Kleen
  2003-07-08 14:09                   ` Horst von Brand
  1 sibling, 0 replies; 16+ messages in thread
From: Andi Kleen @ 2003-07-08  0:25 UTC (permalink / raw)
  To: Doug McNaught; +Cc: palbrecht, niv, linux-kernel, netdev

On 07 Jul 2003 20:17:57 -0400
Doug McNaught <doug@mcnaught.org> wrote:


> Although, I distinctly remember seeing "Net-2" in one of the boot
> messages in an early kernel (pre 1.0); was that just the header files'
> doing?

Net-2 was the name for a linux network code release too. The current code is net4
(actually more net5). But it has nothing to do with the similarly named
BSD release. 

-Andi


* Re: question about linux tcp request queue handling
  2003-07-07 21:48           ` question about linux tcp request queue handling Andi Kleen
  2003-07-07 22:25             ` Doug McNaught
@ 2003-07-08  4:14             ` Paul Albrecht
  2003-07-08 19:23             ` Paul Albrecht
  2 siblings, 0 replies; 16+ messages in thread
From: Paul Albrecht @ 2003-07-08  4:14 UTC (permalink / raw)
  To: Andi Kleen; +Cc: niv, linux-kernel, netdev

Andi Kleen writes:

>
> The 4.4BSD-Lite code described in Stevens is long outdated.
>

I was referring to volume one subtitled: "The Protocols."  It doesn't
describe implementation and the examples are not limited to bsd-lite.

>
> All modern BSDs (and probably most other Unixes too) do it in a similar way
> to what Nivedita described.
>

Linux doesn't operate in the manner Nivedita describes ... the tcp layer on
the server side moves to the syn_recd state, but doesn't accept the ack back
from the client. Instead it times out and sends its syn/ack back to the client
and again ignores the client's ack, ... Eventually, either there's room on
the backlog queue and the server side moves to the established state or the
server side stops resending its syn/ack.  This doesn't seem to make much
sense. If the tcp layer can send the syn/ack it seems like it should
probably respond to the client's ack.
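
The retry behaviour described above can be sketched as a small timeline
model. This is only an illustration of the description in this message, not
kernel code: the function name, the 3 second initial timeout, the doubling
backoff and the limit of 5 retries are all assumptions chosen for the
example.

```python
def synack_retry_outcome(room_free_at, initial_rto=3.0, max_retries=5):
    """Model a request parked in SYN_RCVD while the backlog is full.

    room_free_at: seconds (after the first syn/ack) at which the backlog
                  queue gets room, e.g. because the server calls accept().
    initial_rto, max_retries: hypothetical values, not taken from the thread.
    Returns (final_state, seconds_since_first_synack).
    """
    elapsed = 0.0
    rto = initial_rto
    for _ in range(max_retries):
        # Timer fires: the syn/ack is resent and the client acks it again.
        elapsed += rto
        if elapsed >= room_free_at:
            # There is room now, so this ack is finally honoured.
            return ("ESTABLISHED", elapsed)
        # Still no room: the ack is ignored and the timeout is doubled.
        rto *= 2
    # Retries exhausted: the server stops resending its syn/ack.
    return ("DROPPED", elapsed)
```

With these assumed defaults, `synack_retry_outcome(10.0)` returns
`("ESTABLISHED", 21.0)` and `synack_retry_outcome(1000.0)` returns
`("DROPPED", 93.0)`.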

>
>The keywords are "syn flood attack" and "DoS".
>

I'd be interested in a more specific reference detailing the changes
required to the listen syscall as a consequence of the changes required for
avoidance of syn flood attacks.  Thanks.






* Re: question about linux tcp request queue handling
  2003-07-08  0:17                 ` Doug McNaught
  2003-07-08  0:25                   ` Andi Kleen
@ 2003-07-08 14:09                   ` Horst von Brand
  1 sibling, 0 replies; 16+ messages in thread
From: Horst von Brand @ 2003-07-08 14:09 UTC (permalink / raw)
  To: Doug McNaught; +Cc: Linux Kernel Mailing List

Doug McNaught <doug@mcnaught.org> said:

[...]

> Although, I distinctly remember seeing "Net-2" in one of the boot
> messages in an early kernel (pre 1.0); was that just the header files'
> doing?

Nope. There were NET, NET2, NET3, ... versions of the Linux native TCP/IP
stack. Just name coincidence.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


* Re: question about linux tcp request queue handling
  2003-07-07 21:48           ` question about linux tcp request queue handling Andi Kleen
  2003-07-07 22:25             ` Doug McNaught
  2003-07-08  4:14             ` Paul Albrecht
@ 2003-07-08 19:23             ` Paul Albrecht
  2 siblings, 0 replies; 16+ messages in thread
From: Paul Albrecht @ 2003-07-08 19:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: niv, linux-kernel, netdev

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

Andi Kleen writes:

>
> The 4.4BSD-Lite code described in Stevens is long outdated. All modern
> BSDs (and probably most other Unixes too) do it in a similar way to what
> Nivedita described. The keywords are "syn flood attack" and "DoS".
>

I have attached a copy of tcpdump output for two linux systems connected
over ether replaying the scenario for incoming request queue handling given
in Stevens's TCP/IP Illustrated Volume 1: The Protocols.  What I don't
understand about the third handshake is if the server is going to send the
syn/ack in response to the client's initial syn then why does the server
repeatedly ignore the subsequent ack from the client?

[-- Attachment #2: trace.txt --]
[-- Type: text/plain, Size: 2723 bytes --]

01:12:09.622208 client.acme.net.1024 > server.acme.net.7777: S 2730884988:2730884988(0) win 5840 <mss 1460,sackOK,timestamp 133507 0,nop,wscale 0> (DF)
01:12:09.623457 server.acme.net.7777 > client.acme.net.1024: S 1682786145:1682786145(0) ack 2730884989 win 5792 <mss 1460,sackOK,timestamp 42960 133507,nop,wscale 0> (DF)
01:12:09.623963 client.acme.net.1024 > server.acme.net.7777: . ack 1682786146 win 5840 <nop,nop,timestamp 133508 42960> (DF)
01:12:11.858191 client.acme.net.1025 > server.acme.net.7777: S 2743503110:2743503110(0) win 5840 <mss 1460,sackOK,timestamp 134652 0,nop,wscale 0> (DF)
01:12:11.858991 server.acme.net.7777 > client.acme.net.1025: S 1690738882:1690738882(0) ack 2743503111 win 5792 <mss 1460,sackOK,timestamp 43183 134652,nop,wscale 0> (DF)
01:12:11.859535 client.acme.net.1025 > server.acme.net.7777: . ack 1690738883 win 5840 <nop,nop,timestamp 134653 43183> (DF)
01:12:13.909895 client.acme.net.1026 > server.acme.net.7777: S 2736891141:2736891141(0) win 5840 <mss 1460,sackOK,timestamp 135702 0,nop,wscale 0> (DF)
01:12:13.910636 server.acme.net.7777 > client.acme.net.1026: S 1692403887:1692403887(0) ack 2736891142 win 5792 <mss 1460,sackOK,timestamp 43388 135702,nop,wscale 0> (DF)
01:12:13.911144 client.acme.net.1026 > server.acme.net.7777: . ack 1692403888 win 5840 <nop,nop,timestamp 135703 43388> (DF)
01:12:17.502319 server.acme.net.7777 > client.acme.net.1026: S 1692403887:1692403887(0) ack 2736891142 win 5792 <mss 1460,sackOK,timestamp 43748 135703,nop,wscale 0> (DF)
01:12:17.502909 client.acme.net.1026 > server.acme.net.7777: . ack 1692403888 win 5840 <nop,nop,timestamp 137542 43748,nop,nop,sack sack 1 {1692403887:1692403888} > (DF)
01:12:23.502350 server.acme.net.7777 > client.acme.net.1026: S 1692403887:1692403887(0) ack 2736891142 win 5792 <mss 1460,sackOK,timestamp 44348 137542,nop,wscale 0> (DF)
01:12:23.502969 client.acme.net.1026 > server.acme.net.7777: . ack 1692403888 win 5840 <nop,nop,timestamp 140614 44348,nop,nop,sack sack 1 {1692403887:1692403888} > (DF)
01:12:35.702302 server.acme.net.7777 > client.acme.net.1026: S 1692403887:1692403887(0) ack 2736891142 win 5792 <mss 1460,sackOK,timestamp 45568 140614,nop,wscale 0> (DF)
01:12:35.702840 client.acme.net.1026 > server.acme.net.7777: . ack 1692403888 win 5840 <nop,nop,timestamp 146859 45568,nop,nop,sack sack 1 {1692403887:1692403888} > (DF)
01:12:59.702343 server.acme.net.7777 > client.acme.net.1026: S 1692403887:1692403887(0) ack 2736891142 win 5792 <mss 1460,sackOK,timestamp 47968 146859,nop,wscale 0> (DF)
01:12:59.702994 client.acme.net.1026 > server.acme.net.7777: . ack 1692403888 win 5840 <nop,nop,timestamp 159147 47968,nop,nop,sack sack 1 {1692403887:1692403888} > (DF)
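
The spacing of the server's repeated syn/ack for port 1026 can be read off
the trace above. A minimal sketch, with the five timestamps copied from the
trace; the names `SYNACK_TIMES` and `retransmit_gaps` are made up for this
example:

```python
from datetime import datetime

# Timestamps of the server's syn/ack segments to client port 1026,
# copied from the trace (first transmission plus four retransmissions).
SYNACK_TIMES = [
    "01:12:13.910636",
    "01:12:17.502319",
    "01:12:23.502350",
    "01:12:35.702302",
    "01:12:59.702343",
]

def retransmit_gaps(stamps):
    """Return the gaps in seconds between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(times, times[1:])]

if __name__ == "__main__":
    print(retransmit_gaps(SYNACK_TIMES))  # [3.592, 6.0, 12.2, 24.0]
```

The gaps come out at about 3.6, 6, 12.2 and 24 seconds, so after the first
retransmission the interval roughly doubles each time.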


* Re: question about linux tcp request queue handling
  2003-07-07  5:51       ` Nivedita Singhvi
  2003-07-07  5:59         ` Nivedita Singhvi
@ 2003-07-07 23:30         ` Paul Albrecht
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Albrecht @ 2003-07-07 23:30 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: linux-kernel, netdev

Nivedita Singhvi writes:

>
> Again, you're limiting the number of connection requests
> that are allowed to wait in the *accept* queue, where
> we move to once we're ESTABLISHED.  You aren't limiting
> a request sitting in the SYN queue.
>

This statement is inconsistent with the description of this scenario in
Stevens's TCP/IP Illustrated.  Specifically, continuing the handshake in the
TCP layer, i.e., sending a syn/ack and moving to the syn_recd state, is
incorrect if the limit of the server's socket backlog would be exceeded.
How do you account for this discrepancy between linux and other
berkeley-derived implementations?




* Re: question about linux tcp request queue handling
  2003-07-06 23:59   ` Nivedita Singhvi
@ 2003-07-07  6:20     ` Paul Albrecht
  2003-07-07  5:51       ` Nivedita Singhvi
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Albrecht @ 2003-07-07  6:20 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: linux-kernel, netdev

Nivedita Singhvi writes:

>
> When you set the backlog to 1 in the listen call, what is
> being capped is the accept queue. So I would expect your
> server to allow only one of those requests in the accept
> queue, and the kernel will drop the other two requests.
>

What you get when you set backlog to one is operating system dependent.
Tracing the flows with tcpdump, I get two clean handshakes so presumably,
for linux, one means two.  The third connection request *isn't* dropped;
according to netstat, it's placed in the syn_recd state.  I thought
berkeley-derived implementations followed the rule that if there is no room
on the backlog queue for the new connection, tcp ignored the received
syn.

>
> Actually, details, but we also apply some other conditions
> before we actually drop the connection request - we try not to be
> so harsh if the syn queue is still fairly empty..
>

Irrespective of whatever conditions linux applies, how can the connection
enter the syn_recd state if the backlog limit would be exceeded?  What's the
client supposed to do with the syn/ack from the server? What's the server
supposed to do with the ack it gets back from the client?




* Re: question about linux tcp request queue handling
  2003-07-07  5:51       ` Nivedita Singhvi
@ 2003-07-07  5:59         ` Nivedita Singhvi
  2003-07-07 23:30         ` Paul Albrecht
  1 sibling, 0 replies; 16+ messages in thread
From: Nivedita Singhvi @ 2003-07-07  5:59 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: Paul Albrecht, linux-kernel, netdev

Nivedita Singhvi wrote:

> Er, complete the 3 way handshake? If the client gets the syn/ack, it
> should send a SYN in response, and move to ESTABLISHED state. If the
		~~~

my bad, sorry, that should be ACK, of course...

thanks,
Nivedita



* Re: question about linux tcp request queue handling
  2003-07-07  6:20     ` Paul Albrecht
@ 2003-07-07  5:51       ` Nivedita Singhvi
  2003-07-07  5:59         ` Nivedita Singhvi
  2003-07-07 23:30         ` Paul Albrecht
  0 siblings, 2 replies; 16+ messages in thread
From: Nivedita Singhvi @ 2003-07-07  5:51 UTC (permalink / raw)
  To: Paul Albrecht; +Cc: linux-kernel, netdev

Paul Albrecht wrote:

>>When you set the backlog to 1 in the listen call, what is
>>being capped is the accept queue. So I would expect your
>>server to allow only one of those requests in the accept
>>queue, and the kernel will drop the other two requests.

> What you get when you set backlog to one is operating system dependent.

You asked about Linux 2.4.18, and I was speaking
strictly for it. This is after all linux-netdev :).

> Tracing the flows with tcpdump, I get two clean handshakes so presumably,
> for linux, one means two.  The third connection request *isn't* dropped;

Again, you're limiting the number of connection requests
that are allowed to wait in the *accept* queue, where
we move to once we're ESTABLISHED.  You aren't limiting
a request sitting in the SYN queue.

> according to netstat, it's placed in the syn_recd state.  I thought
> berkeley-derived implementations followed the rule that if there is no room
> on the backlog queue for the new connection, tcp ignored the received
> syn.

>>Actually, details, but we also apply some other conditions
>>before we actually drop the connection request - we try not to be
>>so harsh if the syn queue is still fairly empty..
>>
> 
> 
> Irrespective of whatever conditions linux applies, how can the connection
> enter the syn_recd state if the backlog limit would be exceeded?  What's the
> client supposed to do with the syn/ack from the server? What's the server
> supposed to do with the ack it gets back from the client?

Er, complete the 3 way handshake? If the client gets the syn/ack, it
should send a SYN in response, and move to ESTABLISHED state. If the
server gets an ack back from the client, we process the ack. Our
processing involves moving the request from the syn queue to the
accept queue. Should the accept queue be full (which could occur
anytime - eg it could have occurred *after* the server recvd this
SYN) we would drop the request.  Should the client then send data,
it would get a RST, letting it know our side (srvr) has had to
throw the connection away.  It's quite possible that the accept queue
clears and a request can be moved from the SYN queue to the
accept queue in the interval of the handshake being completed (?)

If we get a SYN, it doesn't seem unreasonable that we enter
SYN_RCVD state :).

thanks,
Nivedita








* Re: question about linux tcp request queue handling
  2003-07-06 20:24 Nivedita Singhvi
@ 2003-07-07  0:12 ` Paul Albrecht
  2003-07-06 23:59   ` Nivedita Singhvi
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Albrecht @ 2003-07-07  0:12 UTC (permalink / raw)
  To: Nivedita Singhvi; +Cc: linux-kernel, netdev

Nivedita writes:

>
> Do you have tcp_syncookies on?
>

syncookies = 0.

>
>And are you exceeding the len as configured by tcp_max_syn_backlog?
>

max_syn_backlog = 256.
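
For reference, a small sketch of how the two settings quoted above could be
read on a Linux host. It assumes they are exposed under /proc/sys/net/ipv4
as `tcp_syncookies` and `tcp_max_syn_backlog`; the helper name
`read_tcp_sysctls` is made up for this example, and a value of None means
the file could not be read on the current system.

```python
from pathlib import Path

# Assumed location of the IPv4 TCP sysctls on a Linux host.
SYSCTL_DIR = Path("/proc/sys/net/ipv4")

def read_tcp_sysctls(names=("tcp_syncookies", "tcp_max_syn_backlog")):
    """Return {name: int or None} for each requested sysctl."""
    values = {}
    for name in names:
        try:
            # Each file holds one integer followed by a newline.
            values[name] = int((SYSCTL_DIR / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            # Not Linux, no permission, or unexpected file content.
            values[name] = None
    return values

if __name__ == "__main__":
    for key, value in read_tcp_sysctls().items():
        print(f"{key} = {value}")
```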

My server program sets its backlog to one and pauses ninety seconds before
accepting connections.  Within that ninety second interval, I start three
client programs that do an active open to my server.  I expect one of
connections to get discarded when the server's connection backlog limit is
exceeded.
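
A rough sketch of this kind of experiment, for anyone who wants to repeat
it. It is not the original test program: it runs client and server in one
process over loopback, never calls accept() instead of pausing ninety
seconds, and uses a short client timeout. The function name
`probe_backlog` and its parameters are made up for this example, and what
happens to the later connections depends on the operating system and
kernel version.

```python
import socket

def probe_backlog(backlog=1, clients=3, timeout=1.0):
    """Connect `clients` times to a listener that never calls accept().

    Returns one entry per client: "connected", or the name of the
    exception raised by connect() (for example a timeout).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
    server.listen(backlog)
    port = server.getsockname()[1]

    results, conns = [], []
    for _ in range(clients):
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.settimeout(timeout)
        try:
            client.connect(("127.0.0.1", port))
            results.append("connected")
        except OSError as exc:      # timeout or refusal, depending on the OS
            results.append(type(exc).__name__)
        conns.append(client)        # keep sockets open until all have tried

    for client in conns:
        client.close()
    server.close()
    return results

if __name__ == "__main__":
    print(probe_backlog())
```

Note that "connected" only reports the client's view of the handshake. The
server-side state of each connection still has to be checked separately,
for example with netstat or tcpdump as done in this thread.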






* Re: question about linux tcp request queue handling
  2003-07-07  0:12 ` Paul Albrecht
@ 2003-07-06 23:59   ` Nivedita Singhvi
  2003-07-07  6:20     ` Paul Albrecht
  0 siblings, 1 reply; 16+ messages in thread
From: Nivedita Singhvi @ 2003-07-06 23:59 UTC (permalink / raw)
  To: Paul Albrecht; +Cc: linux-kernel, netdev

Paul Albrecht wrote:

> My server program sets its backlog to one and pauses ninety seconds before
> accepting connections.  Within that ninety second interval, I start three
> client programs that do an active open to my server.  I expect one of
> connections to get discarded when the server's connection backlog limit is
> exceeded.

We actually have two queues - the syn queue and the socket
accept queue. We move the connection request from the syn
queue to the accept queue of the socket once the 3 way
handshake is complete - i.e. once the state is ESTABLISHED.

If the syn queue is full, requests will get dropped and
the socket will not change state.

When you set the backlog to 1 in the listen call, what is
being capped is the accept queue. So I would expect your
server to allow only one of those requests in the accept
queue, and the kernel will drop the other two requests.

Actually, details, but we also apply some other conditions
before we actually drop the connection request - we try not to be
so harsh if the syn queue is still fairly empty..
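
The two-queue arrangement described above can be sketched as a toy model.
This only mirrors the description in this message and is not kernel code:
the class and method names are made up, the accept queue is treated as full
once it holds `backlog` entries, and the extra conditions mentioned in the
last paragraph are left out.

```python
from collections import deque

class Listener:
    """Toy model of a listening socket with a syn queue and an accept queue."""

    def __init__(self, backlog, max_syn_backlog=256):
        self.backlog = backlog                  # cap on the accept queue
        self.max_syn_backlog = max_syn_backlog  # cap on the syn queue
        self.syn_queue = deque()                # handshakes in progress
        self.accept_queue = deque()             # ESTABLISHED, awaiting accept()

    def on_syn(self, conn):
        """A SYN arrives: queue the request unless the syn queue is full."""
        if len(self.syn_queue) >= self.max_syn_backlog:
            return "dropped"
        self.syn_queue.append(conn)
        return "SYN_RCVD"

    def on_ack(self, conn):
        """The final ACK arrives: move the request to the accept queue."""
        if conn not in self.syn_queue:
            return "ignored"
        self.syn_queue.remove(conn)
        if len(self.accept_queue) >= self.backlog:
            return "dropped"                    # accept queue is full
        self.accept_queue.append(conn)
        return "ESTABLISHED"

    def accept(self):
        """The application takes the oldest established connection, if any."""
        return self.accept_queue.popleft() if self.accept_queue else None
```

With `Listener(backlog=1)` and three clients, each `on_syn` returns
"SYN_RCVD", and the three `on_ack` calls return "ESTABLISHED", "dropped",
"dropped", which is the outcome expected in the paragraph above.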

Think that's so, at any rate :).

Nivedita






* question about linux tcp request queue handling
@ 2003-07-06 21:19 Paul Albrecht
  0 siblings, 0 replies; 16+ messages in thread
From: Paul Albrecht @ 2003-07-06 21:19 UTC (permalink / raw)
  To: linux-kernel

Linux (2.4.18) places incoming connection requests into the syn_recd state
when the server's backlog queue is full.  I thought they were supposed to be
discarded if the server's backlog is full, forcing the client to
subsequently retransmit the request after it times out.  Why does linux put
the server side into the syn_recd state when its backlog is full?



* Re: question about linux tcp request queue handling
@ 2003-07-06 20:24 Nivedita Singhvi
  2003-07-07  0:12 ` Paul Albrecht
  0 siblings, 1 reply; 16+ messages in thread
From: Nivedita Singhvi @ 2003-07-06 20:24 UTC (permalink / raw)
  To: palbrecht; +Cc: linux-kernel, netdev


> Linux (2.4.18) places incoming connection requests into the syn_recd state
> when the server's backlog queue is full.  I thought they were supposed to be
> discarded if the server's backlog is full, forcing the client to
> subsequently retransmit the request after it times out.  Why does linux put
> the server side into the syn_recd state when its backlog is full?

Do you have tcp_syncookies on? And are you exceeding
the len as configured by tcp_max_syn_backlog?

thanks,
Nivedita

[Please cc or post to netdev; like most networking folk,
  I don't subscribe to lkml]




