linux-kernel.vger.kernel.org archive mirror
* Re: Client receives TCP packets but does not ACK
@ 2001-07-01 21:27 Nivedita Singhvi
  2001-07-11  3:43 ` Robert Kleemann
  0 siblings, 1 reply; 29+ messages in thread
From: Nivedita Singhvi @ 2001-07-01 21:27 UTC (permalink / raw)
  To: robert; +Cc: linux-kernel

> The bad network behavior was due to shared irqs somehow screwing 
> things up. This explained most but not all of the problems. 

ah, that's why your test pgm succeeded on my systems..
 
> When I last posted I had a reproducible test case which spewed a bunch 
> of packets from a server to a client. The behavior is that the client 
> eventually stops ACKing and so the connection stalls indefinitely. 
> packet. I added printk statements for each of these conditions in 
> hopes of detecting why the final packet is not acked. I recompiled 
> the kernel, and reran the test. The result was that the packet was 
> being dropped in tcp_rcv_established() due to an invalid checksum. I 

Ouch!
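
For readers unfamiliar with the check that failed here: TCP uses the 16-bit ones' complement Internet checksum described in RFC 1071. The following is only a minimal Python sketch of that arithmetic, not the kernel's code. The function name is made up for illustration, and the pseudo-header that TCP also feeds into the sum is left out for brevity.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into the low 16 bits
    return ~total & 0xFFFF


# Example words taken from the worked example in RFC 1071.
payload = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
csum = internet_checksum(payload)

# A receiver sums the data together with the transmitted checksum;
# an intact segment yields zero, anything else is treated as corrupt.
intact = payload + csum.to_bytes(2, "big")
corrupt = bytes([0x01]) + intact[1:]
print(hex(csum), internet_checksum(intact), internet_checksum(corrupt) != 0)
```

A single flipped bit anywhere in the covered data makes the receiver's sum nonzero, which is the kind of condition that leads to the segment being discarded without an ACK.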

In the interests of not having it be so painful to identify the
problem (to this point, i.e. TCP drops due to checksum failures) 
the next time around, I'd like to ask:

- Were you seeing any bad csum error messages in /var/log/messages?
  Or was it only TCP that was seeing the failures?

- Was the stats field /proc/net/snmp/Tcp:InErrs
  reflecting those drops?

- What additional logging/stats gathering would have made this
  (silent drops due to checksum failures by TCP) easier to detect?

  My 2c:

  The stat TcpInErrs is updated for several kinds of TCP input failure,
  so it's not obvious (unless you're really familiar with TCP)
  that checksum failures are happening. It actually
  includes only these errors:
        - checksum failures
        - header length problems
        - unexpected SYNs
 
  Is this adequate as a diagnostic, or would adding breakdown
  counters for checksum (and other) failures be useful?
  At the moment, TCP does no logging on a plain vanilla kernel;
  you have to recompile the kernel with NETDEBUG in order
  to see logged checksum failures, at least at the TCP level.

  It would be nice to have people be able to look at a counter or 
  stat on the fly and tell whether they're having packets silently 
  dropped due to checksum failures (and other issues) without needing 
  to recompile the kernel...
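
As a rough illustration of watching the existing counter on the fly, here is a minimal Python sketch that pulls InErrs out of the Tcp lines of /proc/net/snmp. It assumes the usual layout of that file (a line of field names followed by a line of values for each protocol). The sample text and the function names are made up for illustration, and the numbers are not real measurements.

```python
# Hypothetical sample in the /proc/net/snmp layout; the values are invented.
SAMPLE = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts\n"
    "Tcp: 1 200 120000 -1 12 3 0 1 2 4821 3977 9 7 4\n"
)


def parse_snmp(text: str) -> dict:
    """Parse /proc/net/snmp-style text into {protocol: {field: value}}."""
    lines = [line for line in text.splitlines() if line.strip()]
    stats = {}
    # Lines come in pairs: field names first, then the matching values.
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            continue  # skip anything that does not pair up as expected
        stats[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats


def tcp_in_errs(path: str = "/proc/net/snmp") -> int:
    """Read the live Tcp InErrs counter (Linux only)."""
    with open(path) as handle:
        return parse_snmp(handle.read())["Tcp"]["InErrs"]


print(parse_snmp(SAMPLE)["Tcp"]["InErrs"])
```

Sampling tcp_in_errs() before and after a test run shows whether the counter moved, but as noted above it cannot say which of the included error types caused the increase.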
   
Any thoughts?

thanks,
Nivedita

---
I'd appreciate a cc since I'm not subscribed..
nivedita@sequent.com
nivedita@us.ibm.com 

* RE: Client receives TCP packets but does not ACK
@ 2001-06-15 12:53 Heusden, Folkert van
  2001-06-15 18:27 ` Mike Black
  2001-06-18 11:50 ` Jan Hudec
  0 siblings, 2 replies; 29+ messages in thread
From: Heusden, Folkert van @ 2001-06-15 12:53 UTC (permalink / raw)
  To: Mike Black, linux-kernel

> TCP is NOT a guaranteed protocol -- you can't just blast data from one
> port to another and expect it to work.

Isn't it? Are you really sure about that? I thought UDP was the
not-guaranteed one and TCP was the one guaranteeing that all data reaches the
other end in order and all. Please enlighten me.


* Client receives TCP packets but does not ACK
@ 2001-06-13  0:26 Robert Kleemann
  2001-06-15  3:50 ` Robert Kleemann
  2001-06-16 23:56 ` Robert Kleemann
  0 siblings, 2 replies; 29+ messages in thread
From: Robert Kleemann @ 2001-06-13  0:26 UTC (permalink / raw)
  To: linux-kernel

I have a client server program that opens a tcp connection between two
machines.  Everything is fine until a certain type of data is sent
across the socket at which point the client refuses to ACK and the
server continues to resend the packets to no avail.

I've verified that the client is blocking on a socket read (and not
coming out).  I've also run "tcpdump -lxa -s 5000" on each machine and
verified that each packet sent by each machine is received by the
other.  I diffed the data and there appears to be no corruption.

I first saw this with the server running 2.4.2 and the client running
2.2.16, but I have since upgraded the server first to 2.4.5 and then
also added a patch from 2.4.6-pre2 that had to do with TCP ACKs.  The
bug still repros.  I have also upgraded the client to 2.4.2, 2.4.5,
and 2.4.5 + ack patch with no luck.

There have been quite a few other people who have experienced these
symptoms and posted to the list over the past 5 months or so.  I
haven't seen a resolution for any of them except for requests to try
the latest kernel since there have been a lot of networking fixes in
the latest kernels.  I have appended links to these other postings at
the end of this email in case their data might help.

I can consistently reproduce this problem on my machines (10 Mb/s
Ethernet LAN) and would really like to narrow this bug down to the
source instead of trying the latest kernels and hoping that they solve
the problem. The networking code (net/ipv4/tcp*.c) is daunting to me
but if someone has any suggestions on good places to add debug code,
building a debug version, or whatever, I can try it on my local system
and investigate further.  This bug is driving me crazy and I want to
find it and fix it!

Are there any other details that would help?  My hardware
configuration? Network settings? etc?

Here is the analysis of one of the tcpdump logs for glottis.  glottis
is the client and manny is the server.  Note that the large packet
11006:12454(1448) is received by glottis and an ack is never sent to
manny.

20:07:45.043640 glottis->manny ack 11006
20:07:45.047120 manny->glottis 11006:12454(1448) ack 408 probably contains the remainder of ClientMap
20:07:45.047571 manny->glottis 12454:12936(482) ack 408
20:07:45.047673 glottis->manny ack 11006
20:07:45.272042 manny->glottis 11006:12454(1448) ack 408 resend
20:07:45.732049 manny->glottis 11006:12454(1448) ack 408 resend
20:07:46.652015 manny->glottis 11006:12454(1448) ack 408 resend
20:07:48.491986 manny->glottis 11006:12454(1448) ack 408 resend
20:07:52.171937 manny->glottis 11006:12454(1448) ack 408 resend
20:07:59.531850 manny->glottis 11006:12454(1448) ack 408 resend
web packets as manny is probably pinging session server
20:08:14.251656 manny->glottis 11006:12454(1448) ack 408 resend
20:08:24.078088 glottis->manny 408:437(29) ack 11006 text request in same packet
20:08:24.110417 manny->glottis ack 437
20:08:27.539778 glottis->manny 437:470(33) ack 11006 quit message
20:08:27.540158 manny->glottis 12936:12936(0) ack 470
20:08:27.541574 glottis->manny 470:472(2) ack 11006
20:08:27.542069 manny->glottis 12936:12936(0) ack 472
20:08:27.637385 manny->glottis 12936:12936(0) ack 473
web packets
ntp packets
20:08:43.691285 manny->glottis 11006:12454(1448) ack 473 resend
arp packets
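
The spacing of the resends in the log above looks like TCP's retransmission timer backing off, with the wait roughly doubling after each unanswered attempt. A minimal Python sketch that checks this, using only the resend timestamps copied from the log (the names in the code are illustrative):

```python
from datetime import datetime

# Timestamps of the "resend" lines for segment 11006:12454, copied from the log.
RESENDS = [
    "20:07:45.272042",
    "20:07:45.732049",
    "20:07:46.652015",
    "20:07:48.491986",
    "20:07:52.171937",
    "20:07:59.531850",
    "20:08:14.251656",
    "20:08:43.691285",
]


def gaps(stamps):
    """Return the seconds elapsed between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


intervals = gaps(RESENDS)
# Ratio of each wait to the previous one; values near 2 indicate doubling.
ratios = [later / earlier for earlier, later in zip(intervals, intervals[1:])]

for wait in intervals:
    print(f"{wait:.3f} s")
print([round(r, 2) for r in ratios])
```

The gaps come out at roughly 0.46, 0.92, 1.84, 3.68, 7.36, 14.72 and 29.44 seconds, so the server is behaving as expected for a sender whose segment is never acknowledged.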

Here are some other threads on the list that may be related to this problem:

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=ca50bd5b6fab99dd,2&seekm=linux.kernel.3A806260.BB77D017%40denise.shiny.it#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=c2b75d883be146f6,2&seekm=linux.kernel.5.0.2.1.2.20010115152847.00a8a380%40pop.we.mediaone.net#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=5a94424eaed764df,21&seekm=linux.kernel.3A6F3C4A.27E148E9%40colorfullife.com#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=d74b104bfe2da967,14&seekm=200104101738.VAA21467%40ms2.inr.ac.ru#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=c15161c8342be0a0,7&seekm=linux.kernel.Pine.LNX.4.30.0012311601410.9994-100000%40shodan.irccrew.org#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=7268b77eb1e07a38,3&seekm=20010419200905.A2970%40ping.be#p

http://groups.google.com/groups?hl=en&lr=&safe=off&ic=1&th=160b098279e28ca9,8&seekm=linux.kernel.F57chplw8IfbyyOxmQp000170f7%40hotmail.com#p

Please cc me on any replies.

thanx!
Robert



Thread overview: 29+ messages
     [not found] <Pine.LNX.4.33.0106121720310.1152-100000@localhost.localdomain.suse.lists.linux.kernel>
2001-06-13  8:48 ` Client receives TCP packets but does not ACK Andi Kleen
2001-06-13 16:09   ` Robert Kleemann
2001-07-01 21:27 Nivedita Singhvi
2001-07-11  3:43 ` Robert Kleemann
     [not found] <E15BiHy-0002xC-00@the-village.bc.nu.suse.lists.linux.kernel>
2001-06-17 20:21 ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2001-06-15 12:53 Heusden, Folkert van
2001-06-15 18:27 ` Mike Black
2001-06-15 18:39   ` Gérard Roudier
2001-06-15 19:12   ` Alan Cox
2001-06-17 18:17     ` Pavel Machek
2001-06-17 19:32       ` Alan Cox
2001-06-17 19:40       ` Dan Podeanu
     [not found]         ` <200106172113.f5HLDhJ377473@saturn.cs.uml.edu>
2001-06-17 22:09           ` Dan Podeanu
2001-06-17 22:35         ` dean gaudet
2001-06-18 11:50 ` Jan Hudec
2001-06-18 16:17   ` dean gaudet
2001-06-18 16:48     ` Jonathan Morton
2001-06-18 22:30       ` dean gaudet
2001-06-18 23:43         ` Jonathan Morton
2001-06-19  2:46           ` dean gaudet
2001-06-20 21:01   ` David Schwartz
2001-06-13  0:26 Robert Kleemann
2001-06-15  3:50 ` Robert Kleemann
2001-06-15 12:44   ` Mike Black
2001-06-15 18:29     ` Albert D. Cahalan
2001-06-15 23:10       ` Robert Kleemann
2001-06-16 11:55       ` Mike Black
2001-06-16 23:56 ` Robert Kleemann
2001-06-27  1:04   ` Robert Kleemann
