From: Robert Kleemann <robert@kleemann.org>
To: <linux-kernel@vger.kernel.org>
Subject: Re: Client receives TCP packets but does not ACK
Date: Tue, 26 Jun 2001 18:04:19 -0700 (PDT)
Message-ID: <Pine.LNX.4.33.0106261757530.1135-100000@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.33.0106161643560.1137-100000@localhost.localdomain>

SUMMARY:

The bad network behavior was due to shared irqs somehow interfering with packet reception. This explains most but not all of the problems.

DETAILS:

Many people emailed me to say that they were experiencing similar problems. Even though the cause of my problem is not kernel related, I'm hoping my narrative and eventual solution will help some folks. I also still think this behavior is really weird, so those of you with an abundance of brains and curiosity might want to take a guess at explaining what I'm seeing.

When I last posted I had a reproducible test case which spewed a bunch of packets from a server to a client. The client eventually stops ACKing, and so the connection stalls indefinitely.

I spent some time studying the kernel networking code and traced the code path taken by a tcp packet:

  linux/net/core/dev.c:netif_rx()                    // packet received by eth card
  linux/net/ipv4/ip_input.c:ip_rcv()
  linux/net/ipv4/ip_input.c:ip_rcv_finish()
  linux/net/ipv4/tcp_ipv4.c:tcp_v4_rcv()
  linux/net/ipv4/tcp_ipv4.c:tcp_v4_do_rcv()
  linux/net/ipv4/tcp_input.c:tcp_rcv_established()   // packet placed in user queue

Each routine had 2 to 6 conditions that would result in a dropped packet. I added printk statements for each of these conditions in hopes of detecting why the final packet is not acked. I recompiled the kernel and reran the test. The result was that the packet was being dropped in tcp_rcv_established() due to an invalid checksum.

I then ran tcpdump to verify that the packets sent from the server were the same packets that were received by the client.
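To illustrate why a single damaged byte is enough for the receiver to drop a segment as having an invalid checksum, here is a minimal sketch of the 16-bit one's-complement Internet checksum (RFC 1071) that TCP uses. It is a simplification, not the kernel's implementation: the real TCP checksum also covers a pseudo-header and the TCP header, and the payload, the corrupted offset (517), and the helper names below are made up for the example.

```python
import struct
from typing import List


def internet_checksum(data: bytes) -> int:
    """Return the RFC 1071 16-bit one's-complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def differing_offsets(sent: bytes, received: bytes) -> List[int]:
    """Return the byte offsets at which two equal-length captures differ."""
    return [i for i, (a, b) in enumerate(zip(sent, received)) if a != b]


# Hypothetical payload standing in for one segment of the test stream.
sent = bytes(range(256)) * 4
checksum = internet_checksum(sent)

# Simulate one corrupted byte on the receiving side (offset is arbitrary).
damaged = bytearray(sent)
damaged[517] ^= 0x20
received = bytes(damaged)

# The receiver recomputes the checksum and compares it with the sender's.
checksum_ok = internet_checksum(received) == checksum
bad_offsets = differing_offsets(sent, received)
```

With this setup `checksum_ok` is false and `bad_offsets` contains only the damaged offset, which is the same kind of comparison as diffing the server-side and client-side tcpdump output byte by byte.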
It turned out that one byte was being corrupted, and it was always the same byte in the stream. This was very confusing because my previous logs show _no_ corruption of the final packet.

Anyway, it now appeared to be a hardware related problem, so I started swapping ethernet cards, to no effect. I then looked at the irqs (cat /proc/interrupts) and noticed that the ethernet card in the client was sharing an irq with the aic7xxx scsi adapter. The following url made me think that this could be causing a problem:

  http://www.scyld.com/expert/irq-conflict.html

The motherboard on the client is an old Intel PR440FX (dual 200 MHz PPro, onboard LAN, SCSI) and doesn't allow any kind of configuring of the irqs, so I ended up throwing another pci net card in the box just to juggle the irqs enough that one of the net cards was not sharing an irq with the scsi card.

The bug no longer repros! Neither the reduced test case nor the original shows any problems.

My only remaining questions are:

1) Does this make sense? Would a scsi card sharing an irq with a net card cause rare but highly reproducible corruption? I was able to run http, telnet, ftp, mail, and games through this card with no problems. It only failed on a specific set of data. This is what initially led me to believe that the problem was not hardware related.

2) Now that two net cards are sharing an irq, have I just traded one subtle corruption bug for another? Will some different data set cause the same type of corruption? Is it safe to share irqs?

3) My old tcpdump logs (from several weeks ago) show _no_ corruption. I would have believed that I must have screwed up, except that I still have the logs, and the packets sent from the server compare exactly with those received by the client. I can't seem to reproduce this behavior.

Robert.
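For anyone wanting to repeat the shared-irq check (cat /proc/interrupts), here is a rough sketch that lists the irq lines claimed by more than one device. It assumes the usual layout in which devices sharing a line appear as a comma-separated list at the end of the row. The sample text is invented for illustration and is not output from the machine described above, and the function name is hypothetical.

```python
import re
from typing import Dict, List


def shared_irqs(interrupts_text: str) -> Dict[str, List[str]]:
    """Map each numbered irq that has several devices to their names."""
    shared: Dict[str, List[str]] = {}
    for line in interrupts_text.splitlines():
        match = re.match(r"\s*(\d+):\s+(.*)$", line)
        if not match:
            continue  # header row, or non-numeric rows such as NMI/ERR
        irq, rest = match.groups()
        if "," not in rest:
            continue  # only one device on this line
        parts = [p.strip() for p in rest.split(",")]
        # Heuristic: the first device name is the last token before the
        # first comma; everything earlier is counters and controller type.
        parts[0] = parts[0].split()[-1]
        shared[irq] = parts
    return shared


# Invented sample in the style of a 2.4-era dual-CPU /proc/interrupts.
SAMPLE = """\
           CPU0       CPU1
  0:     104325     103998    IO-APIC-edge  timer
  1:       1523       1490    IO-APIC-edge  keyboard
 11:      88231      87954   IO-APIC-level  aic7xxx, eth0
 14:      20311      20107    IO-APIC-edge  ide0
NMI:          0          0
"""

result = shared_irqs(SAMPLE)
```

On a live system the same function can be fed the contents of /proc/interrupts read as text. Device names containing spaces would confuse the heuristic, so treat the result as a hint rather than a definitive answer.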