From: Robert Kleemann <>
To: <>
Subject: Re: Client receives TCP packets but does not ACK
Date: Tue, 26 Jun 2001 18:04:19 -0700 (PDT)
Message-ID: <Pine.LNX.4.33.0106261757530.1135-100000@localhost.localdomain>
In-Reply-To: <Pine.LNX.4.33.0106161643560.1137-100000@localhost.localdomain>


The bad network behavior was due to a shared IRQ somehow screwing
things up.  This explained most, but not all, of the problems.


Many people emailed me that they were experiencing similar problems.
Even though the cause of my problem is not kernel related, I'm hoping
my narrative and eventual solution will help some folks.  I also
still think this behavior is really weird, so those of you with an
abundance of brains and curiosity might want to take a guess at
explaining what I'm seeing.

When I last posted I had a reproducible test case which spewed a bunch
of packets from a server to a client.  The client eventually stops
ACKing, so the connection stalls indefinitely.  I spent some time
studying the kernel networking code and traced the path taken by a
TCP packet:

linux/net/core/dev.c:netif_rx() // packet received by eth card
linux/net/ipv4/tcp_input.c:tcp_rcv_established() // packet placed in user queue

Each routine had 2 to 6 conditions that would result in a dropped
packet.  I added printk statements for each of these conditions in
hopes of detecting why the final packet is not acked.  I recompiled
the kernel and reran the test.  The result was that the packet was
being dropped in tcp_rcv_established() due to an invalid checksum.  I
then ran tcpdump to verify that the packets sent from the server were
the same packets that were received by the client.  It turned out that
one byte was being corrupted and it was always the same byte in the
stream that was corrupted.
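To see why a single corrupted byte is enough to get the packet dropped (and therefore never ACKed), here is a minimal Python sketch, not kernel code, of the RFC 1071 Internet checksum that TCP uses, plus a helper that finds the first differing byte between a sent and a received payload, which is roughly what comparing the two tcpdump captures amounts to.  Assumptions: the function names and the sample payload are hypothetical, and the real TCP checksum also covers a pseudo-header (addresses, protocol, length) and the TCP header, which this sketch leaves out.

```python
def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words (RFC 1071), inverted."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"  # pad an odd-length buffer with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def first_difference(sent: bytes, received: bytes):
    """Offset of the first byte that differs, or None if the buffers match."""
    for offset, (a, b) in enumerate(zip(sent, received)):
        if a != b:
            return offset
    if len(sent) != len(received):
        return min(len(sent), len(received))  # one buffer is a prefix of the other
    return None


# Hypothetical payload with one byte corrupted in transit.
sent = bytes(range(64))
received = bytearray(sent)
received[17] ^= 0x20
received = bytes(received)

print(internet_checksum(sent) == internet_checksum(received))  # prints False
print(first_difference(sent, received))  # prints 17
```

Any single-byte change alters the one's-complement sum, so the receiver's recomputed checksum no longer matches and the segment is discarded before it can be acknowledged; the sender keeps retransmitting the same corrupted byte, which fits a connection that stalls indefinitely.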

This was very confusing because my earlier tcpdump logs showed _no_
corruption of the final packet.

Anyway, now it appeared to be a hardware-related problem, so I started
swapping ethernet cards, to no effect.  I then looked at the IRQs (cat
/proc/interrupts) and noticed that the ethernet card in the client was
sharing an IRQ with the aic7xxx SCSI adapter.  The following URL made
me think that this could be causing a problem:

The motherboard on the client is an old Intel PR440FX (dual 200 MHz
PPro, onboard LAN and SCSI) and doesn't allow any configuration of
the IRQs, so I ended up throwing another PCI network card in the box
just to juggle the IRQs enough that one of the network cards was not
sharing an IRQ with the SCSI card.  The bug no longer repros!  Neither
the reduced test case nor the original shows any problems.
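To spot this kind of IRQ sharing without eyeballing the file, a small parser can flag every line of /proc/interrupts that lists more than one device.  This is a sketch that assumes the 2.4-era layout (one count column per CPU, then the interrupt controller type, then a comma-separated device list); newer kernels add extra columns, so the field positions are an assumption.  The function name, the sample text, and its counts are made up for illustration.

```python
def shared_irqs(proc_interrupts: str) -> dict:
    """Map IRQ label -> device list for every IRQ with more than one device."""
    lines = proc_interrupts.strip("\n").splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Assumed 2.4-style row: <one count per CPU> <controller type> <dev>[, <dev>...]
        if len(fields) <= ncpus + 1:
            continue  # rows such as NMI/ERR carry no device list
        devices = [d.strip() for d in " ".join(fields[ncpus + 1:]).split(",") if d.strip()]
        if len(devices) > 1:
            shared[label.strip()] = devices
    return shared


# Made-up sample resembling a dual-CPU 2.4 box.
SAMPLE = """\
           CPU0       CPU1
  0:    1234567    1234560    IO-APIC-edge  timer
 11:      48213      47990   IO-APIC-level  eth0, aic7xxx
 15:       1024       1001   IO-APIC-level  eth1
NMI:          0          0
"""

print(shared_irqs(SAMPLE))  # prints {'11': ['eth0', 'aic7xxx']}
```

On a live system you would pass the contents of /proc/interrupts instead of the sample string; an empty result means no IRQ line lists more than one device.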

My only remaining questions are:

1) Does this make sense?  Would a SCSI card sharing an IRQ with a
   network card cause rare but highly reproducible corruption?  I was
   able to run http, telnet, ftp, mail, and games through this card
   with no problems.  It only failed on a specific set of data.  This
   is what initially led me to believe that the problem was not
   hardware-related.

2) Now that two network cards are sharing an IRQ, have I just traded
   one subtle corruption bug for another?  Will some different data
   set cause the same type of corruption?  Is it safe to share IRQs?

3) My old tcpdump logs (from several weeks ago) show _no_ corruption.
   I would assume I had simply screwed up, except that I still have
   the logs, and the packets sent from the server match exactly those
   received by the client.  I can't seem to reproduce this behavior.

