rare bad TCP checksum with 2.6.19?

From: Michael Tokarev @ 2007-01-14 22:59 UTC
  To: netdev

I noticed, after running 2.6.19 for more than a month, that a file
transfer sometimes stalls forever at the very end of the file when one
of the ends is running 2.6.19.

Playing with tcpdump, I noticed that the host sends out packets with
wrong checksums, like this:

01:28:07.608457 IP (tos 0x0, ttl  64, id 11740, offset 0, flags [DF], length: 82)
    81.13.94.6.80 > 216.168.29.244.57064: FP [bad tcp cksum b011 (->7ae2)!]
    140062:140092(30) ack 125 win 2896 <nop,nop,timestamp 87610 145676467>

(here, 81.13.94.6 is running linux 2.6.19).
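
For anyone wanting to double-check a captured segment by hand, here is a
minimal sketch of the standard TCP checksum (RFC 1071 one's-complement sum
over the IPv4 pseudo-header plus the TCP segment). The function names are
mine and purely illustrative; this is not the kernel's code, just the
textbook algorithm. In the tcpdump line above, b011 should be the value
found in the packet and 7ae2 the value tcpdump computed itself.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum of a TCP segment (header + payload) over IPv4.

    Pass the segment with its checksum field (bytes 16..17) zeroed to get
    the value the sender should have written. Pass the segment as captured
    and a result of 0 means the checksum is valid.
    """
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return internet_checksum(pseudo_header + segment)
```

Feeding it the raw TCP bytes of a suspicious packet (for example, extracted
from a pcap file) together with the two addresses shows whether the
checksum on the wire really is wrong, independent of the capture tool.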

It happens only in rare cases, and is not reliably repeatable.

After further playing I noticed that almost only packets with the FIN flag
set (like the above) *and* containing some data (again, like the above)
show this behaviour.

With FIN set, the thing is 100% repeatable (the only problem is to force the
system to actually send such a packet -- for that, one has to push quite a lot
of data to the socket and immediately close it, so that there is still some
data waiting in the kernel buffer at the moment of close).
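
A rough sketch of that "push a lot of data, then close immediately" pattern
is below. The helper names and the default payload size are my own
assumptions, not what was actually used on the affected machine. It runs
over loopback only to show the sequence of calls; to look for the bad
checksum, the sending side would have to run on the 2.6.19 host with the
peer across a real network, and tcpdump in verbose mode watching the link.
Whether the last data segment actually carries the FIN depends on timing
and buffer state, so this is not guaranteed to trigger it.

```python
import socket
import threading


def send_and_close(conn: socket.socket, payload: bytes) -> None:
    """Queue the whole payload, then close at once.

    sendall() returns once the data is copied into the kernel send buffer,
    so close() can happen while part of it is still unsent. The kernel may
    then attach the FIN to the final data segment.
    """
    conn.sendall(payload)
    conn.close()


def run_once(payload_size: int = 256 * 1024) -> int:
    """Do one transfer over loopback; return the number of bytes received."""
    payload = b"x" * payload_size
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve() -> None:
        conn, _addr = listener.accept()
        send_and_close(conn, payload)

    server = threading.Thread(target=serve)
    server.start()

    received = 0
    with socket.create_connection(("127.0.0.1", port)) as client:
        while True:
            chunk = client.recv(65536)
            if not chunk:  # empty read: the peer's FIN has arrived
                break
            received += len(chunk)

    server.join()
    listener.close()
    return received
```

A stalled transfer would show up here as the receive loop never seeing the
end of the stream, since the receiver drops the final segment with the bad
checksum.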

This explains the observed behaviour -- rare, unreliable stalls at the end of
a transfer -- because a FIN packet only rarely carries data.

But sometimes, other packets go out with bad checksum, too:

01:20:01.712146 IP (tos 0x0, ttl  64, id 52870, offset 0, flags [DF], length: 1500)
    81.13.94.6.80 > 216.168.29.244.57655: . [bad tcp cksum ab7e (->dcbd)!]
    112945:114393(1448) ack 125 win 2896 <nop,nop,timestamp 39006 145190996>

(again, 81.13.94.6 is a machine running linux 2.6.19).  That packet sits in a
run of otherwise normal packets; it was retransmitted a bit later with a
correct checksum.

When switching back to 2.6.17 (the previous kernel running on this
machine), things go back to normal, or at least so it seems.

Note there's no funny/interesting hardware involved, like network cards with
TCP checksum offload capabilities (this is a plain dumb 8139 card).

I'll try to collect further information tomorrow.  But if someone has a
clue before then... ;)

Thanks!

/mjt
