From: "John A. Sullivan III" <jsullivan@opensourcedevel.com>
To: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Eric Leblond <eric@regit.org>, netfilter@vger.kernel.org
Subject: Re: Conntrack not matching properly - producing serious outages
Date: Fri, 12 Aug 2011 13:12:54 -0400
Message-ID: <1313169174.2696.2.camel@jasiiieee.pacifera.com>
In-Reply-To: <1313098202.3628.86.camel@denise.theartistscloset.com>

On Thu, 2011-08-11 at 17:30 -0400, John A. Sullivan III wrote: 
> On Thu, 2011-08-11 at 22:41 +0200, Jozsef Kadlecsik wrote:
> > On Thu, 11 Aug 2011, John A. Sullivan III wrote:
> > 
> > > I've just begun to wade my way through SACK as Jozsef suggested after
> > > getting some sleep but I was able to catch a live one with logging
> > > enabled:
> > > 
> > > Aug 11 11:56:24 fw01 kernel: nf_ct_tcp: bad TCP checksum IN= OUT=
> > > SRC=95.172.228.42 DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52
> > > ID=29203 DF PROTO=TCP SPT=46721 DPT=441 SEQ=2834861284 ACK=3682327577
> > > WINDOW=1002 RES=0x00 ACK PSH URGP=0 OPT (0101080A01249B0846B0F23B)
> > 
> > That's Noop, Noop and Timestamp options and not SACK.
> > 
> > But the TCP checksum checking in conntrack says that the TCP checksum of 
> > the received packet is invalid, therefore it assigns the INVALID
> > state to the packet.
> Ah, so we do suspect that this is the culprit?
> >  
> > > Aug 11 11:56:24 fw01 kernel: INPUT INVALID IN=bond3 OUT=
> > > MAC=00:15:17:90:3c:0b:00:1c:58:ea:79:ff:08:00 SRC=95.172.228.42
> > > DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52 ID=29203 DF PROTO=TCP
> > > SPT=46721 DPT=441 WINDOW=1002 RES=0x00 ACK PSH URGP=0
> > > 
> > > Aug 11 11:56:24 fw01 kernel: No Match: IN=bond3 OUT=
> > > MAC=00:15:17:90:3c:0b:00:1c:58:ea:79:ff:08:00 SRC=95.172.228.42
> > > DST=208.a.b.8 LEN=260 TOS=0x00 PREC=0x00 TTL=52 ID=29203 DF PROTO=TCP
> > > SPT=46721 DPT=441 WINDOW=1002 RES=0x00 ACK PSH URGP=0
> > > 
> > > Is this telling me that the reason the packet has been classified as
> > > INVALID is because the TCP checksum is bad? We are doing checksum
> > > offloading so I would think the checksum in the packet evaluated by the
> > > kernel would be irrelevant.  We also have no problem if the users run
> > > their sessions through an OpenVPN tunnel.
> > 
> > TCP checksum offloading does not discard incoming packets with invalid 
> > checksum.
> Hmm . . . I wonder if we have a card which is going bad. This came on
> all of a sudden.  I was planning to disable offloading anyway to see if
> it solved the problem; I'm just awaiting a tester.  I'll report back
> what I find.  I certainly appreciate all the help - John
> >  
> > > I'll be digging into SACK next but wonder if I'm staring at the smoking
> > > gun and just don't recognize it.  I can try disabling offloading but not
> > > right now as the system is in heavy production.  Thanks - John
> > <snip>
Thanks to everyone for their help and my apologies for not getting back
sooner - we've been up almost continually battling this problem.

It looks like the netfilter involvement was a red herring.  We disabled
checksumming and the INVALID packet problem went away but the problem
persists.  We have hit and miss access and piles of duplicate ACKs and
retransmissions but it does not appear to be netfilter related.  Still
trying to figure out what changed or if we have some failing hardware.
Thanks again - John
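
For anyone reading the archive later: the option bytes in the logged
packet (0101080A01249B0846B0F23B) can be decoded by hand, which is how
Jozsef could tell they were Noop, Noop and Timestamp rather than SACK.
Below is a rough Python sketch of that decoding. The function name
decode_tcp_options and the tuple layout it returns are made up for
illustration; only the option kinds (0 = EOL, 1 = NOP, 5 = SACK,
8 = Timestamp) come from the TCP specifications.

```python
import struct

def decode_tcp_options(hex_string):
    """Decode the OPT (...) hex string from a netfilter LOG line.

    Hypothetical helper for illustration, not part of any netfilter tool.
    """
    data = bytes.fromhex(hex_string)
    options = []
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:                      # End of Option List
            options.append(("EOL",))
            break
        if kind == 1:                      # No-Operation, single byte
            options.append(("NOP",))
            i += 1
            continue
        # Every other option is kind, length, then (length - 2) data bytes.
        if i + 1 >= len(data):
            raise ValueError("truncated option")
        length = data[i + 1]
        if length < 2 or i + length > len(data):
            raise ValueError("bad option length")
        body = data[i + 2:i + length]
        if kind == 8 and length == 10:     # Timestamp: TSval, TSecr
            tsval, tsecr = struct.unpack("!II", body)
            options.append(("Timestamp", tsval, tsecr))
        elif kind == 5:                    # SACK blocks, left undecoded here
            options.append(("SACK", body))
        else:
            options.append(("kind-%d" % kind, body))
        i += length
    return options

decoded = decode_tcp_options("0101080A01249B0846B0F23B")
for option in decoded:
    print(option)
```

Run against the bytes from the log, this yields two NOP entries
followed by one Timestamp entry and no SACK entry, which matches what
Jozsef read off the packet.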

