* TCP/IP stack overloaded?
@ 2013-02-11 23:54 Ari Heitner
  2013-02-12  0:11 ` Erik Auerswald
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Ari Heitner @ 2013-02-11 23:54 UTC (permalink / raw)
  To: lartc

We are having a strange intermittent problem on a natting firewall that's
been loyally serving us for years, and the current theory is that our
recent major connection upgrade (to 16/7 mbit, from about 7/1) plus the
switch to voip phones, is periodically bogging down the machine, which is a
little long in the tooth - a pIII-500.

Symptom: seemingly randomly, up to a few times a day, the network
connection just chokes for about 30 seconds. Pings and DNS still work ok,
but http traffic and seemingly anything else TCP just stops. Wait a few
seconds, and it starts again, processing the pending requests (i.e. the web
page you were waiting for suddenly loads, without hitting refresh again)
but playing havoc with voip phone calls. The behaviour seems to correlate
with the network being busy, but generally the machine can handle
throughput saturation with no problem, and does so regularly.

My friend suggested a stress test: setting up a VPN (PPTP) to his network and
seeing whether that makes the NAT firewall box act up. Sure enough - make the
connection, and start copying a file (at a very low throughput like 50
kB/s) and load the network a little bit, and it freezes. And when the vpn
connection is active, even without doing anything, stuff starts getting
weird - the machine sometimes stops accepting incoming connections on port
22, and logmein.com sessions in progress will fail.

But through all of this the machine shows no load, and the tc in place
(wondershaper, modified slightly to prioritize traffic to and from the
remote voip server - I can post the script and tc output if it would be
useful) shows no unusual amount of packet droppage - which makes sense,
since the issue isn't a bandwidth overload.

How can I get stats on the "health" or load or anything of the tcp/ip
stack? Am I even barking up the right tree here?
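On the question of TCP/IP stack "health": the kernel keeps cumulative protocol counters in /proc/net/snmp and /proc/net/netstat (the same numbers `netstat -s` prints). A minimal sketch, assuming the Linux /proc layout in which each protocol has a line of field names followed by a line of values with the same prefix; the helper `snmp_value` is hypothetical:

```sh
# Hypothetical helper: print one counter from /proc/net/snmp-style files.
# Those files come in pairs of lines per protocol, for example:
#   Tcp: RtoAlgorithm RtoMin ... RetransSegs InErrs OutRsts
#   Tcp: 1 200 ... 1234 0 567
# Usage: snmp_value FILE PREFIX FIELD   (PREFIX without the trailing colon)
# Prints nothing and returns non-zero if the field is not present.
snmp_value() {
    awk -v proto="$2:" -v field="$3" '
        $1 != proto { next }
        !seen {
            # first line for this prefix holds the field names
            for (i = 2; i <= NF; i++) if ($i == field) col = i
            seen = 1
            next
        }
        # second line for this prefix holds the values
        col { print $col; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$1"
}

# Examples on the firewall:
#   snmp_value /proc/net/snmp Tcp RetransSegs
#   snmp_value /proc/net/netstat TcpExt ListenOverflows
```

Because these counters only ever grow, the useful signal is the difference between a sample taken during normal operation and one taken during a choke.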

Any and all suggestions would be appreciated.


Cheers,

Ari

--
Ari Heitner
Director of Technology
www.NCSY.ca - www.TorahHigh.org
w: 905.761.6279 x223 m: 647.202.1998

* Re: TCP/IP stack overloaded?
From: Erik Auerswald @ 2013-02-12  0:11 UTC (permalink / raw)
  To: lartc

Hi Ari,

On 02/12/2013 12:54 AM, Ari Heitner wrote:
> We are having a strange intermittent problem on a natting firewall that's
>[...]
> Symptom: seemingly randomly, up to a few times a day, the network
> connection just chokes for about 30 seconds. Pings and DNS still work ok,
> but http traffic and seemingly anything else TCP just stops. Wait a few
> seconds, and it starts again, processing the pending requests (i.e. the web
> page you were waiting for suddenly loads, without hitting refresh again)
> but playing havoc with voip phone calls. The behaviour seems to correlate
> with the network being busy, but generally the machine can handle
> throughput saturation with no problem, and does so regularly.

Just an idea: check the number of currently active NAT sessions. Maybe 
this hits a limit, some older sessions time out and everything works again.
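A minimal sketch of that check, assuming the nf_conntrack module and its /proc/sys/net/netfilter/nf_conntrack_count and nf_conntrack_max files (kernels still using the older ip_conntrack module expose ip_conntrack_count and ip_conntrack_max under /proc/sys/net/ipv4/netfilter/ instead). The helper name is hypothetical:

```sh
# Hypothetical helper: report conntrack table usage as "count/max (pct%)".
# The default paths assume nf_conntrack; pass other files for ip_conntrack.
# When the table is full, packets that would create new entries can be
# dropped, and the kernel logs "table full, dropping packet".
conntrack_usage() {
    count_file=${1:-/proc/sys/net/netfilter/nf_conntrack_count}
    max_file=${2:-/proc/sys/net/netfilter/nf_conntrack_max}
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1
    # Integer percentage of the table in use.
    echo "$count/$max ($((count * 100 / max))%)"
}
```

If the count sits at the maximum during a choke, the limit can be raised with `sysctl -w net.netfilter.nf_conntrack_max=N`, with N chosen to suit the machine's RAM.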

> My friend suggested a stress-test: making a vpn (pptp) to his network, and
> seeing if that makes the nat firewall box act up. Sure enough - make the
> connection, and start copying a file (at a very low throughput like 50
> kB/s) and load the network a little bit, and it freezes. And when the vpn
> connection is active, even without doing anything, stuff starts getting
> weird - the machine sometimes stops accepting incoming connections on port
> 22, and logmein.com sessions in progress will fail.

Though I don't know how this fits in...

Erik

* Re: TCP/IP stack overloaded?
From: Remy Mudingay @ 2013-02-12  5:30 UTC (permalink / raw)
  To: lartc

Hi Ari,

You need to isolate the problem more; from your description, the
problem could come from anywhere. I would look at the number of system
interrupts, qdisc and class statistics, and the NIC statistics provided by
ethtool -S in order to isolate the problem. You could also mention RAM
utilisation and the kernel version running on the system.
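The statistics listed above can be captured in one go, once during normal operation and once during a choke, and then diffed. A minimal sketch; the function name is hypothetical and the interface name (e.g. eth0) is an assumption to be replaced with the firewall's actual devices:

```sh
# Hypothetical helper: dump kernel version, memory, interrupt counts,
# qdisc/class statistics and NIC statistics for one interface, each
# section under a "=== command ===" header.
net_snapshot() {
    dev=$1
    for cmd in "uname -r" "free" "cat /proc/interrupts" \
               "tc -s qdisc show dev $dev" "tc -s class show dev $dev" \
               "ethtool -S $dev"; do
        echo "=== $cmd ==="
        # Word splitting of $cmd is intentional. A missing tool or a driver
        # without ethtool -S support just leaves its error text in the dump.
        $cmd 2>&1 || true
    done
}
```

For example: `net_snapshot eth0 > before.txt`, then `net_snapshot eth0 > during.txt` while the link is choked, then `diff before.txt during.txt`.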

Remy

* Re: TCP/IP stack overloaded?
From: Ari Heitner @ 2013-02-13 21:27 UTC (permalink / raw)
  To: lartc

On Mon, Feb 11, 2013 at 7:11 PM, Erik Auerswald
<auerswal@unix-ag.uni-kl.de> wrote:
> Hi Ari,
>
>
> On 02/12/2013 12:54 AM, Ari Heitner wrote:
>> Symptom: seemingly randomly, up to a few times a day, the network
>> connection just chokes for about 30 seconds.
>
> Just an idea: check the number of currently active NAT sessions. Maybe this
> hits a limit, some older sessions time out and everything works again.

Have been running netstat-nat -n to take a look at things, both during
normal operations and during the chokes - below is for normal
operations
    camelot:~# netstat-nat -n | wc -l
    374
    camelot:~# netstat-nat -n | grep tcp | wc -l
    269
    camelot:~# netstat-nat -n | grep ESTABLISHED | wc -l
    115
At least once during a choke I saw the total number of connections
stay roughly static at just under 400, then increase slowly after things
started flowing again. But on the most recent choke, the number of
connections did skyrocket, hitting about 1000 tcp connections, before
things came down again.
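Comparing counts during normal operation and during a choke is easier if the numbers are logged continuously, so they are already on disk when the next choke happens. A minimal sketch, assuming the count is read from a file such as /proc/sys/net/netfilter/nf_conntrack_count; the function name is hypothetical:

```sh
# Hypothetical helper: print a timestamped connection count SAMPLES times,
# sleeping INTERVAL seconds between samples.
# Usage: log_conntrack COUNT_FILE SAMPLES INTERVAL
log_conntrack() {
    file=$1 samples=$2 interval=$3
    i=0
    while [ "$i" -lt "$samples" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(cat "$file")"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `log_conntrack /proc/sys/net/netfilter/nf_conntrack_count 86400 1 >> /var/tmp/conntrack.log &` samples about once a second for roughly a day (the log path is arbitrary); the timestamps can then be matched against the times users reported a choke.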

Is anyone aware of a hard limit set anywhere that I could check/control?

Further suggestions appreciated...I'm looking for new diagnostic
directions (I had never run netstat-nat till yesterday)


cheers,

Ari

* Re: TCP/IP stack overloaded?
From: 叶雨飞 @ 2013-02-13 21:53 UTC (permalink / raw)
  To: lartc

Anything in dmesg? Have you used the conntrack tools to check active
connections? (conntrack -L | wc -l)

Also check the route cache, but a dst cache overflow would show up in your
dmesg if that happens (use lnstat otherwise).
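The dmesg check can be narrowed to the two messages that matter here. A minimal sketch; the helper name is hypothetical, and the message texts assume kernels of that era ("table full, dropping packet" from conntrack, "dst cache overflow" from the IPv4 route cache, which was removed in Linux 3.6):

```sh
# Hypothetical helper: filter kernel log text (on stdin) for messages that
# indicate an exhausted conntrack table or route cache.
scan_kernel_log() {
    grep -i -e 'table full' -e 'dst cache overflow'
}

# On the firewall:
#   dmesg | scan_kernel_log
#   conntrack -C                   # entry count, if the installed
#                                  # conntrack-tools supports -C
#   lnstat -f rt_cache -c 5 -i 1   # route cache stats, 5 samples 1 s apart
```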

* Re: TCP/IP stack overloaded?
From: Andy Furniss @ 2013-02-14  9:55 UTC (permalink / raw)
  To: lartc

Ari Heitner wrote:
> On Mon, Feb 11, 2013 at 7:11 PM, Erik Auerswald
> <auerswal@unix-ag.uni-kl.de> wrote:
>> Hi Ari,
>>
>>
>> On 02/12/2013 12:54 AM, Ari Heitner wrote:
>>> Symptom: seemingly randomly, up to a few times a day, the network
>>> connection just chokes for about 30 seconds.
>>
>> Just an idea: check the number of currently active NAT sessions. Maybe this
>> hits a limit, some older sessions time out and everything works again.
>
> Have been running netstat-nat -n to take a look at things, both during
> normal operations and during the chokes - below is for normal
> operations
>      camelot:~# netstat-nat -n | wc -l
>      374
>      camelot:~# netstat-nat -n | grep tcp | wc -l
>      269
>      camelot:~# netstat-nat -n | grep ESTABLISHED | wc -l
>      115

Doesn't seem very high. Maybe you should check any traffic shaping rules
you have and think about where ARP is going: if you use "htb default xx",
that will catch ARP unless you have set up filters with "protocol arp"
to make it go elsewhere.
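Sketched as a command: a u32 filter with "match u32 0 0" matches every packet, and "protocol arp" restricts it to ARP, so ARP can be sent to a class other than the HTB default. The helper is hypothetical and only prints the command; the device eth0, parent 1: and class 1:10 are placeholders for whatever the actual shaping script uses:

```sh
# Hypothetical helper: print (not run) a tc filter that steers ARP into
# CLASSID under the qdisc PARENT on DEV. Pipe the output to sh to apply it.
arp_filter_cmd() {
    dev=$1 parent=$2 classid=$3
    echo "tc filter add dev $dev parent $parent protocol arp prio 1 u32 match u32 0 0 flowid $classid"
}
```

For example, `arp_filter_cmd eth0 1: 1:10 | sh` would apply it; `tc -s class show dev eth0` then shows whether the chosen class is receiving the ARP packets.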

* Re: TCP/IP stack overloaded?
From: Ari Heitner @ 2013-02-18 16:58 UTC (permalink / raw)
  To: lartc

On Thu, Feb 14, 2013 at 4:55 AM, Andy Furniss <andyqos@ukfsn.org> wrote:
>
> Doesn't seem very high, maybe you should check any traffic shaping rules
> you have and think about where arp is going - if you use htb default xx that
> will catch arp unless you have set up filters with "protocol arp" to make it
> go elsewhere.
>
>

No HTB rules configured.
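Whether an HTB qdisc is actually installed can be confirmed from `tc qdisc show`, since depending on the version, wondershaper may set up either an HTB or a CBQ root qdisc. A minimal sketch that reads that output on stdin and reports each qdisc's kind and handle, plus an HTB `default` class if one is set; the function name is hypothetical:

```sh
# Hypothetical helper: summarise "tc qdisc show" output read from stdin,
# one line per qdisc: kind, handle, and "default X" when present.
qdisc_summary() {
    awk '$1 == "qdisc" {
        line = $2 " " $3
        for (i = 4; i < NF; i++)
            if ($i == "default") line = line " default " $(i + 1)
        print line
    }'
}
```

For example, `tc qdisc show dev eth0 | qdisc_summary` (eth0 being an assumption); any line starting with "htb" means HTB is in use after all.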


Ari

* Re: TCP/IP stack overloaded?
From: Ari Heitner @ 2013-02-18 16:58 UTC (permalink / raw)
  To: lartc

On Wed, Feb 13, 2013 at 4:53 PM, 叶雨飞 <sunyucong@gmail.com> wrote:
> anything in dmesg?   have you used conntrack tools to check active
> connections ?  (conntrack -L | wc -l)

Boils down to pretty much the same as netstat-nat, no? I posted the results
from that. Not seeing unusual load levels with conntrack -L either.

>
> also check route cache,  but dst overflow would show in your dmesg if
> that happens (use lnstat otherwise) .

Not sure what I should be seeing in the route cache...

Nothing funny that I've found so far in dmesg.

Totally not clear what I should be looking for in the lnstat output...
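lnstat reads the per-CPU tables under /proc/net/stat/ (for example nf_conntrack and, on kernels that still have a route cache, rt_cache). Each file has a header row of field names followed by one row of hexadecimal values per CPU, so a single counter can be summed across CPUs. A minimal sketch, assuming that layout; the helper name is hypothetical, and the exact field names (such as drop, early_drop and insert_failed for conntrack, or gc_dst_overflow for the route cache) vary between kernel versions:

```sh
# Hypothetical helper: sum one named column of a /proc/net/stat/* table
# across all CPU rows, converting the hex values to decimal.
# Usage: stat_column FILE FIELD
# Not meant for "entries", which is typically a global count repeated on
# every row, so summing it would overcount.
stat_column() {
    awk -v name="$2" '
        function hex(s,    i, n, c) {
            n = 0
            s = tolower(s)
            for (i = 1; i <= length(s); i++) {
                c = index("0123456789abcdef", substr(s, i, 1)) - 1
                n = n * 16 + c
            }
            return n
        }
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        col { total += hex($col) }
        END { if (!col) exit 1; print total + 0 }
    ' "$1"
}
```

For example, running `stat_column /proc/net/stat/nf_conntrack drop` before and during a choke: a counter that jumps during the choke points at the corresponding limit.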


Ari
