netdev.vger.kernel.org archive mirror
* Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
@ 2014-06-04  9:14 Suprasad Mutalik Desai
  2014-06-04 14:12 ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-04  9:14 UTC (permalink / raw)
  To: netdev

Hi,


    Currently I am working on the 3.10.12 kernel, and it seems the Linux
stack performance (TCP and UDP) has degraded drastically compared to
the 2.6 kernel.

Results :

Linux 2.6.32
---------------------
TCP traffic using iperf
    - Upstream : 140 Mbps
    - Downstream : 148 Mbps

UDP traffic using iperf
    - Upstream : 200 Mbps
    - Downstream : 245 Mbps

Linux 3.10.12
--------------------
TCP traffic using iperf
    - Upstream : 101 Mbps
    - Downstream : 106 Mbps

UDP traffic using iperf
    - Upstream : 140 Mbps
    - Downstream : 170 Mbps

Analysis:
---------------
1.   As per profiling data on Linux-3.10.12, it seems
             -   fib_table_lookup and ip_route_input_noref are being
called most of the time and are thus causing the degradation in
performance.

    8.77    csum_partial 0x80009A20 1404
    4.53    ipt_do_table 0x80365C34 1352
    3.45    eth_xmit 0x870D0C88 5460
    3.41    fib_table_lookup 0x8035240C 856    <----------
    3.38    __netif_receive_skb_core 0x802B5C00 2276
    3.07    dma_device_write 0x80013BD4 752
    2.94    nf_iterate 0x802EA380 256
    2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
    2.24    ip_forward 0x8031108C 1040
    2.04    tcp_packet 0x802F45BC 3956
    1.93    nf_conntrack_in 0x802EEAF4 2284

2.    Based on the above observation, further searching shows that the
routing cache code was removed in the Linux 3.6 kernel, so every
packet has to go through ip_route_input_noref to find the
destination.

3.    Related to this, a patch from David Miller adds "ipv4: Early TCP
socket demux", which caches the dst per socket, maintains
tcp_hashinfo, and uses early_demux(skb) (TCP --> tcp_v4_early_demux;
UDP --> NULL, i.e. not defined) to get the dst of that skb, thus
avoiding a call to ip_route_input_noref every time.
          -  But this still doesn't handle routed scenarios (LAN <--> WAN).

4.    A patch for UDP early demux was added in Linux 3.13, and
certain bugfixes have gone into Linux 3.14.

5.    As we are based on 3.10, there is no UDP early demux support. This
means we have to backport the UDP early demux patch to the 3.10 kernel.


Issue :
-----------

1.    The implementation of "Early TCP socket demux" doesn't address
the routed scenario (LAN <---> WAN). This means TCP and UDP routing
performance will be lower in the 3.10 kernel, and also in the 3.14
kernel, as every forwarded packet has to go through a route lookup.


Is there an alternative to get back the Linux stack performance of the
2.6 or 3.4 kernels, where we had the route cache?

I guess the plain routing scenario was NOT thought through when the
routing cache code was removed.

Please guide.


Thanks and Regards,
Suprasad.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04  9:14 Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario Suprasad Mutalik Desai
@ 2014-06-04 14:12 ` David Laight
  2014-06-04 17:46   ` Suprasad Mutalik Desai
  0 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2014-06-04 14:12 UTC (permalink / raw)
  To: 'Suprasad Mutalik Desai', netdev

From: Suprasad Mutalik Desai
>     Currently I am working on the 3.10.12 kernel, and it seems the Linux
> stack performance (TCP and UDP) has degraded drastically compared to
> the 2.6 kernel.
> 
> Results :
> 
> Linux 2.6.32
> ---------------------
> TCP traffic using iperf
>     - Upstream : 140 Mbps
>     - Downstream : 148 Mbps
> 
> UDP traffic using iperf
>     - Upstream : 200 Mbps
>     - Downstream : 245 Mbps
> 
> Linux 3.10.12
> --------------------
> TCP traffic using iperf
>     - Upstream : 101 Mbps
>     - Downstream : 106 Mbps
> 
> UDP traffic using iperf
>     - Upstream : 140 Mbps
>     - Downstream : 170 Mbps
> 
> Analysis:
> ---------------
> 1.   As per profiling data on Linux-3.10.12, it seems
>              -   fib_table_lookup and ip_route_input_noref are being
> called most of the time and are thus causing the degradation in
> performance.
> 
>     8.77    csum_partial 0x80009A20 1404
>     4.53    ipt_do_table 0x80365C34 1352
>     3.45    eth_xmit 0x870D0C88 5460
>     3.41    fib_table_lookup 0x8035240C 856    <----------
>     3.38    __netif_receive_skb_core 0x802B5C00 2276
>     3.07    dma_device_write 0x80013BD4 752
>     2.94    nf_iterate 0x802EA380 256
>     2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
>     2.24    ip_forward 0x8031108C 1040
>     2.04    tcp_packet 0x802F45BC 3956
>     1.93    nf_conntrack_in 0x802EEAF4 2284
> 
> 2.    Based on the above observation, further searching shows that the
> routing cache code was removed in the Linux 3.6 kernel, so every
> packet has to go through ip_route_input_noref to find the
> destination.

That doesn't look like enough CPU time to explain the observed reduction
in throughput (assuming those numbers are percentages).

Are you sure the system is actually running at 100% cpu?
(Although I'm not sure what to trust for cpu usage with these sorts of tests.)

	David



* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 14:12 ` David Laight
@ 2014-06-04 17:46   ` Suprasad Mutalik Desai
  0 siblings, 0 replies; 7+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-04 17:46 UTC (permalink / raw)
  To: David Laight; +Cc: netdev

Hi David,

On Wed, Jun 4, 2014 at 7:42 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Suprasad Mutalik Desai
>>     Currently I am working on the 3.10.12 kernel, and it seems the Linux
>> stack performance (TCP and UDP) has degraded drastically compared to
>> the 2.6 kernel.
>>
>> Results :
>>
>> Linux 2.6.32
>> ---------------------
>> TCP traffic using iperf
>>     - Upstream : 140 Mbps
>>     - Downstream : 148 Mbps
>>
>> UDP traffic using iperf
>>     - Upstream : 200 Mbps
>>     - Downstream : 245 Mbps
>>
>> Linux 3.10.12
>> --------------------
>> TCP traffic using iperf
>>     - Upstream : 101 Mbps
>>     - Downstream : 106 Mbps
>>
>> UDP traffic using iperf
>>     - Upstream : 140 Mbps
>>     - Downstream : 170 Mbps
>>
>> Analysis:
>> ---------------
>> 1.   As per profiling data on Linux-3.10.12, it seems
>>              -   fib_table_lookup and ip_route_input_noref are being
>> called most of the time and are thus causing the degradation in
>> performance.
>>
>>     8.77    csum_partial 0x80009A20 1404
>>     4.53    ipt_do_table 0x80365C34 1352
>>     3.45    eth_xmit 0x870D0C88 5460
>>     3.41    fib_table_lookup 0x8035240C 856    <----------
>>     3.38    __netif_receive_skb_core 0x802B5C00 2276
>>     3.07    dma_device_write 0x80013BD4 752
>>     2.94    nf_iterate 0x802EA380 256
>>     2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
>>     2.24    ip_forward 0x8031108C 1040
>>     2.04    tcp_packet 0x802F45BC 3956
>>     1.93    nf_conntrack_in 0x802EEAF4 2284
>>
>> 2.    Based on the above observation, further searching shows that the
>> routing cache code was removed in the Linux 3.6 kernel, so every
>> packet has to go through ip_route_input_noref to find the
>> destination.
>
> That doesn't look like enough cpu time to give the observed reduction
> in throughput (assuming those numbers are in %).
>

Yes, you are correct. We are running the test for a short duration;
currently we run iperf for 50 iterations.

I cross-checked by running more iterations, but the behaviour is the
same: we still observe the performance degradation.


> Are you sure the system is actually running at 100% cpu?
> (Although I'm not sure what to trust for cpu usage with these sorts of tests.)
>
>         David

Yes, we checked with mpstat and the CPU load was 100%.

Regards,
Suprasad


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-05  2:17     ` Suprasad Mutalik Desai
  2014-06-05  6:08       ` David Miller
@ 2014-06-05  6:32       ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2014-06-05  6:32 UTC (permalink / raw)
  To: Suprasad Mutalik Desai; +Cc: David Miller, netdev

On Thu, 2014-06-05 at 07:47 +0530, Suprasad Mutalik Desai wrote:

> So, I request you to please suggest how to address this topic w.r.t.
> embedded devices (routers), which have limited memory resources.

Hmm... How do you catch a cloud and pin it down ?

More seriously...

What about removing this TCP checksum?

This is the main cost according to your performance data.

Then, you are free to implement a cache of your own where it might help.

Tom Herbert added a partial cache on tunnels, for example, in linux-3.15
or something like that.


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-05  2:17     ` Suprasad Mutalik Desai
@ 2014-06-05  6:08       ` David Miller
  2014-06-05  6:32       ` Eric Dumazet
  1 sibling, 0 replies; 7+ messages in thread
From: David Miller @ 2014-06-05  6:08 UTC (permalink / raw)
  To: suprasad.desai; +Cc: netdev, davem

From: Suprasad Mutalik Desai <suprasad.desai@gmail.com>
Date: Thu, 5 Jun 2014 07:47:48 +0530

> So, I request you to please suggest how to address this topic w.r.t.
> embedded devices (routers), which have limited memory resources.

You are seeing the downside of the tradeoff we have made in removing
the routing cache.

But we are still OK with the ramifications of this tradeoff, these
are the new performance metrics we have to simply live with.

I also think you are being quite unreasonable, considering that the real
problem in your setup is the lack of hardware checksum offloading
_PLUS_ the fact that you insist on creating a situation where your
"low powered" embedded machine has to checksum every single packet
going through it.  That's ridiculous.

Truthfully, you'd also have excellent throughput if you didn't have
netfilter, or at least disabled netfilter conntrack checksumming.

That checksum on every packet significantly dwarfs the overhead caused
by the routing lookups.

Therefore, I think you should concentrate your efforts in that area.

So to make it clear, there is nothing we are going to "address" w.r.t.
routing lookup cost, because we are perfectly happy with the tradeoff
between exploitability, scalability, and performance that we've made.


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 19:18   ` David Miller
@ 2014-06-05  2:17     ` Suprasad Mutalik Desai
  2014-06-05  6:08       ` David Miller
  2014-06-05  6:32       ` Eric Dumazet
  0 siblings, 2 replies; 7+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-05  2:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, davem

Hi David,

On Thu, Jun 5, 2014 at 12:48 AM, David Miller <davem@davemloft.net> wrote:
> From: Suprasad Mutalik Desai <suprasad.desai@gmail.com>
> Date: Wed, 4 Jun 2014 14:34:10 +0530
>
>> I guess the plain routing scenario was NOT thought through when the
>> routing cache code was removed.
>
> It is exactly the scenario that was considered.
>
> The routing cache was susceptible to trivial denial of service
> attacks, it therefore had to be removed.
>
> It also scaled poorly with large numbers of active flows.

Based on your and Eric's explanations, I clearly understand the
purpose of removing the route cache.

But as no alternative was proposed after the route cache mechanism was
removed, we are seeing the side effect of low throughput in an
embedded router scenario with limited cache and memory resources. With
the limited cache of an embedded router, walking the routing table for
the dst of every packet (routed scenario) is a heavy operation, which
leads to poor stack performance.

So, I request you to please suggest how to address this topic w.r.t.
embedded devices (routers), which have limited memory resources.

Thanks and Regards,
Suprasad.


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04  9:04 ` Fwd: " Suprasad Mutalik Desai
@ 2014-06-04 19:18   ` David Miller
  2014-06-05  2:17     ` Suprasad Mutalik Desai
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2014-06-04 19:18 UTC (permalink / raw)
  To: suprasad.desai; +Cc: netdev, davem

From: Suprasad Mutalik Desai <suprasad.desai@gmail.com>
Date: Wed, 4 Jun 2014 14:34:10 +0530

> I guess the plain routing scenario was NOT thought through when the
> routing cache code was removed.

It is exactly the scenario that was considered.

The routing cache was susceptible to trivial denial of service
attacks, it therefore had to be removed.

It also scaled poorly with large numbers of active flows.


end of thread, other threads:[~2014-06-05  6:32 UTC | newest]

Thread overview: 7+ messages
-- links below jump to the message on this page --
2014-06-04  9:14 Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario Suprasad Mutalik Desai
2014-06-04 14:12 ` David Laight
2014-06-04 17:46   ` Suprasad Mutalik Desai
     [not found] <CAJMXqXZQ_S27LCZNJDjcQ9jy2qyJ0UT2nk+wdZOTQep+5rQZhQ@mail.gmail.com>
2014-06-04  9:04 ` Fwd: " Suprasad Mutalik Desai
2014-06-04 19:18   ` David Miller
2014-06-05  2:17     ` Suprasad Mutalik Desai
2014-06-05  6:08       ` David Miller
2014-06-05  6:32       ` Eric Dumazet
