Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario

* Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
       [not found] <CAJMXqXZQ_S27LCZNJDjcQ9jy2qyJ0UT2nk+wdZOTQep+5rQZhQ@mail.gmail.com>
@ 2014-06-04  9:04 ` Suprasad Mutalik Desai
  2014-06-04 12:34   ` Eric Dumazet
  2014-06-04 19:18   ` David Miller
  0 siblings, 2 replies; 14+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-04  9:04 UTC
  To: netdev, davem

Hi,


    I am currently working on the 3.10.12 kernel, and Linux stack
performance (TCP and UDP) appears to have degraded drastically compared
to the 2.6 kernel.

Results :

Linux 2.6.32
---------------------
TCP traffic using iperf
    - Upstream : 140 Mbps
    - Downstream : 148 Mbps

UDP traffic using iperf
    - Upstream : 200 Mbps
    - Downstream : 245 Mbps

Linux 3.10.12
--------------------
TCP traffic using iperf
    - Upstream : 101 Mbps
    - Downstream : 106 Mbps

UDP traffic using iperf
    - Upstream : 140 Mbps
    - Downstream : 170 Mbps
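
To put the figures above in relative terms, the drop per test can be worked out with a short Python sketch. The dictionary layout and the `percent_drop` helper are illustrative names, not part of the original report; the throughput values are the iperf results listed above.

```python
# iperf throughput in Mbps, copied from the results above:
# (Linux 2.6.32, Linux 3.10.12)
results = {
    "TCP upstream": (140, 101),
    "TCP downstream": (148, 106),
    "UDP upstream": (200, 140),
    "UDP downstream": (245, 170),
}


def percent_drop(old, new):
    """Relative throughput loss going from old to new, in percent."""
    return (old - new) / old * 100.0


drops = {name: percent_drop(old, new) for name, (old, new) in results.items()}

for name, drop in drops.items():
    print(f"{name}: {drop:.1f}% lower on 3.10.12")
```

All four cases come out roughly 28 to 31 percent lower on 3.10.12.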

Analysis:
---------------
1.   Profiling data on Linux 3.10.12 suggests that
             -   fib_table_lookup and ip_route_input_noref are being
called very frequently and are thus causing the degradation in
performance.

    8.77    csum_partial 0x80009A20 1404
    4.53    ipt_do_table 0x80365C34 1352
    3.45    eth_xmit 0x870D0C88 5460
    3.41    fib_table_lookup 0x8035240C 856    <----------
    3.38    __netif_receive_skb_core 0x802B5C00 2276
    3.07    dma_device_write 0x80013BD4 752
    2.94    nf_iterate 0x802EA380 256
    2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
    2.24    ip_forward 0x8031108C 1040
    2.04    tcp_packet 0x802F45BC 3956
    1.93    nf_conntrack_in 0x802EEAF4 2284

2.    Following up on the above observation, I found that the routing
cache code was removed in the Linux 3.6 kernel, so every packet now
has to go through ip_route_input_noref to find its destination.

3.    Related to this, a patch from David Miller, "ipv4: Early TCP
socket demux", caches the dst per socket, maintains tcp_hashinfo, and
uses early_demux(skb) (TCP --> tcp_v4_early_demux; UDP --> NULL, i.e.
not defined) to get the dst of that skb, which avoids calling
ip_route_input_noref every time.
          -  But this still doesn’t handle routing scenarios (LAN <-->  WAN).

4.    A patch for UDP early demux was added in Linux 3.13, and
some bug fixes went into Linux 3.14.

5.    As we are based on 3.10, we have no UDP early_demux support. This
means we would have to backport the UDP early demux patch to the 3.10 kernel.
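
As a rough cross-check of item 1, the profile figures can be grouped by subsystem. The sketch below is only an illustration: the grouping of symbols into "route lookup", "checksum" and "netfilter/conntrack" is an assumption made here, and the numbers are the first column of the profile listing, taken to be the share of samples per symbol.

```python
# First column of the profile listing in item 1, keyed by symbol.
profile = {
    "csum_partial": 8.77,
    "ipt_do_table": 4.53,
    "eth_xmit": 3.45,
    "fib_table_lookup": 3.41,
    "__netif_receive_skb_core": 3.38,
    "dma_device_write": 3.07,
    "nf_iterate": 2.94,
    "ip_route_input_noref": 2.69,
    "ip_forward": 2.24,
    "tcp_packet": 2.04,
    "nf_conntrack_in": 1.93,
}

# Assumed grouping, for illustration only.
groups = {
    "route lookup": ["fib_table_lookup", "ip_route_input_noref"],
    "checksum": ["csum_partial"],
    "netfilter/conntrack": [
        "ipt_do_table",
        "nf_iterate",
        "tcp_packet",
        "nf_conntrack_in",
    ],
}

# Sum the per-symbol figures within each group.
totals = {
    group: round(sum(profile[sym] for sym in symbols), 2)
    for group, symbols in groups.items()
}

for group, total in totals.items():
    print(f"{group}: {total:.2f}")
```

Under that grouping the two route-lookup symbols add up to 6.10, against 8.77 for csum_partial alone and 11.44 for the four netfilter symbols.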


Issue :
-----------

1.    The implementation of "Early TCP socket demux" doesn't address
the routing scenario (LAN <---> WAN). This means TCP and UDP routing
performance will be lower in the 3.10 kernel, and also in the 3.14 kernel,
as every packet has to go through a route lookup.


Is there an alternative to get back the Linux stack performance of 2.6
or 3.4 kernel where we have the route cache ?

I guess plain routing scenario was NOT thought through while removing
the routing cache code.

Please advise.


Thanks and Regards,
Suprasad.


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04  9:04 ` Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario Suprasad Mutalik Desai
@ 2014-06-04 12:34   ` Eric Dumazet
  2014-06-04 13:53     ` Suprasad Mutalik Desai
  2014-06-04 19:18   ` David Miller
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2014-06-04 12:34 UTC
  To: Suprasad Mutalik Desai; +Cc: netdev, davem

On Wed, 2014-06-04 at 14:34 +0530, Suprasad Mutalik Desai wrote:
> Hi,
> 
> 
>     Currently i am working on 3.10.12 kernel and it seems the Linux
> stack performance (TCP and UDP) has degraded drastically as compared
> to 2.6 kernel.
> 
> Results :
> 
> Linux 2.6.32
> ---------------------
> TCP traffic using iperf
>     - Upstream : 140 Mbps
>     - Downstream : 148 Mbps
> 
> UDP traffic using iperf
>     - Upstream : 200 Mbps
>     - Downstream : 245 Mbps
> 
> Linux 3.10.12
> --------------------
> TCP traffic using iperf
>     - Upstream : 101 Mbps
>     - Downstream : 106 Mbps
> 
> UDP traffic using iperf
>     - Upstream : 140 Mbps
>     - Downstream : 170 Mbps
> 
> Analysis:
> ---------------
> 1.   As per profiling data on Linux-3.10.12 it seems,
>              -   fib_table_lookup and ip_route_input_noref is being
> called most of the times and thus causing the degradation in
> performance.
> 
>     8.77    csum_partial 0x80009A20 1404

The main problem here is the lack of checksum offload. What kind of NIC is used?

>     4.53    ipt_do_table 0x80365C34 1352
>     3.45    eth_xmit 0x870D0C88 5460
>     3.41    fib_table_lookup 0x8035240C 856    <----------
>     3.38    __netif_receive_skb_core 0x802B5C00 2276
>     3.07    dma_device_write 0x80013BD4 752
>     2.94    nf_iterate 0x802EA380 256
>     2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
>     2.24    ip_forward 0x8031108C 1040
>     2.04    tcp_packet 0x802F45BC 3956
>     1.93    nf_conntrack_in 0x802EEAF4 2284
> 
> 2.    Based on the above observation, when searched,  it seems Routing
> cache code has been removed from Linux-3.6 kernel and thus every
> packet has to go through ip_route_input_noref to find the destination.
> 
> 3.    Related to this, a patch from David Miller adds "ipv4: Early TCP
> socket demux" which caches the "dst per socket" and maintains
> tcp_hashinfo and uses early_demux(skb) (TCP --> tcp_v4_early_demux and
> UDP --> NULL i.e not defined) to get the "dst" of that skb and thus
> avoids ip_route_input_noref being called everytime.
>           -  But this still doesn’t handle routing scenarios (LAN <-->  WAN).
> 
> 4.    A patch for UDP early demux has been added in Linux 3.13 and
> certain bugfixes has gone in Linux-3.14 .
> 
> 5.    As we are based on 3.10 thus no UDP early_demux support . This
> means we have to backport the UDP early demux patch to 3.10 kernel .

Nope: this will be of no use on a router. It will even slow the
router down.

> 
> 
> Issue :
> -----------
> 
> 1.    The implementation of "Early TCP socket demux" doesn't address
> the routing scenario (LAN <---> WAN) . This means TCP and UDP routing
> performance will be less in 3.10 kernel and also in 3.14 kernel as
> every packet has to go through route lookup.
> 
> 
> Is there an alternative to get back the Linux stack performance of 2.6
> or 3.4 kernel where we have the route cache ?
> 
> I guess plain routing scenario was NOT thought through while removing
> the routing cache code.

It is the opposite. The route cache was an easy target for DDoS attacks.

This was a nightmare.


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 12:34   ` Eric Dumazet
@ 2014-06-04 13:53     ` Suprasad Mutalik Desai
  2014-06-04 14:59       ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-04 13:53 UTC
  To: Eric Dumazet; +Cc: netdev, davem

Hi Eric,

             Thanks for your inputs. Please find my comments inline.

On Wed, Jun 4, 2014 at 6:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-06-04 at 14:34 +0530, Suprasad Mutalik Desai wrote:
>> Hi,
>>
>>
>>     Currently i am working on 3.10.12 kernel and it seems the Linux
>> stack performance (TCP and UDP) has degraded drastically as compared
>> to 2.6 kernel.
>>
>> Results :
>>
>> Linux 2.6.32
>> ---------------------
>> TCP traffic using iperf
>>     - Upstream : 140 Mbps
>>     - Downstream : 148 Mbps
>>
>> UDP traffic using iperf
>>     - Upstream : 200 Mbps
>>     - Downstream : 245 Mbps
>>
>> Linux 3.10.12
>> --------------------
>> TCP traffic using iperf
>>     - Upstream : 101 Mbps
>>     - Downstream : 106 Mbps
>>
>> UDP traffic using iperf
>>     - Upstream : 140 Mbps
>>     - Downstream : 170 Mbps
>>
>> Analysis:
>> ---------------
>> 1.   As per profiling data on Linux-3.10.12 it seems,
>>              -   fib_table_lookup and ip_route_input_noref is being
>> called most of the times and thus causing the degradation in
>> performance.
>>
>>     8.77    csum_partial 0x80009A20 1404
>
> Main problem here is lack of checksums. What kind of NIC is used ?

I missed explaining my setup in the previous mail. We use an
embedded router platform with a CPU running at 600 MHz. The NICs don't
have a checksum offload function (as you pointed out), but what I am
trying to analyze is the relative performance drop on our router with
the 3.10 kernel vs. the 2.6.32/3.4 kernels. For this test, I am sending
TCP traffic using iperf from the LAN Ethernet port to the WAN Ethernet
port, with routing done by the Linux kernel.

>
>>     4.53    ipt_do_table 0x80365C34 1352
>>     3.45    eth_xmit 0x870D0C88 5460
>>     3.41    fib_table_lookup 0x8035240C 856    <----------
>>     3.38    __netif_receive_skb_core 0x802B5C00 2276
>>     3.07    dma_device_write 0x80013BD4 752
>>     2.94    nf_iterate 0x802EA380 256
>>     2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
>>     2.24    ip_forward 0x8031108C 1040
>>     2.04    tcp_packet 0x802F45BC 3956
>>     1.93    nf_conntrack_in 0x802EEAF4 2284
>>
>> 2.    Based on the above observation, when searched,  it seems Routing
>> cache code has been removed from Linux-3.6 kernel and thus every
>> packet has to go through ip_route_input_noref to find the destination.
>>
>> 3.    Related to this, a patch from David Miller adds "ipv4: Early TCP
>> socket demux" which caches the "dst per socket" and maintains
>> tcp_hashinfo and uses early_demux(skb) (TCP --> tcp_v4_early_demux and
>> UDP --> NULL i.e not defined) to get the "dst" of that skb and thus
>> avoids ip_route_input_noref being called everytime.
>>           -  But this still doesn’t handle routing scenarios (LAN <-->  WAN).
>>
>> 4.    A patch for UDP early demux has been added in Linux 3.13 and
>> certain bugfixes has gone in Linux-3.14 .
>>
>> 5.    As we are based on 3.10 thus no UDP early_demux support . This
>> means we have to backport the UDP early demux patch to 3.10 kernel .
>
> Nope : This will be of no use on a router. It even will slow down the
> router.
>

So I gather that the early_demux code applies to traffic that
either originates or terminates on the device (basically the host
scenario) and is not relevant in routed scenarios.

>>
>>
>> Issue :
>> -----------
>>
>> 1.    The implementation of "Early TCP socket demux" doesn't address
>> the routing scenario (LAN <---> WAN) . This means TCP and UDP routing
>> performance will be less in 3.10 kernel and also in 3.14 kernel as
>> every packet has to go through route lookup.
>>
>>
>> Is there an alternative to get back the Linux stack performance of 2.6
>> or 3.4 kernel where we have the route cache ?
>>
>> I guess plain routing scenario was NOT thought through while removing
>> the routing cache code.
>
> This is the opposite. Route cache was easily targeted by DDOS attacks.
>
> This was a nightmare.
>
>
>
I understand from you that the old route cache mechanism had DoS
vulnerabilities, so the new mechanism was implemented. What I am trying
to figure out is whether that change would cause the kind of throughput
drop that I am seeing.

TCP performance
            - Upstream : 140 Mbps (Linux 2.6.32) --> 101 Mbps (Linux 3.10.12)
            - Downstream : 148 Mbps (Linux 2.6.32) --> 106 Mbps (Linux 3.10.12)

Thanks and Regards,
Suprasad.


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 13:53     ` Suprasad Mutalik Desai
@ 2014-06-04 14:59       ` Eric Dumazet
  2014-06-04 16:34         ` Eric Dumazet
                           ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Eric Dumazet @ 2014-06-04 14:59 UTC
  To: Suprasad Mutalik Desai; +Cc: netdev, davem

On Wed, 2014-06-04 at 19:23 +0530, Suprasad Mutalik Desai wrote:

> I understood from you that the old route cache mechanism had DoS
> vulnerabilities, so the new mechanism is implemented. What I am trying
> to figure out is whether that will cause the kind of throughput drop
> that I am seeing ?
> 
> TCP performance
>             - Upstream : 140 Mbps(Linux 2.6.32) --> 101Mbps (Linux 3.10.12)
>             - Downstream : 148 Mbps(Linux 2.6.32) --> 106Mbps (Linux 3.10.12)

Sure.

A cache is supposed to help performance, right ?

The problem with the IPv4 route cache is that it was too easy to flood
it and get worse performance.

It simply had huge memory costs and unpredictable behavior [1]

Better to have a system behaving correctly at the 99th percentile
than a system working well _only_ if the workload is very gentle.

[1] Well, sort of : prediction was : it's so easy to remotely crash the
host.

Your host has very little cache on cpu, very little bandwidth, so the
previous IPv4 cache was probably helping.

It is not yet clear why a router has to checksum TCP packets.

Maybe a conntracking requirement...


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 14:59       ` Eric Dumazet
@ 2014-06-04 16:34         ` Eric Dumazet
  2014-06-04 18:03           ` Suprasad Mutalik Desai
  2014-06-04 17:41         ` Suprasad Mutalik Desai
  2014-06-04 18:26         ` sowmini varadhan
  2 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2014-06-04 16:34 UTC
  To: Suprasad Mutalik Desai; +Cc: netdev, davem

On Wed, 2014-06-04 at 07:59 -0700, Eric Dumazet wrote:

> Its yet not clear why a router has to checksum TCP packets.
> 
> Maybe a conntracking requirement...


Right... you can tweak nf_conntrack_checksum


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 14:59       ` Eric Dumazet
  2014-06-04 16:34         ` Eric Dumazet
@ 2014-06-04 17:41         ` Suprasad Mutalik Desai
  2014-06-04 18:26         ` sowmini varadhan
  2 siblings, 0 replies; 14+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-04 17:41 UTC
  To: Eric Dumazet; +Cc: netdev, davem

Hi Eric,


On Wed, Jun 4, 2014 at 8:29 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-06-04 at 19:23 +0530, Suprasad Mutalik Desai wrote:
>
>> I understood from you that the old route cache mechanism had DoS
>> vulnerabilities, so the new mechanism is implemented. What I am trying
>> to figure out is whether that will cause the kind of throughput drop
>> that I am seeing ?
>>
>> TCP performance
>>             - Upstream : 140 Mbps(Linux 2.6.32) --> 101Mbps (Linux 3.10.12)
>>             - Downstream : 148 Mbps(Linux 2.6.32) --> 106Mbps (Linux 3.10.12)
>
> Sure.
>
> A cache is supposed to help performance, right ?
>
> Problem with IPv4 route cache is that is was too easy to flood it and
> get worse performance.
>
> It had simply huge memory costs, and non predictable behavior [1]
>
> Better have a system behaving correctly at 99 percentile,
> than a system working well _only_ if workload is very gentle.
>
>
> [1] Well, sort of : prediction was : it's so easy to remotely crash the
> host.
>
> Your host has very little cache on cpu, very little bandwidth, so the
> previous IPv4 cache was probably helping.
>

Yes, you are correct. We are working on an embedded router that has
limited cache.

So, could you please suggest how to address this for embedded
devices (routers) that have limited memory resources?

> Its yet not clear why a router has to checksum TCP packets.
>
> Maybe a conntracking requirement...
>

Yes, in our case the checksum is done in the Linux stack, as the NIC
doesn't support checksum offloading.


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 16:34         ` Eric Dumazet
@ 2014-06-04 18:03           ` Suprasad Mutalik Desai
  0 siblings, 0 replies; 14+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-04 18:03 UTC
  To: Eric Dumazet; +Cc: netdev, davem

Hi Eric,

On Wed, Jun 4, 2014 at 10:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-06-04 at 07:59 -0700, Eric Dumazet wrote:
>
>> Its yet not clear why a router has to checksum TCP packets.
>>
>> Maybe a conntracking requirement...
>
>
> Right... you can tweak nf_conntrack_checksum
>

I will definitely try this option, but as you mentioned, with a limited
cache in an embedded router environment, checking the routing table for
the dst of every packet (routed scenario) is a heavy operation that
can still lead to poor stack performance.

So, I request you to please suggest how to address this for
embedded devices (routers) that have limited memory resources.

Thanks and Regards,
Suprasad.


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 14:59       ` Eric Dumazet
  2014-06-04 16:34         ` Eric Dumazet
  2014-06-04 17:41         ` Suprasad Mutalik Desai
@ 2014-06-04 18:26         ` sowmini varadhan
  2014-06-04 18:32           ` Neal Cardwell
  2014-06-04 18:44           ` Eric Dumazet
  2 siblings, 2 replies; 14+ messages in thread
From: sowmini varadhan @ 2014-06-04 18:26 UTC
  To: Eric Dumazet; +Cc: Suprasad Mutalik Desai, netdev, David Miller

On Wed, Jun 4, 2014 at 10:59 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Problem with IPv4 route cache is that is was too easy to flood it and
> get worse performance.
>
> It had simply huge memory costs, and non predictable behavior [1]

Can you elaborate? Caches usually have an upper bound on
the memory they consume, and from my reading
of the 2.6.32 code, it seems it also had such limits (there was
a sysctl tunable to turn off hashing, and rt_intern_hash() also had
other bounds). Were these not enough?

> [1] Well, sort of : prediction was : it's so easy to remotely crash the
> host.

--Sowmini


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 18:26         ` sowmini varadhan
@ 2014-06-04 18:32           ` Neal Cardwell
  2014-06-04 18:44           ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Neal Cardwell @ 2014-06-04 18:32 UTC
  To: sowmini varadhan
  Cc: Eric Dumazet, Suprasad Mutalik Desai, netdev, David Miller

On Wed, Jun 4, 2014 at 2:26 PM, sowmini varadhan <sowmini05@gmail.com> wrote:
> On Wed, Jun 4, 2014 at 10:59 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> Problem with IPv4 route cache is that is was too easy to flood it and
>> get worse performance.
>>
>> It had simply huge memory costs, and non predictable behavior [1]
>
> Can you elaborate? Caches usually have an upper bound on
> the memory to be consumed by the cache, and from my reading
> of the 2.6.32 code, seems like it also had such limits (there was
> a sysctl tunable to turn off hashing, and rt_intern_hash() also had
> other bounds. Were these not enough?

David put together a nice presentation that covers a lot of the topics
in this thread:

  Removing The Linux Routing Cache
  David S. Miller, 2012
  http://vger.kernel.org/~davem/columbia2012.pdf

neal


* Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 18:26         ` sowmini varadhan
  2014-06-04 18:32           ` Neal Cardwell
@ 2014-06-04 18:44           ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2014-06-04 18:44 UTC
  To: sowmini varadhan; +Cc: Suprasad Mutalik Desai, netdev, David Miller

On Wed, 2014-06-04 at 14:26 -0400, sowmini varadhan wrote:
> On Wed, Jun 4, 2014 at 10:59 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Problem with IPv4 route cache is that is was too easy to flood it and
> > get worse performance.
> >
> > It had simply huge memory costs, and non predictable behavior [1]
> 
> Can you elaborate? Caches usually have an upper bound on
> the memory to be consumed by the cache, and from my reading
> of the 2.6.32 code, seems like it also had such limits (there was
> a sysctl tunable to turn off hashing, and rt_intern_hash() also had
> other bounds. Were these not enough?


http://vger.kernel.org/~davem/columbia2012.pdf


For gory details :

git log net/ipv4/route.c


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04  9:04 ` Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario Suprasad Mutalik Desai
  2014-06-04 12:34   ` Eric Dumazet
@ 2014-06-04 19:18   ` David Miller
  2014-06-05  2:17     ` Suprasad Mutalik Desai
  1 sibling, 1 reply; 14+ messages in thread
From: David Miller @ 2014-06-04 19:18 UTC
  To: suprasad.desai; +Cc: netdev, davem

From: Suprasad Mutalik Desai <suprasad.desai@gmail.com>
Date: Wed, 4 Jun 2014 14:34:10 +0530

> I guess plain routing scenario was NOT thought through while removing
> the routing cache code.

It is exactly the scenario that was considered.

The routing cache was susceptible to trivial denial of service
attacks; it therefore had to be removed.

It also scaled poorly with large numbers of active flows.


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-04 19:18   ` David Miller
@ 2014-06-05  2:17     ` Suprasad Mutalik Desai
  2014-06-05  6:08       ` David Miller
  2014-06-05  6:32       ` Eric Dumazet
  0 siblings, 2 replies; 14+ messages in thread
From: Suprasad Mutalik Desai @ 2014-06-05  2:17 UTC
  To: David Miller; +Cc: netdev, davem

Hi David,

On Thu, Jun 5, 2014 at 12:48 AM, David Miller <davem@davemloft.net> wrote:
> From: Suprasad Mutalik Desai <suprasad.desai@gmail.com>
> Date: Wed, 4 Jun 2014 14:34:10 +0530
>
>> I guess plain routing scenario was NOT thought through while removing
>> the routing cache code.
>
> It is exactly the scenerio that was considered.
>
> The routing cache was susceptible to trivial denial of service
> attacks, it therefore had to be removed.
>
> It also scaled poorly with large numbers of active flows.

Based on your and Eric's explanations, I clearly understand the
purpose of removing the route cache.

But as no alternative was proposed after the route cache mechanism
was removed, this has the side effect of low throughput in an
embedded router scenario with limited cache and memory resources. With
a limited cache in an embedded router environment, checking the routing
table for the dst of every packet (routed scenario) is a heavy
operation that can lead to poor stack performance.

So, I request you to please suggest how to address this for
embedded devices (routers) that have limited memory resources.

Thanks and Regards,
Suprasad.


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-05  2:17     ` Suprasad Mutalik Desai
@ 2014-06-05  6:08       ` David Miller
  2014-06-05  6:32       ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: David Miller @ 2014-06-05  6:08 UTC
  To: suprasad.desai; +Cc: netdev, davem

From: Suprasad Mutalik Desai <suprasad.desai@gmail.com>
Date: Thu, 5 Jun 2014 07:47:48 +0530

> So, I request you to please suggest on how to address this topic w.r.t
> embedded devices ( routers) which has limited memory resources ?

You are seeing the downside of the tradeoff we have made in removing
the routing cache.

But we are still OK with the ramifications of this tradeoff; these
are the new performance metrics we simply have to live with.

I also think you are being quite unreasonable, considering that the real
problem on your setup is the lack of hardware checksum offloading
_PLUS_ the fact that you insist on creating a situation where your
"low powered" embedded machine has to checksum every single packet
going through it.  That's ridiculous.

Truthfully, you'd also have excellent throughput if you didn't have
netfilter, or at least disabled netfilter conntrack checksumming.

That checksum on every packet significantly dwarfs the overhead caused
by the routing lookups.

Therefore, I think you should concentrate your efforts in that area.

So to make it clear, there is nothing we are going to "address" wrt.
routing lookup cost, because we are perfectly happy with the tradeoff
between exploitability, scalability, and performance that we've made.


* Re: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
  2014-06-05  2:17     ` Suprasad Mutalik Desai
  2014-06-05  6:08       ` David Miller
@ 2014-06-05  6:32       ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Eric Dumazet @ 2014-06-05  6:32 UTC
  To: Suprasad Mutalik Desai; +Cc: David Miller, netdev

On Thu, 2014-06-05 at 07:47 +0530, Suprasad Mutalik Desai wrote:

> So, I request you to please suggest on how to address this topic w.r.t
> embedded devices ( routers) which has limited memory resources ?

Hmm... How do you catch a cloud and pin it down ?

More seriously...

What about removing this TCP checksum?

This is the main cost according to your performance data.

Then, you are free to implement a cache of your own where it might help.

Tom Herbert added a partial cache on tunnels, for example, in linux-3.15
or something like that.
