From: Suprasad Mutalik Desai <suprasad.desai@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, davem@davemloft.ne
Subject: Re: Fwd: Linux stack performance drop (TCP and UDP) in 3.10 kernel in routed scenario
Date: Wed, 4 Jun 2014 19:23:55 +0530
Message-ID: <CAJMXqXaqXG03e+OHr0aMh9x_N4BzBVWTkQLKgNVvq-QmDaX5Ng@mail.gmail.com>
In-Reply-To: <1401885281.3645.245.camel@edumazet-glaptop2.roam.corp.google.com>

Hi Eric,

Thanks for your input. Please find my comments inline.

On Wed, Jun 4, 2014 at 6:04 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2014-06-04 at 14:34 +0530, Suprasad Mutalik Desai wrote:
>> Hi,
>>
>>
>>     Currently i am working on 3.10.12 kernel and it seems the Linux
>> stack performance (TCP and UDP) has degraded drastically as compared
>> to 2.6 kernel.
>>
>> Results :
>>
>> Linux 2.6.32
>> ---------------------
>> TCP traffic using iperf
>>     - Upstream : 140 Mbps
>>     - Downstream : 148 Mbps
>>
>> UDP traffic using iperf
>>     - Upstream : 200 Mbps
>>     - Downstream : 245 Mbps
>>
>> Linux 3.10.12
>> --------------------
>> TCP traffic using iperf
>>     - Upstream : 101 Mbps
>>     - Downstream : 106 Mbps
>>
>> UDP traffic using iperf
>>     - Upstream : 140 Mbps
>>     - Downstream : 170 Mbps
>>
>> Analysis:
>> ---------------
>> 1.   As per profiling data on Linux-3.10.12 it seems,
>>              -   fib_table_lookup and ip_route_input_noref is being
>> called most of the times and thus causing the degradation in
>> performance.
>>
>>     8.77    csum_partial 0x80009A20 1404
>
> Main problem here is lack of checksums. What kind of NIC is used ?

I left my setup out of the previous mail. We use an embedded router
platform with a 600 MHz CPU. The NICs don't have checksum offload
(as you pointed out), but what I am trying to analyze is the relative
performance drop on our router with the 3.10 kernel versus the
2.6.32/3.4 kernels. For this test, I am sending TCP traffic using
iperf from the LAN Ethernet port to the WAN Ethernet port, with
routing done by the Linux kernel.
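
For illustration only, here is a minimal sketch of the ones' complement
Internet checksum (RFC 1071). It shows the kind of per-byte work that
falls to the CPU when the NIC cannot do it. This is not the kernel's
csum_partial, which is an architecture-optimized routine. The function
name and the sample header bytes are assumptions made for the example.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement checksum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # every byte is touched once
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Sample 20-byte IPv4 header with its checksum field zeroed (made-up packet).
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(internet_checksum(header)))
```

The loop visits every byte of the buffer, so the cost grows linearly
with packet size, which is consistent with a checksum routine ranking
high in a profile on a slow CPU without offload.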

>
>>     4.53    ipt_do_table 0x80365C34 1352
>>     3.45    eth_xmit 0x870D0C88 5460
>>     3.41    fib_table_lookup 0x8035240C 856    <----------
>>     3.38    __netif_receive_skb_core 0x802B5C00 2276
>>     3.07    dma_device_write 0x80013BD4 752
>>     2.94    nf_iterate 0x802EA380 256
>>     2.69    ip_route_input_noref 0x8030CE14 2520    <--------------
>>     2.24    ip_forward 0x8031108C 1040
>>     2.04    tcp_packet 0x802F45BC 3956
>>     1.93    nf_conntrack_in 0x802EEAF4 2284
>>
>> 2.    Based on the above observation, when searched,  it seems Routing
>> cache code has been removed from Linux-3.6 kernel and thus every
>> packet has to go through ip_route_input_noref to find the destination.
>>
>> 3.    Related to this, a patch from David Miller adds "ipv4: Early TCP
>> socket demux" which caches the "dst per socket" and maintains
>> tcp_hashinfo and uses early_demux(skb) (TCP --> tcp_v4_early_demux and
>> UDP --> NULL i.e not defined) to get the "dst" of that skb and thus
>> avoids ip_route_input_noref being called everytime.
>>           -  But this still doesn’t handle routing scenarios (LAN <-->  WAN).
>>
>> 4.    A patch for UDP early demux has been added in Linux 3.13 and
>> certain bugfixes has gone in Linux-3.14 .
>>
>> 5.    As we are based on 3.10 thus no UDP early_demux support . This
>> means we have to backport the UDP early demux patch to 3.10 kernel .
>
> Nope : This will be of no use on a router. It even will slow down the
> router.
>

So I gather that the early_demux code applies only to traffic that
either originates or terminates on the device (the host scenario),
and that it is not relevant in routed scenarios.
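
A toy model of that distinction, as a rough sketch: a packet that
matches a local socket can reuse that socket's cached route, while a
forwarded packet misses the socket table and still needs the full route
lookup, so the demux attempt is pure overhead for it. All names here
(local_sockets, route_lookup, receive) and the addresses are
hypothetical stand-ins, not kernel APIs.

```python
# Hypothetical socket table: 4-tuple -> route cached on that socket.
local_sockets = {
    ("10.0.0.2", 40000, "10.0.0.1", 22): "local-delivery",
}


def route_lookup(dst_ip):
    """Stand-in for the full per-packet route lookup."""
    return "local-delivery" if dst_ip == "10.0.0.1" else "forward-via-wan"


def receive(pkt, early_demux=True):
    """Return (route, number_of_lookups) for one packet 4-tuple."""
    lookups = 0
    if early_demux:
        lookups += 1  # socket table lookup
        cached = local_sockets.get(pkt)
        if cached is not None:
            return cached, lookups  # host case: route lookup skipped
    lookups += 1  # routed case: route lookup is still required
    return route_lookup(pkt[2]), lookups


host_pkt = ("10.0.0.2", 40000, "10.0.0.1", 22)
routed_pkt = ("192.168.1.5", 50000, "8.8.8.8", 80)
print(receive(host_pkt))
print(receive(routed_pkt))
print(receive(routed_pkt, early_demux=False))
```

In this model the routed packet costs two lookups with early demux
enabled and one with it disabled. On real kernels that have early
demux, the net.ipv4.ip_early_demux sysctl controls this behaviour.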

>>
>>
>> Issue :
>> -----------
>>
>> 1.    The implementation of "Early TCP socket demux" doesn't address
>> the routing scenario (LAN <---> WAN) . This means TCP and UDP routing
>> performance will be less in 3.10 kernel and also in 3.14 kernel as
>> every packet has to go through route lookup.
>>
>>
>> Is there an alternative to get back the Linux stack performance of 2.6
>> or 3.4 kernel where we have the route cache ?
>>
>> I guess plain routing scenario was NOT thought through while removing
>> the routing cache code.
>
> This is the opposite. Route cache was easily targeted by DDOS attacks.
>
> This was a nightmare.
>
>
>
I understand from you that the old route cache mechanism had DoS
vulnerabilities, which is why it was replaced. What I am trying to
figure out is whether its removal can account for the throughput
drop that I am seeing?

TCP performance
    - Upstream   : 140 Mbps (Linux 2.6.32) --> 101 Mbps (Linux 3.10.12)
    - Downstream : 148 Mbps (Linux 2.6.32) --> 106 Mbps (Linux 3.10.12)
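
To put the drop in relative terms, here is a small sketch that computes
the percentage loss from the iperf figures reported in this thread. The
helper name relative_drop is just a label chosen for the example.

```python
def relative_drop(before_mbps, after_mbps):
    """Percentage of throughput lost going from before to after."""
    return (before_mbps - after_mbps) / before_mbps * 100.0


# iperf results quoted in this thread: (Linux 2.6.32, Linux 3.10.12), in Mbps.
results = {
    "TCP upstream": (140, 101),
    "TCP downstream": (148, 106),
    "UDP upstream": (200, 140),
    "UDP downstream": (245, 170),
}

for name, (old, new) in results.items():
    print(f"{name}: {old} -> {new} Mbps, drop {relative_drop(old, new):.1f}%")
```

All four cases land between roughly 28% and 31%, so the regression is
similar for TCP and UDP and in both directions.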

Thanks and Regards,
Suprasad.
