From: Alexander Duyck <alexander.h.duyck@intel.com>
To: David Miller <davem@davemloft.net>,
	Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev <netdev@vger.kernel.org>
Subject: Performance regression on kernels 3.10 and newer
Date: Thu, 14 Aug 2014 11:19:23 -0700
Message-ID: <53ECFDAB.5010701@intel.com>

Yesterday I tripped over a bit of an issue and it seems like we are
seeing significant cache thrash on kernels 3.10 and newer when running
multiple streams of small packet stress on multiple NUMA nodes for a
single NIC.

I did some bisection and found that I was able to trace it back to
upstream commit 093162553c33e9479283e107b4431378271c735d (tcp: force a
dst refcount when prequeue packet).

Recreating this issue is pretty straightforward.  All I did was set up two
dual-socket Xeon systems connected back to back with ixgbe and run the
following script after disabling TCP autocorking (net.ipv4.tcp_autocorking=0)
on the transmitting system:
  for i in `seq 0 19`
  do
    for j in `seq 0 2`
    do
      netperf -H 192.168.10.1 -t TCP_STREAM \
              -l 10 -c -C -T $i,$i -P 0 -- \
              -m 64 -s 64K -D
    done
  done
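
For readers who want to see the shape of the load before running it, here
is a dry-run sketch of the loops above. It only prints the netperf command
lines and does not start any traffic. The helper name gen_cmds is
hypothetical and not part of the original script.

```shell
#!/bin/sh
# Dry run: print the command lines produced by the nested loops above
# without invoking netperf. gen_cmds is a hypothetical helper name.
gen_cmds() {
  for i in $(seq 0 19); do     # CPU index, passed to -T as local,remote
    for j in $(seq 0 2); do    # three invocations per CPU index (j is only a counter)
      echo "netperf -H 192.168.10.1 -t TCP_STREAM -l 10 -c -C -T $i,$i -P 0 -- -m 64 -s 64K -D"
    done
  done
}

gen_cmds | wc -l           # 20 CPU indexes x 3 = 60 command lines
gen_cmds | sed -n '1p;$p'  # first and last command line
```

Each command binds netperf and netserver to the same CPU index on both
hosts (-T $i,$i), sends 64-byte messages (-m 64) with a 64K local socket
buffer (-s 64K), and sets TCP_NODELAY (-D).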

The current net tree as-is will give me about 2Gb/s of data w/ 100% CPU
utilization on the receiving system, and with the patch above reverted
on that system it gives me about 4Gb/s with only 21% CPU utilization.
If I set tcp_low_latency=1 I can get the CPU utilization down to about
12% on the same test with about 4Gb/s of throughput.
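
To put the three results on one scale, the sketch below divides throughput
by receiver CPU utilization, using only the approximate figures quoted
above. The helper name eff is hypothetical, and the metric (Gb/s per
percent of CPU) is just one rough way to compare the cases.

```shell
#!/bin/sh
# Gb/s delivered per percent of receiver CPU, from the rounded numbers above.
# eff is a hypothetical helper: eff <Gb/s> <CPU %>
eff() {
  awk -v gbps="$1" -v cpu="$2" 'BEGIN { printf "%.3f\n", gbps / cpu }'
}

eff 2 100   # current net tree            -> 0.020
eff 4 21    # commit reverted             -> 0.190
eff 4 12    # tcp_low_latency=1           -> 0.333
```

By this measure the reverted kernel does roughly 9.5 times as much work per
unit of CPU as the current tree, and tcp_low_latency=1 roughly 16.7 times.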

I'm still working on determining the exact root cause, but it looks to me
like there is significant cache thrash going on with the dst entries.

Below is a quick breakdown of the top CPU users for tcp_low_latency
on/off using perf top:

tcp_low_latency = 0
 36.49%  [kernel]                [k] ipv4_dst_check
 19.45%  [kernel]                [k] dst_release
 16.07%  [kernel]                [k] _raw_spin_lock
  9.84%  [kernel]                [k] tcp_prequeue
  2.13%  [kernel]                [k] tcp_v4_rcv
  1.38%  [kernel]                [k] memcpy
  1.04%  [ixgbe]                 [k] ixgbe_clean_rx_irq
  0.82%  [kernel]                [k] ip_rcv_finish
  0.54%  [kernel]                [k] dev_gro_receive
  0.51%  [kernel]                [k] build_skb
  0.51%  [kernel]                [k] __netif_receive_skb_core
  0.50%  [kernel]                [k] tcp_rcv_established
  0.46%  [kernel]                [k] sock_def_readable
  0.42%  [kernel]                [k] __slab_free
  0.38%  [kernel]                [k] __inet_lookup_established
  0.36%  [kernel]                [k] ip_rcv
  0.34%  [kernel]                [k] copy_user_enhanced_fast_string
  0.30%  [kernel]                [k] __netdev_alloc_frag
  0.29%  [kernel]                [k] kmem_cache_alloc
  0.27%  [kernel]                [k] inet_gro_receive
  0.25%  [kernel]                [k] put_compound_page
  0.24%  [kernel]                [k] tcp_v4_do_rcv
  0.24%  [kernel]                [k] napi_gro_receive
  0.22%  [kernel]                [k] tcp_event_data_recv
  0.20%  [kernel]                [k] tcp_gro_receive
  0.17%  [kernel]                [k] tcp_v4_early_demux
  0.16%  [kernel]                [k] kmem_cache_free
  0.14%  [ixgbe]                 [k] ixgbe_poll
  0.14%  [kernel]                [k] eth_type_trans
  0.13%  [kernel]                [k] tcp_prequeue_process
  0.13%  [kernel]                [k] tcp_send_delayed_ack
  0.13%  [kernel]                [k] mod_timer
  0.12%  [kernel]                [k] skb_copy_datagram_iovec
  0.12%  [kernel]                [k] irq_entries_start
  0.12%  [kernel]                [k] inet_ehashfn
  0.12%  [kernel]                [k] __tcp_ack_snd_check
  0.12%  [ixgbe]                 [k] ixgbe_xmit_frame_ring

tcp_low_latency = 1
  7.77%  [kernel]                 [k] memcpy
  6.13%  [ixgbe]                  [k] ixgbe_clean_rx_irq
  3.54%  [kernel]                 [k] skb_try_coalesce
  3.22%  [kernel]                 [k] dev_gro_receive
  3.21%  [kernel]                 [k] tcp_v4_rcv
  2.91%  [kernel]                 [k] __netif_receive_skb_core
  2.64%  [kernel]                 [k] build_skb
  2.59%  [kernel]                 [k] acpi_processor_ffh_cstate_enter
  2.53%  [kernel]                 [k] sock_def_readable
  2.26%  [kernel]                 [k] _raw_spin_lock
  2.20%  [kernel]                 [k] tcp_rcv_established
  2.07%  [kernel]                 [k] __inet_lookup_established
  1.95%  [kernel]                 [k] ip_rcv
  1.82%  [kernel]                 [k] kmem_cache_free
  1.76%  [kernel]                 [k] copy_user_enhanced_fast_string
  1.56%  [kernel]                 [k] tcp_try_coalesce
  1.53%  [kernel]                 [k] __netdev_alloc_frag
  1.53%  [kernel]                 [k] inet_gro_receive
  1.51%  [kernel]                 [k] napi_gro_receive
  1.29%  [kernel]                 [k] kmem_cache_alloc
  1.18%  [kernel]                 [k] tcp_gro_receive
  1.09%  [kernel]                 [k] put_compound_page
  0.98%  [kernel]                 [k] ip_local_deliver_finish
  0.97%  [kernel]                 [k] tcp_send_delayed_ack
  0.95%  [kernel]                 [k] tcp_event_data_recv
  0.90%  [kernel]                 [k] inet_ehashfn
  0.88%  [kernel]                 [k] ip_rcv_finish
  0.78%  [kernel]                 [k] tcp_v4_do_rcv
  0.77%  [kernel]                 [k] tcp_v4_early_demux
  0.76%  [kernel]                 [k] __switch_to
  0.76%  [kernel]                 [k] eth_type_trans
  0.75%  [kernel]                 [k] tcp_queue_rcv
  0.74%  [kernel]                 [k] __schedule
  0.72%  [kernel]                 [k] skb_copy_datagram_iovec
  0.71%  [ixgbe]                  [k] ixgbe_xmit_frame_ring
  0.68%  [kernel]                 [k] __tcp_ack_snd_check
  0.68%  [ixgbe]                  [k] ixgbe_poll
  0.67%  [kernel]                 [k] mod_timer
  0.64%  [kernel]                 [k] lapic_next_deadline
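
As a quick check on how concentrated the tcp_low_latency = 0 profile is,
this sketch sums the top four entries copied from that listing. The helper
name top4_sum is hypothetical.

```shell
#!/bin/sh
# Sum the top four perf top percentages from the tcp_low_latency = 0 listing.
# top4_sum is a hypothetical helper name.
top4_sum() {
  awk '{ sub(/%/, "", $1); s += $1 } END { printf "%.2f\n", s }' <<'EOF'
 36.49%  [kernel]  [k] ipv4_dst_check
 19.45%  [kernel]  [k] dst_release
 16.07%  [kernel]  [k] _raw_spin_lock
  9.84%  [kernel]  [k] tcp_prequeue
EOF
}

top4_sum   # prints 81.85
```

Those four symbols account for 81.85% of the samples with
tcp_low_latency = 0, while in the tcp_low_latency = 1 listing the largest
single entry is memcpy at 7.77%.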

Any input/advice on where I should look or patches to possibly test
would be appreciated.

Thanks,

Alex

Thread overview: 57+ messages
2014-08-14 18:19 Alexander Duyck [this message]
2014-08-14 18:46 ` Performance regression on kernels 3.10 and newer Eric Dumazet
2014-08-14 19:50   ` Eric Dumazet
2014-08-14 19:59   ` Rick Jones
2014-08-14 20:31     ` Alexander Duyck
2014-08-14 20:51       ` Eric Dumazet
2014-08-14 20:46     ` Eric Dumazet
2014-08-14 23:16   ` Alexander Duyck
2014-08-14 23:20     ` David Miller
2014-08-14 23:25       ` Tom Herbert
2014-08-21 23:24         ` David Miller
2014-09-06 14:45           ` Eric Dumazet
2014-09-06 15:27             ` Eric Dumazet
2014-09-06 15:46               ` Eric Dumazet
2014-09-06 16:38                 ` Eric Dumazet
2014-09-06 18:21                   ` Eric Dumazet
2014-09-07 19:05                     ` [PATCH net] ipv6: refresh rt6i_genid in ip6_pol_route() Eric Dumazet
2014-09-07 22:54                       ` David Miller
2014-09-08  4:18                         ` Eric Dumazet
2014-09-08  4:27                           ` David Miller
2014-09-08  4:43                             ` Eric Dumazet
2014-09-08  4:59                               ` David Miller
2014-09-08  5:07                                 ` Eric Dumazet
2014-09-08  8:11                                   ` Nicolas Dichtel
2014-09-08 10:28                                     ` Eric Dumazet
2014-09-08 12:16                                       ` Nicolas Dichtel
2014-09-08 18:48                                   ` Vlad Yasevich
2014-09-09 12:58                                   ` Hannes Frederic Sowa
2014-09-10  9:31                                     ` [PATCH net-next] ipv6: implement rt_genid_bump_ipv6 with fn_sernum and remove rt6i_genid Hannes Frederic Sowa
2014-09-10 13:26                                       ` Vlad Yasevich
2014-09-10 13:42                                         ` Hannes Frederic Sowa
2014-09-10 20:09                                       ` David Miller
2014-09-11  8:30                                         ` Hannes Frederic Sowa
2014-09-11 12:22                                           ` Vlad Yasevich
2014-09-11 12:40                                             ` Hannes Frederic Sowa
2014-09-11 12:05                                         ` Hannes Frederic Sowa
2014-09-11 14:19                                           ` Vlad Yasevich
2014-09-11 14:32                                             ` Hannes Frederic Sowa
2014-09-11 14:44                                               ` Vlad Yasevich
2014-09-11 14:47                                                 ` Hannes Frederic Sowa
2014-09-08 15:06               ` [PATCH v2 net-next] tcp: remove dst refcount false sharing for prequeue mode Eric Dumazet
2014-09-08 21:21                 ` David Miller
2014-09-08 21:30                   ` Eric Dumazet
2014-09-08 22:41                     ` David Miller
2014-09-09 23:56                     ` David Miller
2014-08-15 17:15       ` Performance regression on kernels 3.10 and newer Alexander Duyck
2014-08-15 17:59         ` Eric Dumazet
2014-08-15 18:49         ` Tom Herbert
2014-08-15 19:10           ` Alexander Duyck
2014-08-15 22:16             ` Tom Herbert
2014-08-15 23:23               ` Alexander Duyck
2014-08-18  9:03                 ` David Laight
2014-08-18 15:22                   ` Alexander Duyck
2014-08-18 15:29                     ` Rick Jones
2014-08-21 23:51         ` David Miller
2014-08-14 23:48     ` Eric Dumazet
2014-08-15  0:33       ` Rick Jones
