netdev.vger.kernel.org archive mirror
* Bad performance on modified pktgen in 4.0 vs 3.17 kernel.
@ 2015-04-29 23:39 Ben Greear
  2015-05-08  4:11 ` Ben Greear
  0 siblings, 1 reply; 2+ messages in thread
From: Ben Greear @ 2015-04-29 23:39 UTC (permalink / raw)
  To: netdev

We run a hacked version of pktgen.  It has some pkt-rx logic and probably spends more time
grabbing timestamps than the stock code.  It also should not be doing any busy-spins for sleeping.

You can see pktgen changes, supporting patches, and various other stuff here:

http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-4.0.dev.y/.git;a=summary
git clone git://dmz2.candelatech.com/linux-4.0.dev.y


On a 64-bit atom system, with e1000 driver, we see around 50% cpu usage
when running 40,000 pkts per second on two interfaces on the 3.17.8+ kernel.

# cat perf-top-3-17.txt
   PerfTop:    3682 irqs/sec  kernel:78.7%  exact:  0.0% [4000Hz cycles],  (all, 4 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

     3.43%  [kernel]       [k] pktgen_thread_worker
     2.47%  libc-2.20.so   [.] __strstr_sse2
     2.31%  [kernel]       [k] e1000_xmit_frame
     2.25%  [kernel]       [k] number.isra.1
     2.18%  [kernel]       [k] vsnprintf
     1.96%  libc-2.20.so   [.] __GI___strcmp_ssse3
     1.84%  [kernel]       [k] format_decode
     1.80%  [kernel]       [k] build_skb
     1.79%  [kernel]       [k] kallsyms_expand_symbol.constprop.1
     1.76%  [kernel]       [k] native_read_tsc
     1.74%  perf           [.] rb_next
     1.57%  [kernel]       [k] getRelativeCurNs
     1.48%  perf           [.] symbols__insert
     1.10%  perf           [.] hex2u64
     1.07%  [kernel]       [k] e1000_irq_enable
     1.06%  [kernel]       [k] timekeeping_get_ns
     1.03%  [kernel]       [k] e1000_clean_rx_irq
     1.00%  [kernel]       [k] __getnstimeofday64
     0.97%  [kernel]       [k] string.isra.6
     0.97%  [kernel]       [k] do_raw_spin_lock
     0.97%  [kernel]       [k] kmem_cache_alloc
     0.94%  [kernel]       [k] e1000_intr_msi


On 4.0, there is significantly more CPU usage.  I tried copying the pktgen.c from 3.17 to 4.0
and that did not have any noticeable effect, so I think it must be something outside of my changes.

# cat perf-top-40.txt
   PerfTop:    4566 irqs/sec  kernel:87.4%  exact:  0.0% [4000Hz cycles],  (all, 4 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

    20.72%  [kernel]       [k] mwait_idle_with_hints.constprop.2
    10.98%  [kernel]       [k] __lock_acquire
     3.30%  [kernel]       [k] pktgen_thread_worker
     2.41%  [kernel]       [k] arch_local_save_flags
     2.25%  [kernel]       [k] e1000_xmit_frame
     1.83%  [kernel]       [k] lock_release
     1.57%  [kernel]       [k] lock_acquire
     1.54%  [kernel]       [k] trace_hardirqs_on_caller
     1.50%  libc-2.20.so   [.] __strstr_sse2
     1.41%  [kernel]       [k] number.isra.1
     1.22%  [kernel]       [k] trace_hardirqs_off_caller
     1.20%  [kernel]       [k] kallsyms_expand_symbol.constprop.1
     1.19%  [kernel]       [k] build_skb
     1.18%  [kernel]       [k] format_decode
     1.17%  [kernel]       [k] hlock_class
     1.17%  [kernel]       [k] arch_local_irq_restore
     1.09%  [kernel]       [k] vsnprintf
     1.00%  [kernel]       [k] arch_local_irq_save
     0.97%  libc-2.20.so   [.] __GI___strcmp_ssse3
     0.97%  [kernel]       [k] mark_held_locks
     0.89%  [kernel]       [k] mark_lock
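
One way to narrow down a comparison like this is to list the symbols that show up only in the newer profile. The following is a minimal sketch, not something from the original report: the function names (parse_perf_top, new_symbols) are hypothetical, and the OLD/NEW strings are abbreviated excerpts of the two listings above. Because perf top only prints the heaviest entries, a symbol missing from the older listing may simply have been below the cutoff, so the result is a hint rather than proof.

```python
import re

# Matches perf-top rows such as "  10.98%  [kernel]  [k] __lock_acquire".
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def parse_perf_top(text):
    """Return {symbol: percent} for every sample row in a perf-top dump."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group(4)] = float(m.group(1))
    return rows


def new_symbols(old_text, new_text):
    """Symbols present only in the newer profile, heaviest first."""
    old = parse_perf_top(old_text)
    new = parse_perf_top(new_text)
    only_new = {sym: pct for sym, pct in new.items() if sym not in old}
    return sorted(only_new.items(), key=lambda kv: kv[1], reverse=True)


# Abbreviated excerpts of the 3.17 and 4.0 listings (illustration only).
OLD = """
     3.43%  [kernel]       [k] pktgen_thread_worker
     2.31%  [kernel]       [k] e1000_xmit_frame
     0.97%  [kernel]       [k] do_raw_spin_lock
"""
NEW = """
    20.72%  [kernel]       [k] mwait_idle_with_hints.constprop.2
    10.98%  [kernel]       [k] __lock_acquire
     3.30%  [kernel]       [k] pktgen_thread_worker
     2.25%  [kernel]       [k] e1000_xmit_frame
     1.83%  [kernel]       [k] lock_release
"""

if __name__ == "__main__":
    for symbol, pct in new_symbols(OLD, NEW):
        print(f"{pct:6.2f}%  {symbol}")
```

To run it on the full dumps, the contents of perf-top-3-17.txt and perf-top-40.txt could be read from disk and passed in place of OLD and NEW.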


We see a similar jump in CPU usage in the 4.0 kernel when using the 40G Intel NIC/driver
on an E5 system, so it is probably not just something to do with the driver.

Due to hooks in the pkt rx logic (and changes to the stock kernel code in that area between
3.17 and 4.0), an automated bisect will not be trivial, so I'm hoping not to
have to do that...

I'm curious if anyone has seen any similar performance degradation, and whether there
are any ideas what might be the problem.

Thanks,
Ben



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Bad performance on modified pktgen in 4.0 vs 3.17 kernel.
  2015-04-29 23:39 Bad performance on modified pktgen in 4.0 vs 3.17 kernel Ben Greear
@ 2015-05-08  4:11 ` Ben Greear
  0 siblings, 0 replies; 2+ messages in thread
From: Ben Greear @ 2015-05-08  4:11 UTC (permalink / raw)
  To: netdev

My problem was self-inflicted:  I had lockdep and related things enabled.

Runs fine w/out that extra debug in there.
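
For anyone hitting the same thing, a quick check of the kernel config for lock-debugging options can rule this out early. This is a minimal sketch, not the exact check used here: the option names are real Kconfig symbols, but which ones count as "lockdep and related things" is an assumption, and the helper name enabled_lock_debug and the SAMPLE text are hypothetical.

```python
# Lock-debugging Kconfig symbols. The selection is an assumption about what
# "lockdep and related things" covers; the thread does not list them.
LOCK_DEBUG_OPTIONS = (
    "CONFIG_PROVE_LOCKING",
    "CONFIG_LOCKDEP",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_LOCK_STAT",
    "CONFIG_TRACE_IRQFLAGS",
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
)


def enabled_lock_debug(config_text):
    """Return the lock-debugging options set to 'y' in a kernel .config."""
    enabled = []
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as "# CONFIG_LOCK_STAT is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if name in LOCK_DEBUG_OPTIONS and value == "y":
            enabled.append(name)
    return enabled


# Hypothetical config fragment for illustration.
SAMPLE = """\
CONFIG_NET_PKTGEN=m
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
CONFIG_TRACE_IRQFLAGS=y
"""

if __name__ == "__main__":
    for option in enabled_lock_debug(SAMPLE):
        print(option)
```

On many distributions the running kernel's config is available as /boot/config-<version>, or as /proc/config.gz when the kernel was built with CONFIG_IKCONFIG_PROC; either could be read and passed to the function in place of SAMPLE.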

Thanks,
Ben



-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
