* Interpreting perf stat on netperf and netserver
From: Jean-Michel Hautbois @ 2012-01-18 11:33 UTC (permalink / raw)
To: netdev
Hi all,
I am currently using netperf/netserver to characterize an Emulex
benet (be2net) network device on a machine with two Xeon X5670s.
I am using the latest Linux kernel from git (3.2.0+).
I am facing several issues, and I am trying to understand the
following perf stat output, collected on netserver:
Performance counter stats for process id '5043':
15452.992135 task-clock # 0.450 CPUs utilized
189678 context-switches # 0.012 M/sec
5 CPU-migrations # 0.000 M/sec
275 page-faults # 0.000 M/sec
48490467936 cycles # 3.138 GHz
33005879963 stalled-cycles-frontend # 68.07% frontend cycles idle
16325855769 stalled-cycles-backend # 33.67% backend cycles idle
27340520316 instructions # 0.56 insns per cycle
# 1.21 stalled cycles per insn
4745604818 branches # 307.099 M/sec
67513124 branch-misses # 1.42% of all branches
34.303567279 seconds time elapsed
I am trying to understand the "stalled-cycles-frontend" and
"stalled-cycles-backend" lines.
It seems that the frontend figure is high (perf shows it in red :)), but I can't say why...
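For what it's worth, the derived columns in that perf stat output are just ratios of the raw counters. A quick sketch of how I read perf's annotations (my arithmetic, not taken from the perf sources):

```python
# Reproducing perf stat's derived columns from the raw counters above.
# The formulas are my reading of the annotations, not the perf source.
task_clock_ms = 15452.992135        # task-clock, in milliseconds
cycles        = 48_490_467_936
stalled_front = 33_005_879_963
stalled_back  = 16_325_855_769
instructions  = 27_340_520_316
branches      = 4_745_604_818
branch_misses = 67_513_124
elapsed_s     = 34.303567279

cpus_utilized   = task_clock_ms / 1000 / elapsed_s     # ~0.450 CPUs utilized
ghz             = cycles / (task_clock_ms * 1e6)       # ~3.138 GHz
ipc             = instructions / cycles                # ~0.56 insns per cycle
stalls_per_insn = stalled_front / instructions         # ~1.21 stalled cycles/insn
front_idle_pct  = 100.0 * stalled_front / cycles       # ~68.07% frontend idle
back_idle_pct   = 100.0 * stalled_back / cycles        # ~33.67% backend idle
miss_pct        = 100.0 * branch_misses / branches     # ~1.42% of all branches
```

So, as I understand it, for roughly two thirds of the cycles the frontend issued nothing, and only ~0.56 instructions retire per cycle; that ratio is what the red highlighting is flagging.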
The be2net driver also seems to have difficulties with IRQ affinity,
because it always uses CPU0 even though the affinity mask is 0-23!
The netperf result is quite good, and perf top shows:
PerfTop: 689 irqs/sec kernel:99.6% us: 0.3% guest kernel: 0.0%
guest us: 0.0% exact: 0.0% [1000Hz cycles], (all, 24 CPUs)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
29.28% [kernel] [k] csum_partial
8.46% [kernel] [k] copy_user_generic_string
3.70% [be2net] [k] be_poll_rx
3.08% [be2net] [k] event_handle
2.37% [kernel] [k] irq_entries_start
2.21% [be2net] [k] be_rx_compl_get
1.65% [be2net] [k] be_post_rx_frags
1.64% [kernel] [k] __napi_complete
1.50% [kernel] [k] ip_defrag
1.35% [kernel] [k] put_page
1.34% [kernel] [k] get_page_from_freelist
1.29% [kernel] [k] __netif_receive_skb
1.16% [kernel] [k] __alloc_pages_nodemask
1.14% [kernel] [k] debug_smp_processor_id
1.08% [kernel] [k] add_preempt_count
1.06% [kernel] [k] sub_preempt_count
1.03% [be2net] [k] get_rx_page_info
1.01% [kernel] [k] alloc_pages_current
Checksum calculation seems quite expensive :).
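For reference, csum_partial is the kernel's Internet checksum (RFC 1071) over a buffer; the kernel version is hand-optimized assembly, but what it computes can be sketched in a few unoptimized lines:

```python
def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words, per RFC 1071 (unoptimized)."""
    if len(data) % 2:                # pad odd-length input with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    return ~total & 0xFFFF

# RFC 1071's worked example: the checksum of these 8 bytes is 0x220d.
example = bytes([0x00, 0x01, 0xF2, 0x03, 0xF4, 0xF5, 0xF6, 0xF7])
```

The algorithm itself is simple; the 29% in the profile comes from running it over every received byte in software instead of offloading it to the NIC.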
Regards,
JM
* Re: Interpreting perf stat on netperf and netserver
From: Rick Jones @ 2012-01-18 17:49 UTC (permalink / raw)
To: Jean-Michel Hautbois; +Cc: netdev
On 01/18/2012 03:33 AM, Jean-Michel Hautbois wrote:
> Hi all,
>
> I am currently using netperf/netserver in order to characterize a
> benet emulex network device on a machine with 2 Xeon5670.
> I am using the latest linux kernel from git (3.2.0+).
> I am facing several issues, and I am trying to understand the
> following perf stat launched on netserver :
>
> Performance counter stats for process id '5043':
If you aren't already you may want to gather system-wide data as well -
not everything networking is guaranteed to run in the netserver's (or
netperf's) context.
It might also be good to include the netperf command line driving that
netserver. That will help folks know whether the netserver is receiving data
(_STREAM), sending data (_MAERTS), or both (_RR) (though perhaps that can
be gleaned from the routine names in the profile).
>
> 15452.992135 task-clock # 0.450 CPUs utilized
> 189678 context-switches # 0.012 M/sec
> 5 CPU-migrations # 0.000 M/sec
> 275 page-faults # 0.000 M/sec
> 48490467936 cycles # 3.138 GHz
> 33005879963 stalled-cycles-frontend # 68.07% frontend cycles idle
> 16325855769 stalled-cycles-backend # 33.67% backend cycles idle
> 27340520316 instructions # 0.56 insns per cycle
> # 1.21 stalled cycles per insn
> 4745604818 branches # 307.099 M/sec
> 67513124 branch-misses # 1.42% of all branches
>
> 34.303567279 seconds time elapsed
>
> I am trying to understand the "stalled-cycles-frontend" and
> "stalled-cycles-backend" lines.
> It seems that frontend is high, and in red :) but I can't say why...
Perhaps the stalls are for cache misses - at least cache misses are a
common reason for stalls. I believe that perf has a way to be more
specific about the PMU events of interest.
>
> The be2net driver seems to have difficulties with IRQ affinity also,
> because it always uses CPU0 even if the affinity is 0-23 !
> netperf result is quite good, and perf top shows :
> PerfTop: 689 irqs/sec kernel:99.6% us: 0.3% guest kernel: 0.0%
> guest us: 0.0% exact: 0.0% [1000Hz cycles], (all, 24 CPUs)
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 29.28% [kernel] [k] csum_partial
> 8.46% [kernel] [k] copy_user_generic_string
> 3.70% [be2net] [k] be_poll_rx
> 3.08% [be2net] [k] event_handle
> 2.37% [kernel] [k] irq_entries_start
> 2.21% [be2net] [k] be_rx_compl_get
> 1.65% [be2net] [k] be_post_rx_frags
> 1.64% [kernel] [k] __napi_complete
> 1.50% [kernel] [k] ip_defrag
> 1.35% [kernel] [k] put_page
> 1.34% [kernel] [k] get_page_from_freelist
> 1.29% [kernel] [k] __netif_receive_skb
> 1.16% [kernel] [k] __alloc_pages_nodemask
> 1.14% [kernel] [k] debug_smp_processor_id
> 1.08% [kernel] [k] add_preempt_count
> 1.06% [kernel] [k] sub_preempt_count
> 1.03% [be2net] [k] get_rx_page_info
> 1.01% [kernel] [k] alloc_pages_current
>
> Checksum calculation seems quite complex :).
> Regards,
> JM
* Re: Interpreting perf stat on netperf and netserver
From: Jean-Michel Hautbois @ 2012-01-19 8:29 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev
2012/1/18 Rick Jones <rick.jones2@hp.com>:
> On 01/18/2012 03:33 AM, Jean-Michel Hautbois wrote:
>>
>> Hi all,
>>
>> I am currently using netperf/netserver in order to characterize a
>> benet emulex network device on a machine with 2 Xeon5670.
>> I am using the latest linux kernel from git (3.2.0+).
>> I am facing several issues, and I am trying to understand the
>> following perf stat launched on netserver :
>>
>> Performance counter stats for process id '5043':
>
>
> If you aren't already you may want to gather system-wide data as well - not
> everything networking is guaranteed to run in the netserver's (or netperf's)
> context.
>
> Might also be good to include the netperf command line driving that
> netserver. That will help folks know if the netserver is receiving data
> (_STREAM), sending data (_MAERTS) or both (_RR) (though perhaps that can be
> gleaned from the routine names in the profile).
Well, I am only launching netserver without any parameters.
>
>>
>> 15452.992135 task-clock # 0.450 CPUs utilized
>> 189678 context-switches # 0.012 M/sec
>> 5 CPU-migrations # 0.000 M/sec
>> 275 page-faults # 0.000 M/sec
>> 48490467936 cycles # 3.138 GHz
>> 33005879963 stalled-cycles-frontend # 68.07% frontend cycles idle
>> 16325855769 stalled-cycles-backend # 33.67% backend cycles idle
>> 27340520316 instructions # 0.56 insns per cycle
>> # 1.21 stalled cycles per insn
>> 4745604818 branches # 307.099 M/sec
>> 67513124 branch-misses # 1.42% of all branches
>>
>> 34.303567279 seconds time elapsed
>>
>> I am trying to understand the "stalled-cycles-frontend" and
>> "stalled-cycles-backend" lines.
>> It seems that frontend is high, and in red :) but I can't say why...
>
>
> Perhaps the stalls are for cache misses - at least cache misses are a common
> reason for stalls. I believe that perf has a way to be more specific about
> the PMU events of interest.
Yes, there are some events for that :). But I didn't know whether the
stalls were related to cache misses or not :).
JM
* Re: Interpreting perf stat on netperf and netserver
From: Eric Dumazet @ 2012-01-19 9:24 UTC (permalink / raw)
To: Jean-Michel Hautbois; +Cc: netdev
Le mercredi 18 janvier 2012 à 12:33 +0100, Jean-Michel Hautbois a
écrit :
> The be2net driver seems to have difficulties with IRQ affinity also,
> because it always uses CPU0 even if the affinity is 0-23 !
> netperf result is quite good, and perf top shows :
> PerfTop: 689 irqs/sec kernel:99.6% us: 0.3% guest kernel: 0.0%
> guest us: 0.0% exact: 0.0% [1000Hz cycles], (all, 24 CPUs)
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> 29.28% [kernel] [k] csum_partial
> 8.46% [kernel] [k] copy_user_generic_string
> 3.70% [be2net] [k] be_poll_rx
> 3.08% [be2net] [k] event_handle
> 2.37% [kernel] [k] irq_entries_start
> 2.21% [be2net] [k] be_rx_compl_get
> 1.65% [be2net] [k] be_post_rx_frags
> 1.64% [kernel] [k] __napi_complete
> 1.50% [kernel] [k] ip_defrag
> 1.35% [kernel] [k] put_page
> 1.34% [kernel] [k] get_page_from_freelist
> 1.29% [kernel] [k] __netif_receive_skb
> 1.16% [kernel] [k] __alloc_pages_nodemask
> 1.14% [kernel] [k] debug_smp_processor_id
> 1.08% [kernel] [k] add_preempt_count
> 1.06% [kernel] [k] sub_preempt_count
> 1.03% [be2net] [k] get_rx_page_info
> 1.01% [kernel] [k] alloc_pages_current
>
> Checksum calculation seems quite complex :).
UDP with fragments... I guess... but I can't see this in the perf top
output ;)
What bandwidth do you get for this cpu load ?
* Re: Interpreting perf stat on netperf and netserver
From: Jean-Michel Hautbois @ 2012-01-19 10:26 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev
2012/1/19 Eric Dumazet <eric.dumazet@gmail.com>:
> Le mercredi 18 janvier 2012 à 12:33 +0100, Jean-Michel Hautbois a
> écrit :
>
>> The be2net driver seems to have difficulties with IRQ affinity also,
>> because it always uses CPU0 even if the affinity is 0-23 !
>> netperf result is quite good, and perf top shows :
>> PerfTop: 689 irqs/sec kernel:99.6% us: 0.3% guest kernel: 0.0%
>> guest us: 0.0% exact: 0.0% [1000Hz cycles], (all, 24 CPUs)
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> 29.28% [kernel] [k] csum_partial
>> 8.46% [kernel] [k] copy_user_generic_string
>> 3.70% [be2net] [k] be_poll_rx
>> 3.08% [be2net] [k] event_handle
>> 2.37% [kernel] [k] irq_entries_start
>> 2.21% [be2net] [k] be_rx_compl_get
>> 1.65% [be2net] [k] be_post_rx_frags
>> 1.64% [kernel] [k] __napi_complete
>> 1.50% [kernel] [k] ip_defrag
>> 1.35% [kernel] [k] put_page
>> 1.34% [kernel] [k] get_page_from_freelist
>> 1.29% [kernel] [k] __netif_receive_skb
>> 1.16% [kernel] [k] __alloc_pages_nodemask
>> 1.14% [kernel] [k] debug_smp_processor_id
>> 1.08% [kernel] [k] add_preempt_count
>> 1.06% [kernel] [k] sub_preempt_count
>> 1.03% [be2net] [k] get_rx_page_info
>> 1.01% [kernel] [k] alloc_pages_current
>>
>> Checksum calculation seems quite complex :).
>
> UDP with fragments... I guess... but cant see this in the perf top
> output ;)
>
> What bandwidth do you get for this cpu load ?
Yes, this was UDP fragmentation :). I got 6800 Mbps against a theoretical
limit of 7000 Mbps (Flex-10 10Gbps system).
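Combining that throughput with the earlier perf stat figures gives a rough per-byte cost. This is only an estimate (my arithmetic, and it ignores everything running outside the netserver process):

```python
# Rough CPU cost per byte in the netserver process, from the figures
# quoted earlier in this thread (an illustration, not a measurement).
mbps      = 6800                 # reported throughput, Mbit/s
elapsed_s = 34.303567279         # from the perf stat output
cycles    = 48_490_467_936       # cycles charged to netserver

bytes_moved     = mbps * 1e6 / 8 * elapsed_s   # ~29 GB over the run
cycles_per_byte = cycles / bytes_moved         # ~1.7 cycles per byte
```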
JM
* Re: Interpreting perf stat on netperf and netserver
From: Rick Jones @ 2012-01-19 17:58 UTC (permalink / raw)
To: Jean-Michel Hautbois; +Cc: netdev
On 01/19/2012 12:29 AM, Jean-Michel Hautbois wrote:
> 2012/1/18 Rick Jones<rick.jones2@hp.com>:
>> On 01/18/2012 03:33 AM, Jean-Michel Hautbois wrote:
>>>
>>> Hi all,
>>>
>>> I am currently using netperf/netserver in order to characterize a
>>> benet emulex network device on a machine with 2 Xeon5670.
>>> I am using the latest linux kernel from git (3.2.0+).
>>> I am facing several issues, and I am trying to understand the
>>> following perf stat launched on netserver :
>>>
>>> Performance counter stats for process id '5043':
>>
>>
>> If you aren't already you may want to gather system-wide data as well - not
>> everything networking is guaranteed to run in the netserver's (or netperf's)
>> context.
>>
>> Might also be good to include the netperf command line driving that
>> netserver. That will help folks know if the netserver is receiving data
>> (_STREAM), sending data (_MAERTS) or both (_RR) (though perhaps that can be
>> gleaned from the routine names in the profile).
>
> Well, I am only launching netserver without any parameter.
The netperf command line, not the netserver one :) The netperf command
line will tell us what the netserver was asked to do, and what it was
presumably doing at the time the profile was taken.
happy benchmarking,
rick
* Re: Interpreting perf stat on netperf and netserver
From: Jean-Michel Hautbois @ 2012-01-20 7:20 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev
2012/1/19 Rick Jones <rick.jones2@hp.com>:
> On 01/19/2012 12:29 AM, Jean-Michel Hautbois wrote:
>>
>> 2012/1/18 Rick Jones<rick.jones2@hp.com>:
>>>
>>> On 01/18/2012 03:33 AM, Jean-Michel Hautbois wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I am currently using netperf/netserver in order to characterize a
>>>> benet emulex network device on a machine with 2 Xeon5670.
>>>> I am using the latest linux kernel from git (3.2.0+).
>>>> I am facing several issues, and I am trying to understand the
>>>> following perf stat launched on netserver :
>>>>
>>>> Performance counter stats for process id '5043':
>>>
>>>
>>>
>>> If you aren't already you may want to gather system-wide data as well -
>>> not
>>> everything networking is guaranteed to run in the netserver's (or
>>> netperf's)
>>> context.
>>>
>>> Might also be good to include the netperf command line driving that
>>> netserver. That will help folks know if the netserver is receiving data
>>> (_STREAM), sending data (_MAERTS) or both (_RR) (though perhaps that can
>>> be gleaned from the routine names in the profile).
>>
>>
>> Well, I am only launching netserver without any parameter.
>
>
> The netperf command line, not netserver :) The netperf command line will
> tell us what the netserver was asked to and was presumably doing at the time
> the profile was taken.
I launched it using netperf -H 192.168.2.1 -t UDP_STREAM -f m -- -m 4000
The MTU is set to 4096.
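For what it's worth, whether a -m 4000 UDP_STREAM fragments at all depends on the MTU. A rough IPv4 fragment count (my sketch, assuming no IP options):

```python
import math

def udp_fragments(payload: int, mtu: int,
                  ip_hdr: int = 20, udp_hdr: int = 8) -> int:
    """IPv4 fragments for one UDP datagram (assumes no IP options).

    Each fragment carries up to mtu - ip_hdr bytes of the UDP packet,
    rounded down to a multiple of 8 (fragment offsets are in 8-byte units).
    """
    total = payload + udp_hdr                  # bytes after the IP header
    per_frag = (mtu - ip_hdr) // 8 * 8
    return max(1, math.ceil(total / per_frag))
```

By this reckoning a 4000-byte datagram becomes 3 fragments at the default 1500-byte MTU, which would fit Eric's guess and the ip_defrag time in the profile, while at a 4096-byte MTU it should fit in a single packet.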
JM