* Mainline kernel OLTP performance update
@ 2009-04-23 16:49 Styner, Douglas W
2009-04-27 7:02 ` Andi Kleen
0 siblings, 1 reply; 13+ messages in thread
From: Styner, Douglas W @ 2009-04-23 16:49 UTC (permalink / raw)
To: linux-kernel
Summary: Measured the mainline kernel from kernel.org (2.6.30-rc2).
The regression for 2.6.30-rc2 against the 2.6.24.2 baseline is 1.95%. Note the dip in cycles for <database>, even though us% in the summary is unchanged.
Linux OLTP Performance summary
Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle% iowait%
2.6.24.2 1.000 22106 43709 75 24 0 0
2.6.30-rc2 0.981 30755 43072 75 25 0 0
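The 1.95% figure follows directly from the Speedup(x) column; a minimal sketch of the arithmetic, assuming the quoted value was computed from unrounded throughput before the table was rounded to three decimals:

```python
# Percent throughput lost by a candidate kernel relative to the 1.000
# baseline, using the Speedup(x) column from the summary table above.
def regression_pct(speedup: float) -> float:
    return (1.0 - speedup) * 100.0

# The rounded table value gives ~1.9%; the quoted 1.95% presumably comes
# from the unrounded throughput numbers.
print(f"{regression_pct(0.981):.1f}%")  # → 1.9%
```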
Server configurations:
Intel Xeon Quad-core 2.0GHz 2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2 Cycles% 2.6.30-rc2
74.8578 <database> 67.6966 <database>
1.0500 qla24xx_start_scsi 1.1724 qla24xx_start_scsi
0.8089 schedule 1.0578 qla24xx_intr_handler
0.5864 kmem_cache_alloc 0.8259 __schedule
0.4989 __blockdev_direct_IO 0.7451 kmem_cache_alloc
0.4357 __sigsetjmp 0.4872 __blockdev_direct_IO
0.4152 copy_user_generic_string 0.4390 task_rq_lock
0.3953 qla24xx_intr_handler 0.4338 __sigsetjmp
0.3850 memcpy 0.4195 __switch_to
0.3596 scsi_request_fn 0.3713 copy_user_generic_string
0.3188 __switch_to 0.3608 __list_add
0.2889 lock_timer_base 0.3595 rb_get_reader_page
0.2750 memmove 0.3309 ring_buffer_consume
0.2519 task_rq_lock 0.3152 scsi_request_fn
0.2474 aio_complete 0.3048 try_to_wake_up
0.2460 scsi_alloc_sgtable 0.2983 tcp_sendmsg
0.2445 generic_make_request 0.2931 lock_timer_base
0.2263 qla2x00_process_completed_re 0.2840 aio_complete
0.2118 blk_queue_end_tag 0.2697 memset_c
0.2085 dio_bio_complete 0.2527 mod_timer
0.2021 e1000_xmit_frame 0.2462 qla2x00_process_completed_re
0.2006 __end_that_request_first 0.2449 memmove
0.1954 generic_file_aio_read 0.2358 blk_queue_end_tag
0.1949 kfree 0.2241 generic_make_request
0.1915 tcp_sendmsg 0.2215 scsi_device_unbusy
0.1901 try_to_wake_up 0.2162 mempool_free
0.1895 kref_get 0.2097 e1000_xmit_frame
0.1864 __mod_timer 0.2097 kmem_cache_free
0.1863 thread_return 0.2058 kfree
0.1854 math_state_restore 0.1993 sched_clock_cpu
Thanks,
Doug
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Mainline kernel OLTP performance update
2009-04-23 16:49 Mainline kernel OLTP performance update Styner, Douglas W
@ 2009-04-27 7:02 ` Andi Kleen
2009-04-27 18:38 ` Discrepancies between Oprofile and vmstat Styner, Douglas W
2009-04-28 16:57 ` Mainline kernel OLTP performance update Chuck Ebbert
0 siblings, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2009-04-27 7:02 UTC (permalink / raw)
To: Styner, Douglas W; +Cc: linux-kernel, linux-driver, linux-scsi
"Styner, Douglas W" <douglas.w.styner@intel.com> writes:
>
> ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2 Cycles% 2.6.30-rc2
> 74.8578 <database> 67.6966 <database>
The dip in database cycles is indeed worrying.
> 1.0500 qla24xx_start_scsi 1.1724 qla24xx_start_scsi
> 0.8089 schedule 1.0578 qla24xx_intr_handler
> 0.5864 kmem_cache_alloc 0.8259 __schedule
> 0.4989 __blockdev_direct_IO 0.7451 kmem_cache_alloc
> 0.4357 __sigsetjmp 0.4872 __blockdev_direct_IO
> 0.4152 copy_user_generic_string 0.4390 task_rq_lock
> 0.3953 qla24xx_intr_handler 0.4338 __sigsetjmp
And also why the qla24xx_intr_handler became ~2.5x as expensive.
Cc linux-scsi and qla24xx maintainers.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Discrepancies between Oprofile and vmstat
2009-04-27 7:02 ` Andi Kleen
@ 2009-04-27 18:38 ` Styner, Douglas W
2009-04-27 18:49 ` Andi Kleen
2009-04-27 18:53 ` Discrepancies between Oprofile and vmstat II Andi Kleen
2009-04-28 16:57 ` Mainline kernel OLTP performance update Chuck Ebbert
1 sibling, 2 replies; 13+ messages in thread
From: Styner, Douglas W @ 2009-04-27 18:38 UTC (permalink / raw)
To: linux-kernel, oprofile-list; +Cc: Ma, Chinang, Andi Kleen, willy
We are using an OLTP-type workload on a fully saturated system and have begun seeing a discrepancy between the output of oprofile and vmstat on upstream kernels that was first seen with the 2.6.27 kernel.
Workload:
OLTP database workload using QLogic FC cards with MSI-X enabled doing ~50,000 IOPS
Oprofile (v0.9.3) collection methodology:
opcontrol --init
opcontrol --vmlinux=/usr/src/linux-`uname -r`/vmlinux
opcontrol -e=CPU_CLK_UNHALTED:80000 -e=LLC_MISSES:6000
opcontrol --start-daemon
sleep 5
opcontrol --start
sleep 60
opcontrol --stop
opcontrol --save=$1
opcontrol --deinit
Function classification:
Modules vmlinux and *.ko = kernel, all others = user.
Results (2.6.30-rc3 kernel):
vmstat reports ~75% user, 25% kernel
oprofile reports 70.3808% user, 29.6142% kernel
Using another profiling tool to confirm, we see 74.784% user, 25.174% kernel.
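The user/kernel percentages come from the classification rule stated above (vmlinux and *.ko images count as kernel, everything else as user); a small sketch of that rule, with hypothetical per-image sample counts chosen only to mirror the quoted numbers:

```python
# Classify per-image sample counts as kernel vs. user, following the rule
# above: images named vmlinux or *.ko are kernel, everything else is user.
def user_kernel_split(samples_by_image):
    kernel = sum(n for img, n in samples_by_image.items()
                 if img == "vmlinux" or img.endswith(".ko"))
    total = sum(samples_by_image.values())
    user = total - kernel
    return 100.0 * user / total, 100.0 * kernel / total

# Hypothetical counts, not actual opreport output.
u, k = user_kernel_split({"vmlinux": 2900, "qla2xxx.ko": 62, "<database>": 7038})
print(f"user {u:.1f}%  kernel {k:.1f}%")  # → user 70.4%  kernel 29.6%
```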
Doug Styner
douglas.w.styner@intel.com
* Re: Discrepancies between Oprofile and vmstat
2009-04-27 18:38 ` Discrepancies between Oprofile and vmstat Styner, Douglas W
@ 2009-04-27 18:49 ` Andi Kleen
2009-04-27 18:53 ` Discrepancies between Oprofile and vmstat II Andi Kleen
1 sibling, 0 replies; 13+ messages in thread
From: Andi Kleen @ 2009-04-27 18:49 UTC (permalink / raw)
To: Styner, Douglas W
Cc: linux-kernel, oprofile-list, Ma, Chinang, Andi Kleen, willy, rjw
On Mon, Apr 27, 2009 at 11:38:08AM -0700, Styner, Douglas W wrote:
> We are using an OLTP-type workload on a fully saturated system and have begun seeing a discrepancy between the output of oprofile and vmstat on upstream kernels that was first seen with the 2.6.27 kernel.
Rafael,
something to track for the regression list.
-Andi
* RE: Discrepancies between Oprofile and vmstat II
2009-04-27 18:53 ` Discrepancies between Oprofile and vmstat II Andi Kleen
@ 2009-04-27 18:51 ` Styner, Douglas W
2009-04-27 22:15 ` Styner, Douglas W
1 sibling, 0 replies; 13+ messages in thread
From: Styner, Douglas W @ 2009-04-27 18:51 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, oprofile-list, Ma, Chinang, willy
I believe so, but will confirm.
> opcontrol -e=CPU_CLK_UNHALTED:80000 -e=LLC_MISSES:6000
>
> Using another profiling tool to confirm, we see 74.784% user, 25.174% kernel.
Just verifying -- you also see it when you use a shorter period than 80000 right?
* Re: Discrepancies between Oprofile and vmstat II
2009-04-27 18:38 ` Discrepancies between Oprofile and vmstat Styner, Douglas W
2009-04-27 18:49 ` Andi Kleen
@ 2009-04-27 18:53 ` Andi Kleen
2009-04-27 18:51 ` Styner, Douglas W
2009-04-27 22:15 ` Styner, Douglas W
1 sibling, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2009-04-27 18:53 UTC (permalink / raw)
To: Styner, Douglas W
Cc: linux-kernel, oprofile-list, Ma, Chinang, Andi Kleen, willy
> opcontrol -e=CPU_CLK_UNHALTED:80000 -e=LLC_MISSES:6000
>
> Using another profiling tool to confirm, we see 74.784% user, 25.174% kernel.
Just verifying -- you also see it when you use a shorter period than 80000 right?
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* RE: Discrepancies between Oprofile and vmstat II
2009-04-27 18:53 ` Discrepancies between Oprofile and vmstat II Andi Kleen
2009-04-27 18:51 ` Styner, Douglas W
@ 2009-04-27 22:15 ` Styner, Douglas W
2009-04-27 22:22 ` Andi Kleen
1 sibling, 1 reply; 13+ messages in thread
From: Styner, Douglas W @ 2009-04-27 22:15 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, oprofile-list, Ma, Chinang, willy
> I believe so, but will confirm.
>
> > opcontrol -e=CPU_CLK_UNHALTED:80000 -e=LLC_MISSES:6000
> >
> > Using another profiling tool to confirm, we see 74.784% user, 25.174%
> > kernel.
>
> Just verifying -- you also see it when you use a shorter period than 80000
> right?
Confirmed. Changes in the profile appear to be due to increased sampling by oprofile with the lower count value.
Linux OLTP Performance summary
Kernel# Speedup(x) Intr/s CtxSw/s us% sys% idle% iowait%
2.6.30-rc3 1.000 30593 43976 74 25 0 1
2.6.30-rc3 1.001 30534 43210 75 25 0 0
Server configurations:
Intel Xeon Quad-core 2.0GHz 2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
======oprofile CPU_CLK_UNHALTED for top 30 functions
-e=CPU_CLK_UNHALTED:80000 -e=CPU_CLK_UNHALTED:20000
Cycles% 2.6.30-rc3 Cycles% 2.6.30-rc3
68.5544 <database> 65.6876 <database>
1.1859 qla24xx_start_scsi 1.1033 kstat_irqs_cpu
0.9436 qla24xx_intr_handler 1.0575 rb_get_reader_page
0.8307 __schedule 1.0034 qla24xx_start_scsi
0.7194 kmem_cache_alloc 0.9305 qla24xx_intr_handler
0.5026 __blockdev_direct_IO 0.8410 ring_buffer_consume
0.4319 __sigsetjmp 0.8160 __schedule
0.4244 scsi_request_fn 0.5683 kmem_cache_alloc
0.3853 rb_get_reader_page 0.4517 __sigsetjmp
0.3777 __switch_to 0.4413 unmap_vmas
0.3552 __list_add 0.3809 __blockdev_direct_IO
0.3552 task_rq_lock 0.3726 __switch_to
0.3371 try_to_wake_up 0.3310 __list_add
0.3221 ring_buffer_consume 0.3206 task_rq_lock
0.2844 aio_complete 0.3123 scsi_request_fn
0.2588 memmove 0.2977 aio_complete
0.2588 mod_timer 0.2914 try_to_wake_up
0.2558 generic_make_request 0.2644 page_fault
0.2543 tcp_sendmsg 0.2436 kmem_cache_free
0.2528 copy_user_generic_string 0.2415 scsi_device_unbusy
0.2468 lock_timer_base 0.2352 copy_user_generic_string
0.2468 memset_c 0.2311 memmove
0.2288 blk_queue_end_tag 0.2227 lock_timer_base
0.2257 qla2x00_process_completed_re 0.2123 generic_make_request
0.2212 kref_get 0.2103 kfree
0.2182 mempool_free 0.2040 find_vma
0.2137 sd_prep_fn 0.2019 sd_prep_fn
0.2122 e1000_xmit_frame 0.2019 blk_queue_end_tag
0.2062 dequeue_rt_stack 0.2019 tcp_sendmsg
0.2047 scsi_device_unbusy 0.1998 qla2x00_process_completed_re
* Re: Discrepancies between Oprofile and vmstat II
2009-04-27 22:15 ` Styner, Douglas W
@ 2009-04-27 22:22 ` Andi Kleen
2009-04-27 22:35 ` Styner, Douglas W
2009-04-28 5:26 ` Eric Dumazet
0 siblings, 2 replies; 13+ messages in thread
From: Andi Kleen @ 2009-04-27 22:22 UTC (permalink / raw)
To: Styner, Douglas W
Cc: Andi Kleen, linux-kernel, oprofile-list, Ma, Chinang, willy
On Mon, Apr 27, 2009 at 03:15:27PM -0700, Styner, Douglas W wrote:
>
> > I believe so, but will confirm.
> >
> > > opcontrol -e=CPU_CLK_UNHALTED:80000 -e=LLC_MISSES:6000
> > >
> > > Using another profiling tool to confirm, we see 74.784% user, 25.174%
> > > kernel.
> >
> > Just verifying -- you also see it when you use a shorter period than 80000
> > right?
>
> Confirmed. Changes in profile appear to be due to increased sampling by oprofile with lower value.
You mean "with higher value", right?
It's still strange that it suddenly changed in 2.6.27 though. There shouldn't have been
a difference there. So I still think it's a regression.
-Andi
* RE: Discrepancies between Oprofile and vmstat II
2009-04-27 22:22 ` Andi Kleen
@ 2009-04-27 22:35 ` Styner, Douglas W
2009-04-28 5:26 ` Eric Dumazet
1 sibling, 0 replies; 13+ messages in thread
From: Styner, Douglas W @ 2009-04-27 22:35 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, oprofile-list, Ma, Chinang, willy
>
>You mean "with higher value", right?
Right. Higher number of samples; lower count value. I'm going to try it the other way as well (count value = 200000).
>
>It's still strange that it suddenly changed in 2.6.27 though. There
>shouldn't have been
>a difference there. So I still think it's a regression.
>
We only first saw it in 2.6.27. It is possible that it goes back to 2.6.26 or 2.6.25. We have not confirmed those kernels.
* Re: Discrepancies between Oprofile and vmstat II
2009-04-27 22:22 ` Andi Kleen
2009-04-27 22:35 ` Styner, Douglas W
@ 2009-04-28 5:26 ` Eric Dumazet
1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2009-04-28 5:26 UTC (permalink / raw)
To: Andi Kleen
Cc: Styner, Douglas W, linux-kernel, oprofile-list, Ma, Chinang, willy
Andi Kleen wrote:
> On Mon, Apr 27, 2009 at 03:15:27PM -0700, Styner, Douglas W wrote:
>>> I believe so, but will confirm.
>>>
>>>> opcontrol -e=CPU_CLK_UNHALTED:80000 -e=LLC_MISSES:6000
>>>>
>>>> Using another profiling tool to confirm, we see 74.784% user, 25.174%
>>>> kernel.
>>> Just verifying -- you also see it when you use a shorter period than 80000
>>> right?
>> Confirmed. Changes in profile appear to be due to increased sampling by oprofile with lower value.
>
> You mean "with higher value", right?
>
> It's still strange that it suddenly changed in 2.6.27 though. There shouldn't have been
> a difference there. So I still think it's a regression.
>
An oprofile regression ? Oh no !
;) Just kidding
Introduction of ring buffers was probably the start of oprofile using more cpu cycles,
and I remember seeing rb_get_reader_page() and ring_buffer_consume()
climbing in top 10 functions on tbench oprofile session.
CPU: Core 2, speed 3000.46 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10000
samples cum. samples % cum. % symbol name
731641 731641 10.0572 10.0572 copy_from_user
328634 1060275 4.5174 14.5746 copy_to_user
269686 1329961 3.7071 18.2817 ipt_do_table
254019 1583980 3.4918 21.7735 __schedule
205152 1789132 2.8200 24.5935 tcp_ack
202930 1992062 2.7895 27.3830 rb_get_reader_page rank 6
170751 2162813 2.3472 29.7301 tcp_sendmsg
152271 2315084 2.0931 31.8233 ring_buffer_consume rank 8
147362 2462446 2.0256 33.8489 tcp_recvmsg
138150 2600596 1.8990 35.7479 tcp_transmit_skb
134191 2734787 1.8446 37.5925 ip_queue_xmit
125952 2860739 1.7313 39.3239 sysenter_past_esp
112158 2972897 1.5417 40.8656 tcp_rcv_established
106236 3079133 1.4603 42.3259 __switch_to
105435 3184568 1.4493 43.7752 tcp_v4_rcv
97660 3282228 1.3424 45.1177 netif_receive_skb
89327 3371555 1.2279 46.3456 dst_release
78746 3450301 1.0824 47.4280 dev_queue_xmit
74776 3525077 1.0279 48.4559 release_sock
72979 3598056 1.0032 49.4591 lock_sock_nested
72031 3670087 0.9901 50.4492 tcp_event_data_recv
71130 3741217 0.9778 51.4270 tcp_write_xmit
69475 3810692 0.9550 52.3820 __inet_lookup_established
66537 3877229 0.9146 53.2966 nf_iterate
60504 3937733 0.8317 54.1283 ip_rcv
55767 3993500 0.7666 54.8949 sys_socketcall
53273 4046773 0.7323 55.6272 update_curr
49315 4096088 0.6779 56.3051 ip_finish_output
48958 4145046 0.6730 56.9780 __alloc_skb
48440 4193486 0.6659 57.6439 skb_release_data
46166 4239652 0.6346 58.2785 page_address
46147 4285799 0.6343 58.9128 local_bh_enable
43867 4329666 0.6030 59.5158 netif_rx
43671 4373337 0.6003 60.1161 put_page
43554 4416891 0.5987 60.7148 __update_sched_clock
42899 4459790 0.5897 61.3045 get_page_from_freelist
42747 4502537 0.5876 61.8921 sched_clock_cpu
Also other rb functions use some cycles...
c0261580 <rb_advance_reader>: /* rb_advance_reader total: 95595 0.4967 */
c0261cf0 <rb_buffer_peek>: /* rb_buffer_peek total: 75520 0.3924 */
c0260d80 <rb_event_length>: /* rb_event_length total: 70396 0.3658 */
c0260dd0 <ring_buffer_event_length>: /* ring_buffer_event_length total: 31028 0.1612 */
c02613b0 <ring_buffer_event_data>: /* ring_buffer_event_data total: 20663 0.1074 */
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 10000
CPU_CLK_UNHALT...|
samples| %|
------------------
12062103 80.0850 vmlinux
1319946 8.7636 libc-2.3.4.so
1089407 7.2330 tbench
CPU_CLK_UNHALT...|
samples| %|
------------------
1089406 99.9999 tbench
1 9.2e-05 [heap] (tgid:16048 range:0x804e000-0x8080000)
403877 2.6815 oprofile
107343 0.7127 tbench_srv
55018 0.3653 oprofiled
c02613e0 <rb_get_reader_page>: /* rb_get_reader_page total: 534995 2.7800 */
13316 0.0692 :c02613e0: push %ebp
5 2.6e-05 :c02613e1: mov %esp,%ebp
:c02613e3: push %edi
6864 0.0357 :c02613e4: push %esi
11 5.7e-05 :c02613e5: push %ebx
14 7.3e-05 :c02613e6: sub $0x1c,%esp
6738 0.0350 :c02613e9: pushf
27140 0.1410 :c02613ea: popl 0x10(%esp)
13627 0.0708 :c02613ee: cli
27016 0.1404 :c02613ef: lea 0xc(%eax),%edx
17 8.8e-05 :c02613f2: mov %edx,0x14(%esp)
16 8.3e-05 :c02613f6: mov $0x100,%edx
6765 0.0352 :c02613fb: mov 0x14(%esp),%ecx
20 1.0e-04 :c02613ff: lock xadd %dx,(%ecx)
142726 0.7417 :c0261404: cmp %dh,%dl
11 5.7e-05 :c0261406: je c026140e <rb_get_reader_page+0x2e>
:c0261408: pause
:c026140a: mov (%ecx),%dl
:c026140c: jmp c0261404 <rb_get_reader_page+0x24>
6724 0.0349 :c026140e: mov 0x24(%eax),%ebx
67811 0.3524 :c0261411: mov 0x4(%ebx),%edx
27227 0.1415 :c0261414: mov 0x10(%ebx),%ecx
6908 0.0359 :c0261417: mov 0x8(%ecx),%esi
21286 0.1106 :c026141a: cmp %esi,%edx
4 2.1e-05 :c026141c: jb c0261528 <rb_get_reader_page+0x148>
8 4.2e-05 :c0261422: mov 0x8(%ecx),%ecx
197 0.0010 :c0261425: cmp %ecx,%edx
13 6.8e-05 :c0261427: ja c02614fa <rb_get_reader_page+0x11a>
12 6.2e-05 :c026142d: cmp %ebx,0x20(%eax)
12 6.2e-05 :c0261430: je c0261540 <rb_get_reader_page+0x160>
13 6.8e-05 :c0261436: lea 0x10(%eax),%esi
95 4.9e-04 :c0261439: mov $0x1,%edx
:c026143e: mov %esi,0x18(%esp)
45 2.3e-04 :c0261442: jmp c0261494 <rb_get_reader_page+0xb4>
:c0261444: lea 0x0(%esi,%eiz,1),%esi
24 1.2e-04 :c0261448: sub $0x8,%ebx
43 2.2e-04 :c026144b: mov %ebx,0x18(%eax)
36 1.9e-04 :c026144e: mov %ecx,0x24(%eax)
80 4.2e-04 :c0261451: inc %edx
:c0261452: mov 0x10(%ecx),%ebx
:c0261455: cmp $0x4,%edx
...
6952 0.0361 :c0261528: mov 0x14(%esp),%edx
203 0.0011 :c026152c: incb 0xc(%eax)
6789 0.0353 :c026152f: pushl 0x10(%esp)
30 1.6e-04 :c0261533: popf
108826 0.5655 :c0261534: add $0x1c,%esp
322 0.0017 :c0261537: mov %ebx,%eax
:c0261539: pop %ebx
13921 0.0723 :c026153a: pop %esi
6689 0.0348 :c026153b: pop %edi
6884 0.0358 :c026153c: leave
6791 0.0353 :c026153d: ret
:c026153e: xchg %ax,%ax
2 1.0e-05 :c0261540: xor %ebx,%ebx
7 3.6e-05 :c0261542: jmp c0261528 <rb_get_reader_page+0x148>
:c0261544: lea 0x40(%eax),%edx
c0261de0 <ring_buffer_consume>: /* ring_buffer_consume total: 395772 2.0566 */
6854 0.0356 :c0261de0: push %ebp
26 1.4e-04 :c0261de1: mov %esp,%ebp
:c0261de3: push %edi
12755 0.0663 :c0261de4: mov %eax,%edi
:c0261de6: push %esi
182 9.5e-04 :c0261de7: add $0x10,%edi
:c0261dea: push %ebx
5464 0.0284 :c0261deb: xor %esi,%esi
:c0261ded: sub $0x18,%esp
42 2.2e-04 :c0261df0: mov %edx,%ebx
3494 0.0182 :c0261df2: mov %eax,-0x18(%ebp)
301 0.0016 :c0261df5: mov %ecx,-0x24(%ebp)
610 0.0032 :c0261df8: lea 0x0(,%edx,4),%eax
:c0261dff: mov %eax,-0x20(%ebp)
3672 0.0191 :c0261e02: jmp c0261e13 <ring_buffer_consume+0x33>
:c0261e04: lea 0x0(%esi,%eiz,1),%esi
:c0261e08: test %esi,%esi
:c0261e0a: je c0261e68 <ring_buffer_consume+0x88>
3367 0.0175 :c0261e0c: testb $0x3,(%esi)
37 1.9e-04 :c0261e0f: jne c0261e68 <ring_buffer_consume+0x88>
:c0261e11: pause
3410 0.0177 :c0261e13: bt %ebx,(%edi)
7766 0.0404 :c0261e16: sbb %eax,%eax
4221 0.0219 :c0261e18: test %eax,%eax
:c0261e1a: je c0261e08 <ring_buffer_consume+0x28>
940 0.0049 :c0261e1c: mov -0x18(%ebp),%edx
2967 0.0154 :c0261e1f: mov 0x28(%edx),%eax
12178 0.0633 :c0261e22: mov -0x20(%ebp),%edx
2547 0.0132 :c0261e25: mov (%eax,%edx,1),%eax
5736 0.0298 :c0261e28: mov %eax,-0x14(%ebp)
1065 0.0055 :c0261e2b: add $0x8,%eax
27718 0.1440 :c0261e2e: mov %eax,-0x10(%ebp)
4312 0.0224 :c0261e31: call c04d23b0 <_spin_lock_irqsave>
183800 0.9551 :c0261e36: mov -0x24(%ebp),%ecx
3613 0.0188 :c0261e39: mov %eax,-0x1c(%ebp)
6656 0.0346 :c0261e3c: mov %ebx,%edx
:c0261e3e: mov -0x18(%ebp),%eax
11 5.7e-05 :c0261e41: call c0261cf0 <rb_buffer_peek>
3506 0.0182 :c0261e46: mov %eax,%esi
2 1.0e-05 :c0261e48: test %eax,%eax
2 1.0e-05 :c0261e4a: je c0261e54 <ring_buffer_consume+0x74>
2 1.0e-05 :c0261e4c: mov -0x14(%ebp),%eax
10 5.2e-05 :c0261e4f: call c0261580 <rb_advance_reader>
3389 0.0176 :c0261e54: mov -0x1c(%ebp),%edx
13 6.8e-05 :c0261e57: mov -0x10(%ebp),%eax
17 8.8e-05 :c0261e5a: call c04d2500 <_spin_unlock_irqrestore>
71541 0.3718 :c0261e5f: test %esi,%esi
4 2.1e-05 :c0261e61: jne c0261e0c <ring_buffer_consume+0x2c>
:c0261e63: nop
:c0261e64: lea 0x0(%esi,%eiz,1),%esi
3419 0.0178 :c0261e68: add $0x18,%esp
11 5.7e-05 :c0261e6b: mov %esi,%eax
1 5.2e-06 :c0261e6d: pop %ebx
3403 0.0177 :c0261e6e: pop %esi
17 8.8e-05 :c0261e6f: pop %edi
3343 0.0174 :c0261e70: leave
3348 0.0174 :c0261e71: ret
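Summing the per-symbol totals from the annotated dumps above gives a rough figure for how many cycles the ring-buffer path costs in this tbench run (values copied from the annotations; the total is only as precise as oprofile's sample attribution):

```python
# Per-symbol CPU_CLK_UNHALTED totals for the ring-buffer functions, taken
# from the annotated assembly above.
rb_cycles_pct = {
    "rb_get_reader_page": 2.7800,
    "ring_buffer_consume": 2.0566,
    "rb_advance_reader": 0.4967,
    "rb_buffer_peek": 0.3924,
    "rb_event_length": 0.3658,
    "ring_buffer_event_length": 0.1612,
    "ring_buffer_event_data": 0.1074,
}
print(f"ring-buffer total: {sum(rb_cycles_pct.values()):.2f}% of cycles")
# → ring-buffer total: 6.36% of cycles
```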
* Re: Mainline kernel OLTP performance update
2009-04-27 7:02 ` Andi Kleen
2009-04-27 18:38 ` Discrepancies between Oprofile and vmstat Styner, Douglas W
@ 2009-04-28 16:57 ` Chuck Ebbert
2009-04-28 17:15 ` James Bottomley
1 sibling, 1 reply; 13+ messages in thread
From: Chuck Ebbert @ 2009-04-28 16:57 UTC (permalink / raw)
To: Andi Kleen; +Cc: Styner, Douglas W, linux-kernel, linux-driver, linux-scsi
On Mon, 27 Apr 2009 09:02:40 +0200
Andi Kleen <andi@firstfloor.org> wrote:
> "Styner, Douglas W" <douglas.w.styner@intel.com> writes:
>
> >
> > ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
> > Cycles% 2.6.24.2 Cycles% 2.6.30-rc2
> > 74.8578 <database> 67.6966 <database>
>
> The dip in database cycles is indeed worrying.
>
> > 1.0500 qla24xx_start_scsi 1.1724 qla24xx_start_scsi
> > 0.8089 schedule 1.0578 qla24xx_intr_handler
> > 0.5864 kmem_cache_alloc 0.8259 __schedule
> > 0.4989 __blockdev_direct_IO 0.7451 kmem_cache_alloc
> > 0.4357 __sigsetjmp 0.4872 __blockdev_direct_IO
> > 0.4152 copy_user_generic_string 0.4390 task_rq_lock
> > 0.3953 qla24xx_intr_handler 0.4338 __sigsetjmp
>
> And also why the qla24xx_intr_handler became ~2.5x as expensive.
> Cc linux-scsi and qla24xx maintainers.
>
They are getting 31000 interrupts/sec vs. 22000/sec on older kernels.
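Combining these interrupt rates with the cycle percentages quoted earlier separates the rate increase from the per-interrupt cost. A back-of-the-envelope decomposition, assuming total non-idle cycles are comparable between the two runs and that the Intr/s delta is attributable to the FC HBAs:

```python
# qla24xx_intr_handler cycle share and system interrupt rates, as quoted
# in this thread (summary table and oprofile top-30 list).
old_pct, new_pct = 0.3953, 1.0578   # Cycles%, 2.6.24.2 vs 2.6.30-rc2
old_irq, new_irq = 22106, 30755     # Intr/s from the summary table
total_ratio = new_pct / old_pct          # total handler cycles, ~2.68x
rate_ratio = new_irq / old_irq           # interrupt rate, ~1.39x
per_irq_ratio = total_ratio / rate_ratio # cycles per interrupt, ~1.92x
print(f"total {total_ratio:.2f}x, rate {rate_ratio:.2f}x, "
      f"per-IRQ {per_irq_ratio:.2f}x")
```

So the higher interrupt rate explains only part of the increase; the handler also appears to spend nearly twice as many cycles per interrupt.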
* Re: Mainline kernel OLTP performance update
2009-04-28 16:57 ` Mainline kernel OLTP performance update Chuck Ebbert
@ 2009-04-28 17:15 ` James Bottomley
2009-04-28 17:17 ` Styner, Douglas W
0 siblings, 1 reply; 13+ messages in thread
From: James Bottomley @ 2009-04-28 17:15 UTC (permalink / raw)
To: Chuck Ebbert
Cc: Andi Kleen, Styner, Douglas W, linux-kernel, linux-driver, linux-scsi
On Tue, 2009-04-28 at 12:57 -0400, Chuck Ebbert wrote:
> On Mon, 27 Apr 2009 09:02:40 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > "Styner, Douglas W" <douglas.w.styner@intel.com> writes:
> >
> > >
> > > ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
> > > Cycles% 2.6.24.2 Cycles% 2.6.30-rc2
> > > 74.8578 <database> 67.6966 <database>
> >
> > The dip in database cycles is indeed worrying.
> >
> > > 1.0500 qla24xx_start_scsi 1.1724 qla24xx_start_scsi
> > > 0.8089 schedule 1.0578 qla24xx_intr_handler
> > > 0.5864 kmem_cache_alloc 0.8259 __schedule
> > > 0.4989 __blockdev_direct_IO 0.7451 kmem_cache_alloc
> > > 0.4357 __sigsetjmp 0.4872 __blockdev_direct_IO
> > > 0.4152 copy_user_generic_string 0.4390 task_rq_lock
> > > 0.3953 qla24xx_intr_handler 0.4338 __sigsetjmp
> >
> > And also why the qla24xx_intr_handler became ~2.5x as expensive.
> > Cc linux-scsi and qla24xx maintainers.
> >
>
> They are getting 31000 interrupts/sec vs. 22000/sec on older kernels.
Should be fixed by:
http://marc.info/?l=linux-scsi&m=124093712114937
If someone could verify, I'd be grateful.
Thanks,
James
* RE: Mainline kernel OLTP performance update
2009-04-28 17:15 ` James Bottomley
@ 2009-04-28 17:17 ` Styner, Douglas W
0 siblings, 0 replies; 13+ messages in thread
From: Styner, Douglas W @ 2009-04-28 17:17 UTC (permalink / raw)
To: James Bottomley, Chuck Ebbert
Cc: Andi Kleen, linux-kernel, linux-driver, linux-scsi, Ma, Chinang
Working on it as we speak...
>-----Original Message-----
>From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
>Sent: Tuesday, April 28, 2009 10:16 AM
>To: Chuck Ebbert
>Cc: Andi Kleen; Styner, Douglas W; linux-kernel@vger.kernel.org; linux-
>driver@qlogic.com; linux-scsi@vger.kernel.org
>Subject: Re: Mainline kernel OLTP performance update
>
>On Tue, 2009-04-28 at 12:57 -0400, Chuck Ebbert wrote:
>> On Mon, 27 Apr 2009 09:02:40 +0200
>> Andi Kleen <andi@firstfloor.org> wrote:
>>
>> > "Styner, Douglas W" <douglas.w.styner@intel.com> writes:
>> >
>> > >
>> > > ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
>> > > Cycles% 2.6.24.2 Cycles% 2.6.30-rc2
>> > > 74.8578 <database> 67.6966 <database>
>> >
>> > The dip in database cycles is indeed worrying.
>> >
>> > > 1.0500 qla24xx_start_scsi 1.1724 qla24xx_start_scsi
>> > > 0.8089 schedule 1.0578 qla24xx_intr_handler
>> > > 0.5864 kmem_cache_alloc 0.8259 __schedule
>> > > 0.4989 __blockdev_direct_IO 0.7451 kmem_cache_alloc
>> > > 0.4357 __sigsetjmp 0.4872 __blockdev_direct_IO
>> > > 0.4152 copy_user_generic_string 0.4390 task_rq_lock
>> > > 0.3953 qla24xx_intr_handler 0.4338 __sigsetjmp
>> >
>> > And also why the qla24xx_intr_handler became ~2.5x as expensive.
>> > Cc linux-scsi and qla24xx maintainers.
>> >
>>
>> They are getting 31000 interrupts/sec vs. 22000/sec on older kernels.
>
>Should be fixed by:
>
>http://marc.info/?l=linux-scsi&m=124093712114937
>
>If someone could verify, I'd be grateful.
>
>Thanks,
>
>James
>
Thread overview: 13+ messages
2009-04-23 16:49 Mainline kernel OLTP performance update Styner, Douglas W
2009-04-27 7:02 ` Andi Kleen
2009-04-27 18:38 ` Discrepancies between Oprofile and vmstat Styner, Douglas W
2009-04-27 18:49 ` Andi Kleen
2009-04-27 18:53 ` Discrepancies between Oprofile and vmstat II Andi Kleen
2009-04-27 18:51 ` Styner, Douglas W
2009-04-27 22:15 ` Styner, Douglas W
2009-04-27 22:22 ` Andi Kleen
2009-04-27 22:35 ` Styner, Douglas W
2009-04-28 5:26 ` Eric Dumazet
2009-04-28 16:57 ` Mainline kernel OLTP performance update Chuck Ebbert
2009-04-28 17:15 ` James Bottomley
2009-04-28 17:17 ` Styner, Douglas W