linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Mainline kernel OLTP performance update
@ 2009-01-13 21:10 Ma, Chinang
  2009-01-13 22:44 ` Wilcox, Matthew R
  0 siblings, 1 reply; 122+ messages in thread
From: Ma, Chinang @ 2009-01-13 21:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, Chris Mason

This is latest 2.6.29-rc1 kernel OLTP performance result. Compare to 2.6.24.2 the regression is around 3.5%.

Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
2.6.24.2                1.000   21969   43425   76   24     0      0
2.6.27.2                0.973   30402   43523   74   25     0      1
2.6.29-rc1              0.965   30331   41970   74   26     0      0

Server configurations:
Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)

======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2                   Cycles% 2.6.27.2
1.0500 qla24xx_start_scsi          1.2125 qla24xx_start_scsi
0.8089 schedule                    0.6962 kmem_cache_alloc
0.5864 kmem_cache_alloc            0.6209 qla24xx_intr_handler
0.4989 __blockdev_direct_IO        0.4895 copy_user_generic_string
0.4152 copy_user_generic_string    0.4591 __blockdev_direct_IO
0.3953 qla24xx_intr_handler        0.4409 __end_that_request_first
0.3596 scsi_request_fn             0.3729 __switch_to
0.3188 __switch_to                 0.3716 try_to_wake_up
0.2889 lock_timer_base             0.3531 lock_timer_base
0.2519 task_rq_lock                0.3393 scsi_request_fn
0.2474 aio_complete                0.3038 aio_complete
0.2460 scsi_alloc_sgtable          0.2989 memset_c
0.2445 generic_make_request        0.2633 qla2x00_process_completed_re
0.2263 qla2x00_process_completed_re0.2583 pick_next_highest_task_rt
0.2118 blk_queue_end_tag           0.2578 generic_make_request
0.2085 dio_bio_complete            0.2510 __list_add
0.2021 e1000_xmit_frame            0.2459 task_rq_lock
0.2006 __end_that_request_first    0.2322 kmem_cache_free
0.1954 generic_file_aio_read       0.2206 blk_queue_end_tag
0.1949 kfree                       0.2205 __mod_timer
0.1915 tcp_sendmsg                 0.2179 update_curr_rt
0.1901 try_to_wake_up              0.2164 sd_prep_fn
0.1895 kref_get                    0.2130 kref_get
0.1864 __mod_timer                 0.2075 dio_bio_complete
0.1863 thread_return               0.2066 push_rt_task
0.1854 math_state_restore          0.1974 qla24xx_msix_default
0.1775 __list_add                  0.1935 generic_file_aio_read
0.1721 memset_c                    0.1870 scsi_device_unbusy
0.1706 find_vma                    0.1861 tcp_sendmsg
0.1688 read_tsc                    0.1843 e1000_xmit_frame

======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2                   Cycles% 2.6.29-rc1
1.0500 qla24xx_start_scsi          1.0691 qla24xx_intr_handler
0.8089 schedule                    0.7701 copy_user_generic_string
0.5864 kmem_cache_alloc            0.7339 qla24xx_wrt_req_reg
0.4989 __blockdev_direct_IO        0.6458 kmem_cache_alloc
0.4152 copy_user_generic_string    0.5794 qla24xx_start_scsi
0.3953 qla24xx_intr_handler        0.5505 unmap_vmas
0.3596 scsi_request_fn             0.4869 __blockdev_direct_IO
0.3188 __switch_to                 0.4493 try_to_wake_up
0.2889 lock_timer_base             0.4291 scsi_request_fn
0.2519 task_rq_lock                0.4118 clear_page_c
0.2474 aio_complete                0.4002 __switch_to
0.2460 scsi_alloc_sgtable          0.3381 ring_buffer_consume
0.2445 generic_make_request        0.3366 rb_get_reader_page
0.2263 qla2x00_process_completed_re0.3222 aio_complete
0.2118 blk_queue_end_tag           0.3135 memset_c
0.2085 dio_bio_complete            0.2875 __list_add
0.2021 e1000_xmit_frame            0.2673 task_rq_lock
0.2006 __end_that_request_first    0.2658 __end_that_request_first
0.1954 generic_file_aio_read       0.2615 qla2x00_process_completed_re
0.1949 kfree                       0.2615 lock_timer_base
0.1915 tcp_sendmsg                 0.2456 disk_map_sector_rcu
0.1901 try_to_wake_up              0.2427 tcp_sendmsg
0.1895 kref_get                    0.2413 e1000_xmit_frame
0.1864 __mod_timer                 0.2398 kmem_cache_free
0.1863 thread_return               0.2384 pick_next_highest_task_rt
0.1854 math_state_restore          0.2225 blk_queue_end_tag
0.1775 __list_add                  0.2211 sd_prep_fn
0.1721 memset_c                    0.2167 qla24xx_queuecommand
0.1706 find_vma                    0.2109 scsi_device_unbusy
0.1688 read_tsc                    0.2095 kref_get


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-13 21:10 Mainline kernel OLTP performance update Ma, Chinang
@ 2009-01-13 22:44 ` Wilcox, Matthew R
  2009-01-15  0:35   ` Andrew Morton
  0 siblings, 1 reply; 122+ messages in thread
From: Wilcox, Matthew R @ 2009-01-13 22:44 UTC (permalink / raw)
  To: Ma, Chinang, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B,
	Chilukuri, Harita, Styner, Douglas W, Wang, Peter Xihong,
	Nueckel, Hubert, Chris Mason, Steven Rostedt

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 7470 bytes --]


One encouraging thing is that we don't see a significant drop-off between 2.6.28 and 2.6.29-rc1, which I think is the first time we've not seen a big problem with -rc1.

To compare the top 30 functions between 2.6.28 and 2.6.29-rc1:

1.4257 qla24xx_start_scsi		1.0691 qla24xx_intr_handler
0.8784 kmem_cache_alloc			0.7701 copy_user_generic_string
0.6876 qla24xx_intr_handler		0.7339 qla24xx_wrt_req_reg
0.5834 copy_user_generic_string	0.6458 kmem_cache_alloc
0.4945 scsi_request_fn			0.5794 qla24xx_start_scsi
0.4846 __blockdev_direct_IO		0.5505 unmap_vmas
0.4187 try_to_wake_up			0.4869 __blockdev_direct_IO
0.3518 aio_complete			0.4493 try_to_wake_up
0.3513 __end_that_request_first	0.4291 scsi_request_fn
0.3483 __switch_to			0.4118 clear_page_c
0.3271 memset_c				0.4002 __switch_to
0.2976 qla2x00_process_completed_re	0.3381 ring_buffer_consume
0.2905 __list_add				0.3366 rb_get_reader_page
0.2901 generic_make_request		0.3222 aio_complete
0.2755 lock_timer_base			0.3135 memset_c
0.2741 blk_queue_end_tag		0.2875 __list_add
0.2593 kmem_cache_free			0.2673 task_rq_lock
0.2445 disk_map_sector_rcu		0.2658 __end_that_request_first
0.2370 pick_next_highest_task_rt	0.2615 qla2x00_process_completed_re
0.2323 scsi_device_unbusy		0.2615 lock_timer_base
0.2321 task_rq_lock			0.2456 disk_map_sector_rcu
0.2316 scsi_dispatch_cmd		0.2427 tcp_sendmsg
0.2239 kref_get				0.2413 e1000_xmit_frame
0.2237 dio_bio_complete			0.2398 kmem_cache_free
0.2194 push_rt_task			0.2384 pick_next_highest_task_rt
0.2145 __aio_get_req			0.2225 blk_queue_end_tag
0.2143 kfree				0.2211 sd_prep_fn
0.2138 __mod_timer			0.2167 qla24xx_queuecommand
0.2131 e1000_irq_enable			0.2109 scsi_device_unbusy
0.2091 scsi_softirq_done		0.2095 kref_get

It looks like a number of functions in the qla2x00 driver were split up, so it's probably best to ignore all the changes in qla* functions.

unmap_vmas is a new hot function.  It's been around since before git history started, and hasn't changed substantially between 2.6.28 and 2.6.29-rc1, so I suspect we're calling it more often.  I don't know why we'd be doing that.

clear_page_c is also new to the hot list.  I haven't tried to understand why this might be so.

The ring_buffer_consume() and rb_get_reader_page() functions are part of the oprofile code.  This seems to indicate a bug -- they should not be the #12 and #13 hottest functions in the kernel when monitoring a database run!

That seems to be about it for regressions.

> -----Original Message-----
> From: Ma, Chinang
> Sent: Tuesday, January 13, 2009 1:11 PM
> To: linux-kernel@vger.kernel.org
> Cc: Tripathi, Sharad C; arjan@linux.intel.com; Wilcox, Matthew R; Kleen,
> Andi; Siddha, Suresh B; Chilukuri, Harita; Styner, Douglas W; Wang, Peter
> Xihong; Nueckel, Hubert; Chris Mason
> Subject: Mainline kernel OLTP performance update
> 
> This is latest 2.6.29-rc1 kernel OLTP performance result. Compare to
> 2.6.24.2 the regression is around 3.5%.
> 
> Linux OLTP Performance summary
> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
> 2.6.24.2                1.000   21969   43425   76   24     0      0
> 2.6.27.2                0.973   30402   43523   74   25     0      1
> 2.6.29-rc1              0.965   30331   41970   74   26     0      0
> 
> Server configurations:
> Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
> 
> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2                   Cycles% 2.6.27.2
> 1.0500 qla24xx_start_scsi          1.2125 qla24xx_start_scsi
> 0.8089 schedule                    0.6962 kmem_cache_alloc
> 0.5864 kmem_cache_alloc            0.6209 qla24xx_intr_handler
> 0.4989 __blockdev_direct_IO        0.4895 copy_user_generic_string
> 0.4152 copy_user_generic_string    0.4591 __blockdev_direct_IO
> 0.3953 qla24xx_intr_handler        0.4409 __end_that_request_first
> 0.3596 scsi_request_fn             0.3729 __switch_to
> 0.3188 __switch_to                 0.3716 try_to_wake_up
> 0.2889 lock_timer_base             0.3531 lock_timer_base
> 0.2519 task_rq_lock                0.3393 scsi_request_fn
> 0.2474 aio_complete                0.3038 aio_complete
> 0.2460 scsi_alloc_sgtable          0.2989 memset_c
> 0.2445 generic_make_request        0.2633 qla2x00_process_completed_re
> 0.2263 qla2x00_process_completed_re0.2583 pick_next_highest_task_rt
> 0.2118 blk_queue_end_tag           0.2578 generic_make_request
> 0.2085 dio_bio_complete            0.2510 __list_add
> 0.2021 e1000_xmit_frame            0.2459 task_rq_lock
> 0.2006 __end_that_request_first    0.2322 kmem_cache_free
> 0.1954 generic_file_aio_read       0.2206 blk_queue_end_tag
> 0.1949 kfree                       0.2205 __mod_timer
> 0.1915 tcp_sendmsg                 0.2179 update_curr_rt
> 0.1901 try_to_wake_up              0.2164 sd_prep_fn
> 0.1895 kref_get                    0.2130 kref_get
> 0.1864 __mod_timer                 0.2075 dio_bio_complete
> 0.1863 thread_return               0.2066 push_rt_task
> 0.1854 math_state_restore          0.1974 qla24xx_msix_default
> 0.1775 __list_add                  0.1935 generic_file_aio_read
> 0.1721 memset_c                    0.1870 scsi_device_unbusy
> 0.1706 find_vma                    0.1861 tcp_sendmsg
> 0.1688 read_tsc                    0.1843 e1000_xmit_frame
> 
> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2                   Cycles% 2.6.29-rc1
> 1.0500 qla24xx_start_scsi          1.0691 qla24xx_intr_handler
> 0.8089 schedule                    0.7701 copy_user_generic_string
> 0.5864 kmem_cache_alloc            0.7339 qla24xx_wrt_req_reg
> 0.4989 __blockdev_direct_IO        0.6458 kmem_cache_alloc
> 0.4152 copy_user_generic_string    0.5794 qla24xx_start_scsi
> 0.3953 qla24xx_intr_handler        0.5505 unmap_vmas
> 0.3596 scsi_request_fn             0.4869 __blockdev_direct_IO
> 0.3188 __switch_to                 0.4493 try_to_wake_up
> 0.2889 lock_timer_base             0.4291 scsi_request_fn
> 0.2519 task_rq_lock                0.4118 clear_page_c
> 0.2474 aio_complete                0.4002 __switch_to
> 0.2460 scsi_alloc_sgtable          0.3381 ring_buffer_consume
> 0.2445 generic_make_request        0.3366 rb_get_reader_page
> 0.2263 qla2x00_process_completed_re0.3222 aio_complete
> 0.2118 blk_queue_end_tag           0.3135 memset_c
> 0.2085 dio_bio_complete            0.2875 __list_add
> 0.2021 e1000_xmit_frame            0.2673 task_rq_lock
> 0.2006 __end_that_request_first    0.2658 __end_that_request_first
> 0.1954 generic_file_aio_read       0.2615 qla2x00_process_completed_re
> 0.1949 kfree                       0.2615 lock_timer_base
> 0.1915 tcp_sendmsg                 0.2456 disk_map_sector_rcu
> 0.1901 try_to_wake_up              0.2427 tcp_sendmsg
> 0.1895 kref_get                    0.2413 e1000_xmit_frame
> 0.1864 __mod_timer                 0.2398 kmem_cache_free
> 0.1863 thread_return               0.2384 pick_next_highest_task_rt
> 0.1854 math_state_restore          0.2225 blk_queue_end_tag
> 0.1775 __list_add                  0.2211 sd_prep_fn
> 0.1721 memset_c                    0.2167 qla24xx_queuecommand
> 0.1706 find_vma                    0.2109 scsi_device_unbusy
> 0.1688 read_tsc                    0.2095 kref_get

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-13 22:44 ` Wilcox, Matthew R
@ 2009-01-15  0:35   ` Andrew Morton
  2009-01-15  1:21     ` Matthew Wilcox
  0 siblings, 1 reply; 122+ messages in thread
From: Andrew Morton @ 2009-01-15  0:35 UTC (permalink / raw)
  To: Wilcox, Matthew R
  Cc: chinang.ma, linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty

On Tue, 13 Jan 2009 15:44:17 -0700
"Wilcox, Matthew R" <matthew.r.wilcox@intel.com> wrote:
>

(top-posting repaired.  That @intel.com address is a bad influence ;))

(cc linux-scsi)

> > -----Original Message-----
> > From: Ma, Chinang
> > Sent: Tuesday, January 13, 2009 1:11 PM
> > To: linux-kernel@vger.kernel.org
> > Cc: Tripathi, Sharad C; arjan@linux.intel.com; Wilcox, Matthew R; Kleen,
> > Andi; Siddha, Suresh B; Chilukuri, Harita; Styner, Douglas W; Wang, Peter
> > Xihong; Nueckel, Hubert; Chris Mason
> > Subject: Mainline kernel OLTP performance update
> > 
> > This is latest 2.6.29-rc1 kernel OLTP performance result. Compare to
> > 2.6.24.2 the regression is around 3.5%.
> > 
> > Linux OLTP Performance summary
> > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
> > 2.6.24.2                1.000   21969   43425   76   24     0      0
> > 2.6.27.2                0.973   30402   43523   74   25     0      1
> > 2.6.29-rc1              0.965   30331   41970   74   26     0      0
> > 
> > Server configurations:
> > Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> > 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
>
> 
> One encouraging thing is that we don't see a significant drop-off between 2.6.28 and 2.6.29-rc1, which I think is the first time we've not seen a big problem with -rc1.
> 
> To compare the top 30 functions between 2.6.28 and 2.6.29-rc1:
> 
> 1.4257 qla24xx_start_scsi		1.0691 qla24xx_intr_handler
> 0.8784 kmem_cache_alloc			0.7701 copy_user_generic_string
> 0.6876 qla24xx_intr_handler		0.7339 qla24xx_wrt_req_reg
> 0.5834 copy_user_generic_string	0.6458 kmem_cache_alloc
> 0.4945 scsi_request_fn			0.5794 qla24xx_start_scsi
> 0.4846 __blockdev_direct_IO		0.5505 unmap_vmas
> 0.4187 try_to_wake_up			0.4869 __blockdev_direct_IO
> 0.3518 aio_complete			0.4493 try_to_wake_up
> 0.3513 __end_that_request_first	0.4291 scsi_request_fn
> 0.3483 __switch_to			0.4118 clear_page_c
> 0.3271 memset_c				0.4002 __switch_to
> 0.2976 qla2x00_process_completed_re	0.3381 ring_buffer_consume
> 0.2905 __list_add				0.3366 rb_get_reader_page
> 0.2901 generic_make_request		0.3222 aio_complete
> 0.2755 lock_timer_base			0.3135 memset_c
> 0.2741 blk_queue_end_tag		0.2875 __list_add
> 0.2593 kmem_cache_free			0.2673 task_rq_lock
> 0.2445 disk_map_sector_rcu		0.2658 __end_that_request_first
> 0.2370 pick_next_highest_task_rt	0.2615 qla2x00_process_completed_re
> 0.2323 scsi_device_unbusy		0.2615 lock_timer_base
> 0.2321 task_rq_lock			0.2456 disk_map_sector_rcu
> 0.2316 scsi_dispatch_cmd		0.2427 tcp_sendmsg
> 0.2239 kref_get				0.2413 e1000_xmit_frame
> 0.2237 dio_bio_complete			0.2398 kmem_cache_free
> 0.2194 push_rt_task			0.2384 pick_next_highest_task_rt
> 0.2145 __aio_get_req			0.2225 blk_queue_end_tag
> 0.2143 kfree				0.2211 sd_prep_fn
> 0.2138 __mod_timer			0.2167 qla24xx_queuecommand
> 0.2131 e1000_irq_enable			0.2109 scsi_device_unbusy
> 0.2091 scsi_softirq_done		0.2095 kref_get
> 
> It looks like a number of functions in the qla2x00 driver were split up, so it's probably best to ignore all the changes in qla* functions.
> 
> unmap_vmas is a new hot function.  It's been around since before git history started, and hasn't changed substantially between 2.6.28 and 2.6.29-rc1, so I suspect we're calling it more often.  I don't know why we'd be doing that.
> 
> clear_page_c is also new to the hot list.  I haven't tried to understand why this might be so.
> 
> The ring_buffer_consume() and rb_get_reader_page() functions are part of the oprofile code.  This seems to indicate a bug -- they should not be the #12 and #13 hottest functions in the kernel when monitoring a database run!
> 
> That seems to be about it for regressions.
> 

But the interrupt rate went through the roof.

A 3.5% slowdown in this workload is considered pretty serious, isn't it?

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  0:35   ` Andrew Morton
@ 2009-01-15  1:21     ` Matthew Wilcox
  2009-01-15  2:04       ` Andrew Morton
  2009-01-15 16:48       ` Ma, Chinang
  0 siblings, 2 replies; 122+ messages in thread
From: Matthew Wilcox @ 2009-01-15  1:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wilcox, Matthew R, chinang.ma, linux-kernel, sharad.c.tripathi,
	arjan, andi.kleen, suresh.b.siddha, harita.chilukuri,
	douglas.w.styner, peter.xihong.wang, hubert.nueckel, chris.mason,
	srostedt, linux-scsi, Andrew Vasquez, Anirban Chakraborty

On Wed, Jan 14, 2009 at 04:35:57PM -0800, Andrew Morton wrote:
> On Tue, 13 Jan 2009 15:44:17 -0700
> "Wilcox, Matthew R" <matthew.r.wilcox@intel.com> wrote:
> >
> 
> (top-posting repaired.  That @intel.com address is a bad influence ;))

Alas, that email address goes to an Outlook client.  Not much to be done
about that.

> (cc linux-scsi)
> 
> > > This is latest 2.6.29-rc1 kernel OLTP performance result. Compare to
> > > 2.6.24.2 the regression is around 3.5%.
> > > 
> > > Linux OLTP Performance summary
> > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
> > > 2.6.24.2                1.000   21969   43425   76   24     0      0
> > > 2.6.27.2                0.973   30402   43523   74   25     0      1
> > > 2.6.29-rc1              0.965   30331   41970   74   26     0      0

> But the interrupt rate went through the roof.

Yes.  I forget why that was; I'll have to dig through my archives for
that.

> A 3.5% slowdown in this workload is considered pretty serious, isn't it?

Yes.  Anything above 0.3% is statistically significant.  1% is a big
deal.  The fact that we've lost 3.5% in the last year doesn't make
people happy.  There's a few things we've identified that have a big
effect:

 - Per-partition statistics.  Putting in a sysctl to stop doing them gets
   some of that back, but not as much as taking them out (even when
   the sysctl'd variable is in a __read_mostly section).  We tried a
   patch from Jens to speed up the search for a new partition, but it
   had no effect.

 - The RT scheduler changes.  They're better for some RT tasks, but not
   the database benchmark workload.  Chinang has posted about
   this before, but the thread didn't really go anywhere.
   http://marc.info/?t=122903815000001&r=1&w=2

SLUB would have had a huge negative effect if we were using it -- on the
order of 7% iirc.  SLQB is at least performance-neutral with SLAB.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  1:21     ` Matthew Wilcox
@ 2009-01-15  2:04       ` Andrew Morton
  2009-01-15  2:27         ` Steven Rostedt
                           ` (3 more replies)
  2009-01-15 16:48       ` Ma, Chinang
  1 sibling, 4 replies; 122+ messages in thread
From: Andrew Morton @ 2009-01-15  2:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Wilcox, Matthew R, chinang.ma, linux-kernel, sharad.c.tripathi,
	arjan, andi.kleen, suresh.b.siddha, harita.chilukuri,
	douglas.w.styner, peter.xihong.wang, hubert.nueckel, chris.mason,
	srostedt, linux-scsi, Andrew Vasquez, Anirban Chakraborty

On Wed, 14 Jan 2009 18:21:47 -0700 Matthew Wilcox <matthew@wil.cx> wrote:

> On Wed, Jan 14, 2009 at 04:35:57PM -0800, Andrew Morton wrote:
> > On Tue, 13 Jan 2009 15:44:17 -0700
> > "Wilcox, Matthew R" <matthew.r.wilcox@intel.com> wrote:
> > >
> > 
> > (top-posting repaired.  That @intel.com address is a bad influence ;))
> 
> Alas, that email address goes to an Outlook client.  Not much to be done
> about that.

aspirin?

> > (cc linux-scsi)
> > 
> > > > This is latest 2.6.29-rc1 kernel OLTP performance result. Compare to
> > > > 2.6.24.2 the regression is around 3.5%.
> > > > 
> > > > Linux OLTP Performance summary
> > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
> > > > 2.6.24.2                1.000   21969   43425   76   24     0      0
> > > > 2.6.27.2                0.973   30402   43523   74   25     0      1
> > > > 2.6.29-rc1              0.965   30331   41970   74   26     0      0
> 
> > But the interrupt rate went through the roof.
> 
> Yes.  I forget why that was; I'll have to dig through my archives for
> that.

Oh.  I'd have thought that this alone could account for 3.5%.

> > A 3.5% slowdown in this workload is considered pretty serious, isn't it?
> 
> Yes.  Anything above 0.3% is statistically significant.  1% is a big
> deal.  The fact that we've lost 3.5% in the last year doesn't make
> people happy.  There's a few things we've identified that have a big
> effect:
> 
>  - Per-partition statistics.  Putting in a sysctl to stop doing them gets
>    some of that back, but not as much as taking them out (even when
>    the sysctl'd variable is in a __read_mostly section).  We tried a
>    patch from Jens to speed up the search for a new partition, but it
>    had no effect.

I find this surprising.

>  - The RT scheduler changes.  They're better for some RT tasks, but not
>    the database benchmark workload.  Chinang has posted about
>    this before, but the thread didn't really go anywhere.
>    http://marc.info/?t=122903815000001&r=1&w=2

Well.  It's more a case that it wasn't taken anywhere.  I appear to
have recently been informed that there have never been any
CPU-scheduler-caused regressions.  Please persist!

> SLUB would have had a huge negative effect if we were using it -- on the
> order of 7% iirc.  SLQB is at least performance-neutral with SLAB.

We really need to unblock that problem somehow.  I assume that
enterprise distros are shipping slab?


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  2:04       ` Andrew Morton
@ 2009-01-15  2:27         ` Steven Rostedt
  2009-01-15  7:11           ` Ma, Chinang
  2009-01-15  2:39         ` Andi Kleen
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 122+ messages in thread
From: Steven Rostedt @ 2009-01-15  2:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, Wilcox, Matthew R, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Gregory Haskins

(added Ingo, Thomas, Peter and Gregory)

On Wed, 2009-01-14 at 18:04 -0800, Andrew Morton wrote:
> On Wed, 14 Jan 2009 18:21:47 -0700 Matthew Wilcox <matthew@wil.cx> wrote:
> 
> > On Wed, Jan 14, 2009 at 04:35:57PM -0800, Andrew Morton wrote:
> > > On Tue, 13 Jan 2009 15:44:17 -0700
> > > "Wilcox, Matthew R" <matthew.r.wilcox@intel.com> wrote:
> > > >
> > > 
> > > (top-posting repaired.  That @intel.com address is a bad influence ;))
> > 
> > Alas, that email address goes to an Outlook client.  Not much to be done
> > about that.
> 
> aspirin?
> 
> > > (cc linux-scsi)
> > > 
> > > > > This is latest 2.6.29-rc1 kernel OLTP performance result. Compare to
> > > > > 2.6.24.2 the regression is around 3.5%.
> > > > > 
> > > > > Linux OLTP Performance summary
> > > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
> > > > > 2.6.24.2                1.000   21969   43425   76   24     0      0
> > > > > 2.6.27.2                0.973   30402   43523   74   25     0      1
> > > > > 2.6.29-rc1              0.965   30331   41970   74   26     0      0
> > 
> > > But the interrupt rate went through the roof.
> > 
> > Yes.  I forget why that was; I'll have to dig through my archives for
> > that.
> 
> Oh.  I'd have thought that this alone could account for 3.5%.
> 
> > > A 3.5% slowdown in this workload is considered pretty serious, isn't it?
> > 
> > Yes.  Anything above 0.3% is statistically significant.  1% is a big
> > deal.  The fact that we've lost 3.5% in the last year doesn't make
> > people happy.  There's a few things we've identified that have a big
> > effect:
> > 
> >  - Per-partition statistics.  Putting in a sysctl to stop doing them gets
> >    some of that back, but not as much as taking them out (even when
> >    the sysctl'd variable is in a __read_mostly section).  We tried a
> >    patch from Jens to speed up the search for a new partition, but it
> >    had no effect.
> 
> I find this surprising.
> 
> >  - The RT scheduler changes.  They're better for some RT tasks, but not
> >    the database benchmark workload.  Chinang has posted about
> >    this before, but the thread didn't really go anywhere.
> >    http://marc.info/?t=122903815000001&r=1&w=2

I read the whole thread before I found what you were talking about here:

http://marc.info/?l=linux-kernel&m=122937424114658&w=2

With this comment:

"When setting foreground and log writer to rt-prio, the log latency reduced to 4.8ms. \
Performance is about 1.5% higher than the CFS result.  
On a side note, we had been using rt-prio on all DBMS processes and log writer ( in \
higher priority) for the best OLTP performance. That has worked pretty well until \
2.6.25 when the new rt scheduler introduced the pull/push task for lower scheduling \
latency for rt-task. That has negative impact on this workload, probably due to the \
more elaborated load calculation/balancing for hundred of foreground rt-prio \
processes. Also, there is that question of no production environment would run DBMS \
with rt-prio. That is why I am going back to explore CFS and see whether I can drop \
rt-prio for good."

A couple of questions:

1) how does the latest rt scheduler compare?  There has been a lot of improvements.
2) how many rt tasks?
3) what were the prios, producer compared to consumers, not actual numbers
4) have you tried pinning tasks?

RT is more about determinism than performance. The old scheduler
migrated rt tasks the same as other tasks. This helps with performance
because it will keep several rt tasks on the same CPU and cache hot even
when a rt task can migrate. This helps performance, but kills
determinism (I was seeing 10 ms wake up times from the next-highest-prio
task on a cpu, even when another CPU was available).

If you pin a task to a cpu, then it skips over the push and pull logic
and will help with performance too.

-- Steve



> 
> Well.  It's more a case that it wasn't taken anywhere.  I appear to
> have recently been informed that there have never been any
> CPU-scheduler-caused regressions.  Please persist!
> 
> > SLUB would have had a huge negative effect if we were using it -- on the
> > order of 7% iirc.  SLQB is at least performance-neutral with SLAB.
> 
> We really need to unblock that problem somehow.  I assume that
> enterprise distros are shipping slab?
> 


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  2:04       ` Andrew Morton
  2009-01-15  2:27         ` Steven Rostedt
@ 2009-01-15  2:39         ` Andi Kleen
  2009-01-15  2:47           ` Matthew Wilcox
  2009-01-15  7:24         ` Nick Piggin
  2009-01-15 14:12         ` James Bottomley
  3 siblings, 1 reply; 122+ messages in thread
From: Andi Kleen @ 2009-01-15  2:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, Wilcox, Matthew R, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, suresh.b.siddha, harita.chilukuri,
	douglas.w.styner, peter.xihong.wang, hubert.nueckel, chris.mason,
	srostedt, linux-scsi, Andrew Vasquez, Anirban Chakraborty

Andrew Morton <akpm@linux-foundation.org> writes:


>>    some of that back, but not as much as taking them out (even when
>>    the sysctl'd variable is in a __read_mostly section).  We tried a
>>    patch from Jens to speed up the search for a new partition, but it
>>    had no effect.
>
> I find this surprising.

The test system has thousands of disks/LUNs which it writes to
all the time, in addition to a workload which is a real cache pig. 
So any increase in the per LUN overhead directly leads to a lot
more cache misses in the kernel because it increases the working set
there sigificantly.

>
>>  - The RT scheduler changes.  They're better for some RT tasks, but not
>>    the database benchmark workload.  Chinang has posted about
>>    this before, but the thread didn't really go anywhere.
>>    http://marc.info/?t=122903815000001&r=1&w=2
>
> Well.  It's more a case that it wasn't taken anywhere.  I appear to
> have recently been informed that there have never been any
> CPU-scheduler-caused regressions.  Please persist!

Just to clarify: the non RT scheduler has never performed well on this
workload (although it seems to get slightly worse too), mostly 
because of log writer starvation.

RT at some point performed significantly better, but then as 
the RT behaviour was improved to be more fair on MP there were signficant
regressions when running under RT.
I wouldn't really advocate to make RT less fair again, it would
be better to just fix the non RT scheduler to perform reasonably. 
Unfortunately the thread above which was supposed to do that
didn't go anywhere.

>> SLUB would have had a huge negative effect if we were using it -- on the
>> order of 7% iirc.  SLQB is at least performance-neutral with SLAB.
>
> We really need to unblock that problem somehow.  I assume that
> enterprise distros are shipping slab?

The released ones all do.

-Andi
-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  2:39         ` Andi Kleen
@ 2009-01-15  2:47           ` Matthew Wilcox
  2009-01-15  3:36             ` Andi Kleen
  2009-01-20 13:27             ` Jens Axboe
  0 siblings, 2 replies; 122+ messages in thread
From: Matthew Wilcox @ 2009-01-15  2:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, Wilcox, Matthew R, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, suresh.b.siddha, harita.chilukuri,
	douglas.w.styner, peter.xihong.wang, hubert.nueckel, chris.mason,
	srostedt, linux-scsi, Andrew Vasquez, Anirban Chakraborty

On Thu, Jan 15, 2009 at 03:39:05AM +0100, Andi Kleen wrote:
> Andrew Morton <akpm@linux-foundation.org> writes:
> >>    some of that back, but not as much as taking them out (even when
> >>    the sysctl'd variable is in a __read_mostly section).  We tried a
> >>    patch from Jens to speed up the search for a new partition, but it
> >>    had no effect.
> >
> > I find this surprising.
> 
> The test system has thousands of disks/LUNs which it writes to
> all the time, in addition to a workload which is a real cache pig. 
> So any increase in the per LUN overhead directly leads to a lot
> more cache misses in the kernel because it increases the working set
> there sigificantly.

This particular system has 450 spindles, but they're amalgamated into
30 logical volumes by the hardware or firmware.  Linux sees 30 LUNs.
Each one, though, has fifteen partitions on it, so that brings us back
up to 450 partitions.

This system, btw, is a scale model of the full system that would be used
to get published results.  If I remember correctly, a 1% performance
regression on this system is likely to translate to a 2% regression on
the full-scale system.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  2:47           ` Matthew Wilcox
@ 2009-01-15  3:36             ` Andi Kleen
  2009-01-20 13:27             ` Jens Axboe
  1 sibling, 0 replies; 122+ messages in thread
From: Andi Kleen @ 2009-01-15  3:36 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andi Kleen, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

> This particular system has 450 spindles, but they're amalgamated into
> 30 logical volumes by the hardware or firmware.  Linux sees 30 LUNs.
> Each one, though, has fifteen partitions on it, so that brings us back
> up to 450 partitions.

Thanks for the correction.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-15  2:27         ` Steven Rostedt
@ 2009-01-15  7:11           ` Ma, Chinang
  2009-01-19 18:04             ` Chris Mason
  0 siblings, 1 reply; 122+ messages in thread
From: Ma, Chinang @ 2009-01-15  7:11 UTC (permalink / raw)
  To: Steven Rostedt, Andrew Morton
  Cc: Matthew Wilcox, Wilcox, Matthew R, linux-kernel, Tripathi,
	Sharad C, arjan, Kleen, Andi, Siddha, Suresh B, Chilukuri,
	Harita, Styner, Douglas W, Wang, Peter Xihong, Nueckel, Hubert,
	chris.mason, linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Ingo Molnar, Thomas Gleixner, Peter Zijlstra, Gregory Haskins

Trying to answer to some of the question below:

-Chinang

>-----Original Message-----
>From: Steven Rostedt [mailto:srostedt@redhat.com]
>Sent: Wednesday, January 14, 2009 6:27 PM
>To: Andrew Morton
>Cc: Matthew Wilcox; Wilcox, Matthew R; Ma, Chinang; linux-
>kernel@vger.kernel.org; Tripathi, Sharad C; arjan@linux.intel.com; Kleen,
>Andi; Siddha, Suresh B; Chilukuri, Harita; Styner, Douglas W; Wang, Peter
>Xihong; Nueckel, Hubert; chris.mason@oracle.com; linux-scsi@vger.kernel.org;
>Andrew Vasquez; Anirban Chakraborty; Ingo Molnar; Thomas Gleixner; Peter
>Zijlstra; Gregory Haskins
>Subject: Re: Mainline kernel OLTP performance update
>
>(added Ingo, Thomas, Peter and Gregory)
>
>On Wed, 2009-01-14 at 18:04 -0800, Andrew Morton wrote:
>> On Wed, 14 Jan 2009 18:21:47 -0700 Matthew Wilcox <matthew@wil.cx> wrote:
>>
>> > On Wed, Jan 14, 2009 at 04:35:57PM -0800, Andrew Morton wrote:
>> > > On Tue, 13 Jan 2009 15:44:17 -0700
>> > > "Wilcox, Matthew R" <matthew.r.wilcox@intel.com> wrote:
>> > > >
>> > >
>> > > (top-posting repaired.  That @intel.com address is a bad influence ;))
>> >
>> > Alas, that email address goes to an Outlook client.  Not much to be
>done
>> > about that.
>>
>> aspirin?
>>
>> > > (cc linux-scsi)
>> > >
>> > > > > This is latest 2.6.29-rc1 kernel OLTP performance result. Compare
>to
>> > > > > 2.6.24.2 the regression is around 3.5%.
>> > > > >
>> > > > > Linux OLTP Performance summary
>> > > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%
>iowait%
>> > > > > 2.6.24.2                1.000   21969   43425   76   24     0
>0
>> > > > > 2.6.27.2                0.973   30402   43523   74   25     0
>1
>> > > > > 2.6.29-rc1              0.965   30331   41970   74   26     0
>0
>> >
>> > > But the interrupt rate went through the roof.
>> >
>> > Yes.  I forget why that was; I'll have to dig through my archives for
>> > that.
>>
>> Oh.  I'd have thought that this alone could account for 3.5%.
>>
>> > > A 3.5% slowdown in this workload is considered pretty serious, isn't
>it?
>> >
>> > Yes.  Anything above 0.3% is statistically significant.  1% is a big
>> > deal.  The fact that we've lost 3.5% in the last year doesn't make
>> > people happy.  There's a few things we've identified that have a big
>> > effect:
>> >
>> >  - Per-partition statistics.  Putting in a sysctl to stop doing them
>gets
>> >    some of that back, but not as much as taking them out (even when
>> >    the sysctl'd variable is in a __read_mostly section).  We tried a
>> >    patch from Jens to speed up the search for a new partition, but it
>> >    had no effect.
>>
>> I find this surprising.
>>
>> >  - The RT scheduler changes.  They're better for some RT tasks, but not
>> >    the database benchmark workload.  Chinang has posted about
>> >    this before, but the thread didn't really go anywhere.
>> >    http://marc.info/?t=122903815000001&r=1&w=2
>
>I read the whole thread before I found what you were talking about here:
>
>http://marc.info/?l=linux-kernel&m=122937424114658&w=2
>
>With this comment:
>
>"When setting foreground and log writer to rt-prio, the log latency reduced
>to 4.8ms. \
>Performance is about 1.5% higher than the CFS result.
>On a side note, we had been using rt-prio on all DBMS processes and log
>writer ( in \
>higher priority) for the best OLTP performance. That has worked pretty well
>until \
>2.6.25 when the new rt scheduler introduced the pull/push task for lower
>scheduling \
>latency for rt-task. That has negative impact on this workload, probably
>due to the \
>more elaborated load calculation/balancing for hundred of foreground rt-
>prio \
>processes. Also, there is that question of no production environment would
>run DBMS \
>with rt-prio. That is why I am going back to explore CFS and see whether I
>can drop \
>rt-prio for good."
>

>A couple of questions:
>
>1) how does the latest rt scheduler compare?  There has been a lot of
>improvements.

It is difficult for me to isolate the recent rt scheduler improvement as so many other changes were introduced to the kernel at the same time. A more accurate comparison should just revert the rt-scheduler back to the previous version and test the delta. I am not sure how to get that done. 

>2) how many rt tasks?   
	Around 250 rt tasks.

>3) what were the prios, producer compared to consumers, not actual numbers
	I suppose the single log writer is the main producer (rt-prio 49, higheset rt-prio in this workload) which wake up all foreground process when the log write is done. The 240 foreground processes are the consumer (rt-prio 48). At any given time some number of the 240 foreground was waiting for log writer to finish flushing out the log data.

>4) have you tried pinning tasks?
>
We did try pin foreground rt-process to cpu. That recovered about 1% performance but introduce idle time in some cpu. Without load balancing, my solution is to pin more processes to the idle cpu. I don't think this is a practical solution for the idle time problem as the process distribution need to be adjusted again when upgrade to a different server. 

>RT is more about determinism than performance. The old scheduler
>migrated rt tasks the same as other tasks. This helps with performance
>because it will keep several rt tasks on the same CPU and cache hot even
>when a rt task can migrate. This helps performance, but kills
>determinism (I was seeing 10 ms wake up times from the next-highest-prio
>task on a cpu, even when another CPU was available).
>
>If you pin a task to a cpu, then it skips over the push and pull logic
>and will help with performance too.
>
>-- Steve
>
>
>
>>
>> Well.  It's more a case that it wasn't taken anywhere.  I appear to
>> have recently been informed that there have never been any
>> CPU-scheduler-caused regressions.  Please persist!
>>
>> > SLUB would have had a huge negative effect if we were using it -- on
>the
>> > order of 7% iirc.  SLQB is at least performance-neutral with SLAB.
>>
>> We really need to unblock that problem somehow.  I assume that
>> enterprise distros are shipping slab?
>>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  2:04       ` Andrew Morton
  2009-01-15  2:27         ` Steven Rostedt
  2009-01-15  2:39         ` Andi Kleen
@ 2009-01-15  7:24         ` Nick Piggin
  2009-01-15  9:46           ` Pekka Enberg
  2009-01-16  0:27           ` Andrew Morton
  2009-01-15 14:12         ` James Bottomley
  3 siblings, 2 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-15  7:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, Wilcox, Matthew R, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

On Thursday 15 January 2009 13:04:31 Andrew Morton wrote:
> On Wed, 14 Jan 2009 18:21:47 -0700 Matthew Wilcox <matthew@wil.cx> wrote:

> > SLUB would have had a huge negative effect if we were using it -- on the
> > order of 7% iirc.  SLQB is at least performance-neutral with SLAB.
>
> We really need to unblock that problem somehow.  I assume that
> enterprise distros are shipping slab?

SLES11 will ship with SLAB, FWIW. As I said in the SLQB thread, this was
not due to my input. But I think it was probably the right choice to make
in that situation.

The biggest problem with SLAB for SGI I think is alien caches bloating the
kmem cache footprint to many GB each on their huge systems, but SLAB has a
parameter to turn off alien caches anyway so I think that is a reasonable
workaround.

Given the OLTP regression, and also I'd hate to have to deal with even
more reports of people's order-N allocations failing... basically with the
regression potential there, I don't think there was a compelling case
found to use SLUB (ie. where does it actually help?).

I'm going to propose to try to unblock the problem by asking to merge SLQB
with a plan to end up picking just one general allocator (and SLOB).

Given that SLAB and SLUB are fairly mature, I wonder what you'd think of
taking SLQB into -mm and making it the default there for a while, to see
if anybody reports a problem?


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  7:24         ` Nick Piggin
@ 2009-01-15  9:46           ` Pekka Enberg
  2009-01-15 13:52             ` Matthew Wilcox
  2009-01-16  0:27           ` Andrew Morton
  1 sibling, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-15  9:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Matthew Wilcox, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty

On Thu, Jan 15, 2009 at 9:24 AM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> SLES11 will ship with SLAB, FWIW. As I said in the SLQB thread, this was
> not due to my input. But I think it was probably the right choice to make
> in that situation.
>
> The biggest problem with SLAB for SGI I think is alien caches bloating the
> kmem cache footprint to many GB each on their huge systems, but SLAB has a
> parameter to turn off alien caches anyway so I think that is a reasonable
> workaround.
>
> Given the OLTP regression, and also I'd hate to have to deal with even
> more reports of people's order-N allocations failing... basically with the
> regression potential there, I don't think there was a compelling case
> found to use SLUB (ie. where does it actually help?).
>
> I'm going to propose to try to unblock the problem by asking to merge SLQB
> with a plan to end up picking just one general allocator (and SLOB).

It would also be nice if someone could do the performance analysis on
the SLUB bug. I ran sysbench in oltp mode here and the results look
like this:

  [ number of transactions per second from 10 runs. ]

                   min      max      avg      sd
  2.6.29-rc1-slab  833.77   852.32   845.10   4.72
  2.6.29-rc1-slub  823.61   851.94   836.74   8.57

I used the following sysbench parameters:

  sysbench --test=oltp \
         --oltp-table-size=1000000 \
         --mysql-socket=/var/run/mysqld/mysqld.sock \
         prepare

  sysbench --num-threads=16 \
         --max-requests=100000 \
         --test=oltp --oltp-table-size=1000000 \
         --mysql-socket=/var/run/mysqld/mysqld.sock \
         --oltp-read-only run

And no, the numbers are not flipped, SLUB beats SLAB here. :(

		Pekka

$ mysql --version
mysql  Ver 14.12 Distrib 5.0.51a, for debian-linux-gnu (x86_64) using
readline 5.2

$ cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz
stepping	: 6
cpu MHz		: 1000.000
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips	: 3989.99
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Core(TM)2 CPU         T7200  @ 2.00GHz
stepping	: 6
cpu MHz		: 1000.000
cache size	: 4096 KB
physical id	: 0
siblings	: 2
core id		: 1
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor
ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm tpr_shadow
bogomips	: 3990.04
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

$ lspci
00:00.0 Host bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML
and 945GT Express Memory Controller Hub (rev 03)
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS,
943/940GML Express Integrated Graphics Controller (rev 03)
00:02.1 Display controller: Intel Corporation Mobile 945GM/GMS/GME,
943/940GML Express Integrated Graphics Controller (rev 03)
00:07.0 Performance counters: Intel Corporation Unknown device 27a3 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801G (ICH7 Family) High
Definition Audio Controller (rev 02)
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express
Port 1 (rev 02)
00:1c.1 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express
Port 2 (rev 02)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI Controller #1 (rev 02)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI Controller #2 (rev 02)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI Controller #3 (rev 02)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB
UHCI Controller #4 (rev 02)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2
EHCI Controller (rev 02)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev e2)
00:1f.0 ISA bridge: Intel Corporation 82801GBM (ICH7-M) LPC Interface
Bridge (rev 02)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE
Controller (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801GBM/GHM (ICH7 Family)
SATA IDE Controller (rev 02)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 02)
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053
PCI-E Gigabit Ethernet Controller (rev 22)
02:00.0 Network controller: Atheros Communications Inc. AR5418
802.11abgn Wireless PCI Express Adapter (rev 01)
03:03.0 FireWire (IEEE 1394): Agere Systems FW323 (rev 61)

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  9:46           ` Pekka Enberg
@ 2009-01-15 13:52             ` Matthew Wilcox
  2009-01-15 14:42               ` Pekka Enberg
  2009-01-16 10:16               ` Pekka Enberg
  0 siblings, 2 replies; 122+ messages in thread
From: Matthew Wilcox @ 2009-01-15 13:52 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nick Piggin, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty

On Thu, Jan 15, 2009 at 11:46:09AM +0200, Pekka Enberg wrote:
> It would also be nice if someone could do the performance analysis on
> the SLUB bug. I ran sysbench in oltp mode here and the results look
> like this:
> 
>   [ number of transactions per second from 10 runs. ]
> 
>                    min      max      avg      sd
>   2.6.29-rc1-slab  833.77   852.32   845.10   4.72
>   2.6.29-rc1-slub  823.61   851.94   836.74   8.57
> 
> And no, the numbers are not flipped, SLUB beats SLAB here. :(

Um.  More transactions per second is good.  Your numbers show SLAB
beating SLUB (even on your dual-CPU system).  And SLAB shows a lower
standard deviation, which is also good.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  2:04       ` Andrew Morton
                           ` (2 preceding siblings ...)
  2009-01-15  7:24         ` Nick Piggin
@ 2009-01-15 14:12         ` James Bottomley
  2009-01-15 17:44           ` Andrew Morton
  3 siblings, 1 reply; 122+ messages in thread
From: James Bottomley @ 2009-01-15 14:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Wilcox, Wilcox, Matthew R, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

On Wed, 2009-01-14 at 18:04 -0800, Andrew Morton wrote:
> On Wed, 14 Jan 2009 18:21:47 -0700 Matthew Wilcox <matthew@wil.cx> wrote:
> > On Wed, Jan 14, 2009 at 04:35:57PM -0800, Andrew Morton wrote:
> > > > > Linux OLTP Performance summary
> > > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
> > > > > 2.6.24.2                1.000   21969   43425   76   24     0      0
> > > > > 2.6.27.2                0.973   30402   43523   74   25     0      1
> > > > > 2.6.29-rc1              0.965   30331   41970   74   26     0      0
> > 
> > > But the interrupt rate went through the roof.
> > 
> > Yes.  I forget why that was; I'll have to dig through my archives for
> > that.
> 
> Oh.  I'd have thought that this alone could account for 3.5%.

Me too.  Anecdotally, I haven't noticed this in my lab machines, but
what I have noticed is on someone else's laptop (a hyperthreaded atom)
that I was trying to demo powertop on was that IPI reschedule interrupts
seem to be out of control ... they were ticking over at a really high
rate and preventing the CPU from spending much time in the low C and P
states.  To me this implicates some scheduler problem since that's the
primary producer of IPI reschedules ... I think it wouldn't be a
significant extrapolation to predict that the scheduler might be the
cause of the above problem as well.

James



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15 13:52             ` Matthew Wilcox
@ 2009-01-15 14:42               ` Pekka Enberg
  2009-01-16 10:16               ` Pekka Enberg
  1 sibling, 0 replies; 122+ messages in thread
From: Pekka Enberg @ 2009-01-15 14:42 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Nick Piggin, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty

Matthew Wilcox wrote:
> On Thu, Jan 15, 2009 at 11:46:09AM +0200, Pekka Enberg wrote:
>> It would also be nice if someone could do the performance analysis on
>> the SLUB bug. I ran sysbench in oltp mode here and the results look
>> like this:
>>
>>   [ number of transactions per second from 10 runs. ]
>>
>>                    min      max      avg      sd
>>   2.6.29-rc1-slab  833.77   852.32   845.10   4.72
>>   2.6.29-rc1-slub  823.61   851.94   836.74   8.57
>>
>> And no, the numbers are not flipped, SLUB beats SLAB here. :(
> 
> Um.  More transactions per second is good.  Your numbers show SLAB
> beating SLUB (even on your dual-CPU system).  And SLAB shows a lower
> standard deviation, which is also good.

*blush*

Will do oprofile tomorrow. Thanks Matthew.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-15  1:21     ` Matthew Wilcox
  2009-01-15  2:04       ` Andrew Morton
@ 2009-01-15 16:48       ` Ma, Chinang
  1 sibling, 0 replies; 122+ messages in thread
From: Ma, Chinang @ 2009-01-15 16:48 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton
  Cc: Wilcox, Matthew R, linux-kernel, Tripathi, Sharad C, arjan,
	Kleen, Andi, Siddha, Suresh B, Chilukuri, Harita, Styner,
	Douglas W, Wang, Peter Xihong, Nueckel, Hubert, chris.mason,
	srostedt, linux-scsi, Andrew Vasquez, Anirban Chakraborty



>-----Original Message-----
>From: Matthew Wilcox [mailto:matthew@wil.cx]
>Sent: Wednesday, January 14, 2009 5:22 PM
>To: Andrew Morton
>Cc: Wilcox, Matthew R; Ma, Chinang; linux-kernel@vger.kernel.org; Tripathi,
>Sharad C; arjan@linux.intel.com; Kleen, Andi; Siddha, Suresh B; Chilukuri,
>Harita; Styner, Douglas W; Wang, Peter Xihong; Nueckel, Hubert;
>chris.mason@oracle.com; srostedt@redhat.com; linux-scsi@vger.kernel.org;
>Andrew Vasquez; Anirban Chakraborty
>Subject: Re: Mainline kernel OLTP performance update
>
>On Wed, Jan 14, 2009 at 04:35:57PM -0800, Andrew Morton wrote:
>> On Tue, 13 Jan 2009 15:44:17 -0700
>> "Wilcox, Matthew R" <matthew.r.wilcox@intel.com> wrote:
>> >
>>
>> (top-posting repaired.  That @intel.com address is a bad influence ;))
>
>Alas, that email address goes to an Outlook client.  Not much to be done
>about that.
>
>> (cc linux-scsi)
>>
>> > > This is latest 2.6.29-rc1 kernel OLTP performance result. Compare to
>> > > 2.6.24.2 the regression is around 3.5%.
>> > >
>> > > Linux OLTP Performance summary
>> > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%
>iowait%
>> > > 2.6.24.2                1.000   21969   43425   76   24     0      0
>> > > 2.6.27.2                0.973   30402   43523   74   25     0      1
>> > > 2.6.29-rc1              0.965   30331   41970   74   26     0      0
>
>> But the interrupt rate went through the roof.
>
>Yes.  I forget why that was; I'll have to dig through my archives for
>that.

I took a quick look at the interrupts figure between 2.6.24 and 2.6.27. i/o interuputs is slightly down in 2.6.27 (due to reduce throughput). But both NMI and reschedule interrupt increased.  Reschedule interrupts is 2x of 2.6.24.

>
>> A 3.5% slowdown in this workload is considered pretty serious, isn't it?
>
>Yes.  Anything above 0.3% is statistically significant.  1% is a big
>deal.  The fact that we've lost 3.5% in the last year doesn't make
>people happy.  There's a few things we've identified that have a big
>effect:
>
> - Per-partition statistics.  Putting in a sysctl to stop doing them gets
>   some of that back, but not as much as taking them out (even when
>   the sysctl'd variable is in a __read_mostly section).  We tried a
>   patch from Jens to speed up the search for a new partition, but it
>   had no effect.
>
> - The RT scheduler changes.  They're better for some RT tasks, but not
>   the database benchmark workload.  Chinang has posted about
>   this before, but the thread didn't really go anywhere.
>   http://marc.info/?t=122903815000001&r=1&w=2
>
>SLUB would have had a huge negative effect if we were using it -- on the
>order of 7% iirc.  SLQB is at least performance-neutral with SLAB.
>
>--
>Matthew Wilcox				Intel Open Source Technology Centre
>"Bill, look, we understand that you're interested in selling us this
>operating system, but compare it to ours.  We can't possibly take such
>a retrograde step."

-Chinang

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15 14:12         ` James Bottomley
@ 2009-01-15 17:44           ` Andrew Morton
  2009-01-15 18:00             ` Matthew Wilcox
  0 siblings, 1 reply; 122+ messages in thread
From: Andrew Morton @ 2009-01-15 17:44 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthew Wilcox, Wilcox, Matthew R, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

On Thu, 15 Jan 2009 09:12:46 -0500 James Bottomley <James.Bottomley@HansenPartnership.com> wrote:

> On Wed, 2009-01-14 at 18:04 -0800, Andrew Morton wrote:
> > On Wed, 14 Jan 2009 18:21:47 -0700 Matthew Wilcox <matthew@wil.cx> wrote:
> > > On Wed, Jan 14, 2009 at 04:35:57PM -0800, Andrew Morton wrote:
> > > > > > Linux OLTP Performance summary
> > > > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%  iowait%
> > > > > > 2.6.24.2                1.000   21969   43425   76   24     0      0
> > > > > > 2.6.27.2                0.973   30402   43523   74   25     0      1
> > > > > > 2.6.29-rc1              0.965   30331   41970   74   26     0      0
> > > 
> > > > But the interrupt rate went through the roof.
> > > 
> > > Yes.  I forget why that was; I'll have to dig through my archives for
> > > that.
> > 
> > Oh.  I'd have thought that this alone could account for 3.5%.
> 
> Me too.  Anecdotally, I haven't noticed this in my lab machines, but
> what I have noticed is on someone else's laptop (a hyperthreaded atom)
> that I was trying to demo powertop on was that IPI reschedule interrupts
> seem to be out of control ... they were ticking over at a really high
> rate and preventing the CPU from spending much time in the low C and P
> states.  To me this implicates some scheduler problem since that's the
> primary producer of IPI reschedules ... I think it wouldn't be a
> significant extrapolation to predict that the scheduler might be the
> cause of the above problem as well.
> 

Good point.

The context switch rate actually went down a bit.

I wonder if the Intel test people have records of /proc/interrupts for
the various kernel versions.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15 17:44           ` Andrew Morton
@ 2009-01-15 18:00             ` Matthew Wilcox
  2009-01-15 18:14               ` Steven Rostedt
  0 siblings, 1 reply; 122+ messages in thread
From: Matthew Wilcox @ 2009-01-15 18:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: James Bottomley, Wilcox, Matthew R, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

On Thu, Jan 15, 2009 at 09:44:42AM -0800, Andrew Morton wrote:
> > Me too.  Anecdotally, I haven't noticed this in my lab machines, but
> > what I have noticed is on someone else's laptop (a hyperthreaded atom)
> > that I was trying to demo powertop on was that IPI reschedule interrupts
> > seem to be out of control ... they were ticking over at a really high
> > rate and preventing the CPU from spending much time in the low C and P
> > states.  To me this implicates some scheduler problem since that's the
> > primary producer of IPI reschedules ... I think it wouldn't be a
> > significant extrapolation to predict that the scheduler might be the
> > cause of the above problem as well.
> > 
> 
> Good point.
> 
> The context switch rate actually went down a bit.
> 
> I wonder if the Intel test people have records of /proc/interrupts for
> the various kernel versions.

I think Chinang does, but he's out of office today.  He did say in an
earlier reply:

> I took a quick look at the interrupts figure between 2.6.24 and 2.6.27.
> i/o interuputs is slightly down in 2.6.27 (due to reduce throughput).
> But both NMI and reschedule interrupt increased.  Reschedule interrupts
> is 2x of 2.6.24.

So if the reschedule interrupt is happening twice as often, and the
context switch rate is basically unchanged, I guess that means the
scheduler is doing a lot more work to get approximately the same
results.  And that seems like a bad thing.

Again, it's worth bearing in mind that these are all RT tasks, so the
underlying problem may be very different from the one that both James and
I have observed with an Atom laptop running predominantly non-RT tasks.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15 18:00             ` Matthew Wilcox
@ 2009-01-15 18:14               ` Steven Rostedt
  2009-01-15 18:44                 ` Gregory Haskins
  2009-01-15 19:28                 ` Ma, Chinang
  0 siblings, 2 replies; 122+ messages in thread
From: Steven Rostedt @ 2009-01-15 18:14 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, James Bottomley, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty, Gregory Haskins


On Thu, 2009-01-15 at 11:00 -0700, Matthew Wilcox wrote:
> On Thu, Jan 15, 2009 at 09:44:42AM -0800, Andrew Morton wrote:
> > > Me too.  Anecdotally, I haven't noticed this in my lab machines, but
> > > what I have noticed is on someone else's laptop (a hyperthreaded atom)
> > > that I was trying to demo powertop on was that IPI reschedule interrupts
> > > seem to be out of control ... they were ticking over at a really high
> > > rate and preventing the CPU from spending much time in the low C and P
> > > states.  To me this implicates some scheduler problem since that's the
> > > primary producer of IPI reschedules ... I think it wouldn't be a
> > > significant extrapolation to predict that the scheduler might be the
> > > cause of the above problem as well.
> > > 
> > 
> > Good point.
> > 
> > The context switch rate actually went down a bit.
> > 
> > I wonder if the Intel test people have records of /proc/interrupts for
> > the various kernel versions.
> 
> I think Chinang does, but he's out of office today.  He did say in an
> earlier reply:
> 
> > I took a quick look at the interrupts figure between 2.6.24 and 2.6.27.
> > i/o interuputs is slightly down in 2.6.27 (due to reduce throughput).
> > But both NMI and reschedule interrupt increased.  Reschedule interrupts
> > is 2x of 2.6.24.
> 
> So if the reschedule interrupt is happening twice as often, and the
> context switch rate is basically unchanged, I guess that means the
> scheduler is doing a lot more work to get approximately the same
> results.  And that seems like a bad thing.
> 
> Again, it's worth bearing in mind that these are all RT tasks, so the
> underlying problem may be very different from the one that both James and
> I have observed with an Atom laptop running predominantly non-RT tasks.
> 

The RT scheduler is a bit more aggressive than it use to be. It use to
just migrate RT tasks when the migration thread woke up, and did that in
"bulk".  Now, when an individual RT task wakes up and it can not run on
the current CPU but can on another CPU, it is scheduled immediately, and
an IPI is sent out.

As for context switching, it would be the same amount as before, but the
difference is that the RT task will try to wake up as soon as possible.
This also causes RT tasks to bounce around CPUs more often.

If there are many threads, they should not be RT, unless there is some
design behind it.

Forgive me if you already did this and said so, but what is the result
of just making the writer an RT task and keeping all the readers as
SCHED_OTHER?

-- Steve



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15 18:14               ` Steven Rostedt
@ 2009-01-15 18:44                 ` Gregory Haskins
  2009-01-15 18:46                   ` Wilcox, Matthew R
  2009-01-15 19:28                 ` Ma, Chinang
  1 sibling, 1 reply; 122+ messages in thread
From: Gregory Haskins @ 2009-01-15 18:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Matthew Wilcox, Andrew Morton, James Bottomley, Wilcox,
	Matthew R, chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	andi.kleen, suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

[-- Attachment #1: Type: text/plain, Size: 2605 bytes --]

Steven Rostedt wrote:
> On Thu, 2009-01-15 at 11:00 -0700, Matthew Wilcox wrote:
>   
>> On Thu, Jan 15, 2009 at 09:44:42AM -0800, Andrew Morton wrote:
>>     
>>>> Me too.  Anecdotally, I haven't noticed this in my lab machines, but
>>>> what I have noticed is on someone else's laptop (a hyperthreaded atom)
>>>> that I was trying to demo powertop on was that IPI reschedule interrupts
>>>> seem to be out of control ... they were ticking over at a really high
>>>> rate and preventing the CPU from spending much time in the low C and P
>>>> states.  To me this implicates some scheduler problem since that's the
>>>> primary producer of IPI reschedules ... I think it wouldn't be a
>>>> significant extrapolation to predict that the scheduler might be the
>>>> cause of the above problem as well.
>>>>
>>>>         
>>> Good point.
>>>
>>> The context switch rate actually went down a bit.
>>>
>>> I wonder if the Intel test people have records of /proc/interrupts for
>>> the various kernel versions.
>>>       
>> I think Chinang does, but he's out of office today.  He did say in an
>> earlier reply:
>>
>>     
>>> I took a quick look at the interrupts figure between 2.6.24 and 2.6.27.
>>> i/o interuputs is slightly down in 2.6.27 (due to reduce throughput).
>>> But both NMI and reschedule interrupt increased.  Reschedule interrupts
>>> is 2x of 2.6.24.
>>>       
>> So if the reschedule interrupt is happening twice as often, and the
>> context switch rate is basically unchanged, I guess that means the
>> scheduler is doing a lot more work to get approximately the same
>> results.  And that seems like a bad thing.
>>     

I would be very interested in gathering some data in this area.  One
thing that pops to mind is to instrument the resched-ipi with
ftrace_printk() and gather a trace of this system in action.  I assume
that I wouldn't have access to this OLTP suite, so I may need a
volunteer to try this for me.  I could put together an instrumentation
patch for the testers convenience if they prefer.

Another data-point I wouldn't mind seeing is  looking at the scheduler
statistics, particularly with my sched-top utility, which you can find here:

http://rt.wiki.kernel.org/index.php/Schedtop_utility

(Note you may want to exclude the sched_info stats, as they are
inherently noisy and make it hard to see the real trends.  To do this
run it with: 'schedtop -x "sched_info"'

In the meantime, I will try similar approaches here on other non-OLTP
based workloads to see if I spy anything that looks amiss.
 
-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-15 18:44                 ` Gregory Haskins
@ 2009-01-15 18:46                   ` Wilcox, Matthew R
  2009-01-15 19:44                     ` Ma, Chinang
  0 siblings, 1 reply; 122+ messages in thread
From: Wilcox, Matthew R @ 2009-01-15 18:46 UTC (permalink / raw)
  To: Gregory Haskins, Steven Rostedt
  Cc: Matthew Wilcox, Andrew Morton, James Bottomley, Ma, Chinang,
	linux-kernel, Tripathi, Sharad C, arjan, Kleen, Andi, Siddha,
	Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, chris.mason, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1342 bytes --]

Gregory Haskins [mailto:ghaskins@novell.com] wrote:
> > On Thu, 2009-01-15 at 11:00 -0700, Matthew Wilcox wrote:
> >> So if the reschedule interrupt is happening twice as often, and the
> >> context switch rate is basically unchanged, I guess that means the
> >> scheduler is doing a lot more work to get approximately the same
> >> results.  And that seems like a bad thing.
> 
> I would be very interested in gathering some data in this area.  One
> thing that pops to mind is to instrument the resched-ipi with
> ftrace_printk() and gather a trace of this system in action.  I assume
> that I wouldn't have access to this OLTP suite, so I may need a
> volunteer to try this for me.  I could put together an instrumentation
> patch for the testers convenience if they prefer.

I don't know whether Novell have an arrangement with the Well-Known Commercial Database and the Well-Known OLTP Benchmark to do runs like this.  Chinang is normally only too happy to build his own kernels with patches from people who are interested in helping, so that's probably the best way to do it.

I'm leaving for LCA in an hour or so, so further responses from me to this thread are unlikely ;-)
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-15 18:14               ` Steven Rostedt
  2009-01-15 18:44                 ` Gregory Haskins
@ 2009-01-15 19:28                 ` Ma, Chinang
  1 sibling, 0 replies; 122+ messages in thread
From: Ma, Chinang @ 2009-01-15 19:28 UTC (permalink / raw)
  To: Steven Rostedt, Matthew Wilcox
  Cc: Andrew Morton, James Bottomley, Wilcox, Matthew R, linux-kernel,
	Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B,
	Chilukuri, Harita, Styner, Douglas W, Wang, Peter Xihong,
	Nueckel, Hubert, chris.mason, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty, Gregory Haskins



>-----Original Message-----
>From: Steven Rostedt [mailto:srostedt@redhat.com]
>Sent: Thursday, January 15, 2009 10:15 AM
>To: Matthew Wilcox
>Cc: Andrew Morton; James Bottomley; Wilcox, Matthew R; Ma, Chinang; linux-
>kernel@vger.kernel.org; Tripathi, Sharad C; arjan@linux.intel.com; Kleen,
>Andi; Siddha, Suresh B; Chilukuri, Harita; Styner, Douglas W; Wang, Peter
>Xihong; Nueckel, Hubert; chris.mason@oracle.com; linux-scsi@vger.kernel.org;
>Andrew Vasquez; Anirban Chakraborty; Gregory Haskins
>Subject: Re: Mainline kernel OLTP performance update
>
>
>On Thu, 2009-01-15 at 11:00 -0700, Matthew Wilcox wrote:
>> On Thu, Jan 15, 2009 at 09:44:42AM -0800, Andrew Morton wrote:
>> > > Me too.  Anecdotally, I haven't noticed this in my lab machines, but
>> > > what I have noticed is on someone else's laptop (a hyperthreaded atom)
>> > > that I was trying to demo powertop on was that IPI reschedule
>interrupts
>> > > seem to be out of control ... they were ticking over at a really high
>> > > rate and preventing the CPU from spending much time in the low C and
>P
>> > > states.  To me this implicates some scheduler problem since that's
>the
>> > > primary producer of IPI reschedules ... I think it wouldn't be a
>> > > significant extrapolation to predict that the scheduler might be the
>> > > cause of the above problem as well.
>> > >
>> >
>> > Good point.
>> >
>> > The context switch rate actually went down a bit.
>> >
>> > I wonder if the Intel test people have records of /proc/interrupts for
>> > the various kernel versions.
>>
>> I think Chinang does, but he's out of office today.  He did say in an
>> earlier reply:
>>
>> > I took a quick look at the interrupts figure between 2.6.24 and 2.6.27.
>> > i/o interuputs is slightly down in 2.6.27 (due to reduce throughput).
>> > But both NMI and reschedule interrupt increased.  Reschedule interrupts
>> > is 2x of 2.6.24.
>>
>> So if the reschedule interrupt is happening twice as often, and the
>> context switch rate is basically unchanged, I guess that means the
>> scheduler is doing a lot more work to get approximately the same
>> results.  And that seems like a bad thing.
>>
>> Again, it's worth bearing in mind that these are all RT tasks, so the
>> underlying problem may be very different from the one that both James and
>> I have observed with an Atom laptop running predominantly non-RT tasks.
>>
>
>The RT scheduler is a bit more aggressive than it use to be. It use to
>just migrate RT tasks when the migration thread woke up, and did that in
>"bulk".  Now, when an individual RT task wakes up and it can not run on
>the current CPU but can on another CPU, it is scheduled immediately, and
>an IPI is sent out.
>
>As for context switching, it would be the same amount as before, but the
>difference is that the RT task will try to wake up as soon as possible.
>This also causes RT tasks to bounce around CPUs more often.
>
>If there are many threads, they should not be RT, unless there is some
>design behind it.
>
>Forgive me if you already did this and said so, but what is the result
>of just making the writer an RT task and keeping all the readers as
>SCHED_OTHER?
>
>-- Steve
>

I think the high OLTP throughtput with rt-prio is due to the fixed time-slice. It is better to give DBMS process a bigger timeslice for getting a data buffer lock, process data, release the lock and switch out due to waiting on i/o instead of being force to switch out while still holding a data lock. 

I suppose SCHED_OTHER is the default policy for user processes. We tried setting only the log writer to RT and left all other DBMS orocess in default sched policy and the performance is ~1.5% lower than the all rt-prio process result.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-15 18:46                   ` Wilcox, Matthew R
@ 2009-01-15 19:44                     ` Ma, Chinang
  2009-01-16 18:14                       ` Gregory Haskins
  0 siblings, 1 reply; 122+ messages in thread
From: Ma, Chinang @ 2009-01-15 19:44 UTC (permalink / raw)
  To: Wilcox, Matthew R, Gregory Haskins, Steven Rostedt
  Cc: Matthew Wilcox, Andrew Morton, James Bottomley, linux-kernel,
	Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B,
	Chilukuri, Harita, Styner, Douglas W, Wang, Peter Xihong,
	Nueckel, Hubert, chris.mason, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty

Gregory. 
I will test the resched-ipi instrumentation patch with our OLTP if you can post the patch and some instructions.
Thanks,
-Chinang

>-----Original Message-----
>From: Wilcox, Matthew R
>Sent: Thursday, January 15, 2009 10:47 AM
>To: Gregory Haskins; Steven Rostedt
>Cc: Matthew Wilcox; Andrew Morton; James Bottomley; Ma, Chinang; linux-
>kernel@vger.kernel.org; Tripathi, Sharad C; arjan@linux.intel.com; Kleen,
>Andi; Siddha, Suresh B; Chilukuri, Harita; Styner, Douglas W; Wang, Peter
>Xihong; Nueckel, Hubert; chris.mason@oracle.com; linux-scsi@vger.kernel.org;
>Andrew Vasquez; Anirban Chakraborty
>Subject: RE: Mainline kernel OLTP performance update
>
>Gregory Haskins [mailto:ghaskins@novell.com] wrote:
>> > On Thu, 2009-01-15 at 11:00 -0700, Matthew Wilcox wrote:
>> >> So if the reschedule interrupt is happening twice as often, and the
>> >> context switch rate is basically unchanged, I guess that means the
>> >> scheduler is doing a lot more work to get approximately the same
>> >> results.  And that seems like a bad thing.
>>
>> I would be very interested in gathering some data in this area.  One
>> thing that pops to mind is to instrument the resched-ipi with
>> ftrace_printk() and gather a trace of this system in action.  I assume
>> that I wouldn't have access to this OLTP suite, so I may need a
>> volunteer to try this for me.  I could put together an instrumentation
>> patch for the testers convenience if they prefer.
>
>I don't know whether Novell have an arrangement with the Well-Known
>Commercial Database and the Well-Known OLTP Benchmark to do runs like this.
>Chinang is normally only too happy to build his own kernels with patches
>from people who are interested in helping, so that's probably the best way
>to do it.
>
>I'm leaving for LCA in an hour or so, so further responses from me to this
>thread are unlikely ;-)

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  7:24         ` Nick Piggin
  2009-01-15  9:46           ` Pekka Enberg
@ 2009-01-16  0:27           ` Andrew Morton
  2009-01-16  4:03             ` Nick Piggin
  1 sibling, 1 reply; 122+ messages in thread
From: Andrew Morton @ 2009-01-16  0:27 UTC (permalink / raw)
  To: Nick Piggin
  Cc: matthew, matthew.r.wilcox, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Thu, 15 Jan 2009 18:24:36 +1100
Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Given that SLAB and SLUB are fairly mature, I wonder what you'd think of
> taking SLQB into -mm and making it the default there for a while, to see
> if anybody reports a problem?

Nobody would test it in interesting ways.

We'd get more testing in linux-next, but still not enough, and not of
the right type.

It would be better to just make the desision, merge it and forge ahead.

Me, I'd be 100% behind the idea if it had a credible prospect of a net
reduction in the number of slab allocator implementations.

I guess the naming convention will limit us to 26 of them.  Fortunate
indeed that the kernel isn't written in cyrillic!



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  0:27           ` Andrew Morton
@ 2009-01-16  4:03             ` Nick Piggin
  2009-01-16  4:12               ` Andrew Morton
  0 siblings, 1 reply; 122+ messages in thread
From: Nick Piggin @ 2009-01-16  4:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: matthew, matthew.r.wilcox, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Friday 16 January 2009 11:27:35 Andrew Morton wrote:
> On Thu, 15 Jan 2009 18:24:36 +1100
>
> Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > Given that SLAB and SLUB are fairly mature, I wonder what you'd think of
> > taking SLQB into -mm and making it the default there for a while, to see
> > if anybody reports a problem?
>
> Nobody would test it in interesting ways.
>
> We'd get more testing in linux-next, but still not enough, and not of
> the right type.

It would be better than nothing, for SLQB, I guess.


> It would be better to just make the desision, merge it and forge ahead.
>
> Me, I'd be 100% behind the idea if it had a credible prospect of a net
> reduction in the number of slab allocator implementations.

>From the data we have so far, I think SLQB is a "credible prospect" to
replace SLUB and SLAB. But then again, apparently SLUB was a credible
prospect to replace SLAB when it was merged.

Unfortunately I can't honestly say that some serious regression will not
be discovered in SLQB that cannot be fixed. I guess that's never stopped
us merging other rewrites before, though.

I would like to see SLQB merged in mainline, made default, and wait for
some number releases. Then we take what we know, and try to make an
informed decision about the best one to take. I guess that is problematic
in that the rest of the kernel is moving underneath us. Do you have
another idea?


> I guess the naming convention will limit us to 26 of them.  Fortunate
> indeed that the kernel isn't written in cyrillic!

I could have called it SL4B. 4 would be somehow fitting...


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  4:03             ` Nick Piggin
@ 2009-01-16  4:12               ` Andrew Morton
  2009-01-16  6:46                 ` Nick Piggin
  0 siblings, 1 reply; 122+ messages in thread
From: Andrew Morton @ 2009-01-16  4:12 UTC (permalink / raw)
  To: Nick Piggin
  Cc: matthew, matthew.r.wilcox, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Fri, 16 Jan 2009 15:03:12 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> I would like to see SLQB merged in mainline, made default, and wait for
> some number releases. Then we take what we know, and try to make an
> informed decision about the best one to take. I guess that is problematic
> in that the rest of the kernel is moving underneath us. Do you have
> another idea?

Nope.  If it doesn't work out, we can remove it again I guess.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  4:12               ` Andrew Morton
@ 2009-01-16  6:46                 ` Nick Piggin
  2009-01-16  6:55                   ` Matthew Wilcox
                                     ` (2 more replies)
  0 siblings, 3 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-16  6:46 UTC (permalink / raw)
  To: Andrew Morton, netdev, sfr
  Cc: matthew, matthew.r.wilcox, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Friday 16 January 2009 15:12:10 Andrew Morton wrote:
> On Fri, 16 Jan 2009 15:03:12 +1100 Nick Piggin <nickpiggin@yahoo.com.au> 
wrote:
> > I would like to see SLQB merged in mainline, made default, and wait for
> > some number releases. Then we take what we know, and try to make an
> > informed decision about the best one to take. I guess that is problematic
> > in that the rest of the kernel is moving underneath us. Do you have
> > another idea?
>
> Nope.  If it doesn't work out, we can remove it again I guess.

OK, I have these numbers to show I'm not completely off my rocker to suggest
we merge SLQB :) Given these results, how about I ask to merge SLQB as default
in linux-next, then if nothing catastrophic happens, merge it upstream in the
next merge window, then a couple of releases after that, given some time to
test and tweak SLQB, then we plan to bite the bullet and emerge with just one
main slab allocator (plus SLOB).


System is a 2socket, 4 core AMD. All debug and stats options turned off for
all the allocators; default parameters (ie. SLUB using higher order pages,
and the others tend to be using order-0). SLQB is the version I recently
posted, with some of the prefetching removed according to Pekka's review
(probably a good idea to only add things like that in if/when they prove to
be an improvement).

time fio examples/netio (10 runs, lower better):
SLAB AVG=13.19 STD=0.40
SLQB AVG=13.78 STD=0.24
SLUB AVG=14.47 STD=0.23

SLAB makes a good showing here. The allocation/freeing pattern seems to be
very regular and easy (fast allocs and frees). So it could be some "lucky"
caching behaviour, I'm not exactly sure. I'll have to run more tests and
profiles here.


hackbench (10 runs, lower better):
1 GROUP
SLAB AVG=1.34 STD=0.05
SLQB AVG=1.31 STD=0.06
SLUB AVG=1.46 STD=0.07

2 GROUPS
SLAB AVG=1.20 STD=0.09
SLQB AVG=1.22 STD=0.12
SLUB AVG=1.21 STD=0.06

4 GROUPS
SLAB AVG=0.84 STD=0.05
SLQB AVG=0.81 STD=0.10
SLUB AVG=0.98 STD=0.07

8 GROUPS
SLAB AVG=0.79 STD=0.10
SLQB AVG=0.76 STD=0.15
SLUB AVG=0.89 STD=0.08

16 GROUPS
SLAB AVG=0.78 STD=0.08
SLQB AVG=0.79 STD=0.10
SLUB AVG=0.86 STD=0.05

32 GROUPS
SLAB AVG=0.86 STD=0.05
SLQB AVG=0.78 STD=0.06
SLUB AVG=0.88 STD=0.06

64 GROUPS
SLAB AVG=1.03 STD=0.05
SLQB AVG=0.90 STD=0.04
SLUB AVG=1.05 STD=0.06

128 GROUPS
SLAB AVG=1.31 STD=0.19
SLQB AVG=1.16 STD=0.36
SLUB AVG=1.29 STD=0.11

SLQB tends to be the winner here. SLAB is close at lower numbers of
groups, but drops behind a bit more as they increase.


tbench (10 runs, higher better):
1 THREAD
SLAB AVG=239.25 STD=31.74
SLQB AVG=257.75 STD=33.89
SLUB AVG=223.02 STD=14.73

2 THREADS
SLAB AVG=649.56 STD=9.77
SLQB AVG=647.77 STD=7.48
SLUB AVG=634.50 STD=7.66

4 THREADS
SLAB AVG=1294.52 STD=13.19
SLQB AVG=1266.58 STD=35.71
SLUB AVG=1228.31 STD=48.08

8 THREADS
SLAB AVG=2750.78 STD=26.67
SLQB AVG=2758.90 STD=18.86
SLUB AVG=2685.59 STD=22.41

16 THREADS
SLAB AVG=2669.11 STD=58.34
SLQB AVG=2671.69 STD=31.84
SLUB AVG=2571.05 STD=45.39

SLAB and SLQB seem to be pretty close, winning some and losing some.
They're always within a standard deviation of one another, so we can't
make conclusions between them. SLUB seems to be a bit slower.


Netperf UDP unidirectional send test (10 runs, higher better):

Server and client bound to same CPU
SLAB AVG=60.111 STD=1.59382
SLQB AVG=60.167 STD=0.685347
SLUB AVG=58.277 STD=0.788328

Server and client bound to same socket, different CPUs
SLAB AVG=85.938 STD=0.875794
SLQB AVG=93.662 STD=2.07434
SLUB AVG=81.983 STD=0.864362

Server and client bound to different sockets
SLAB AVG=78.801 STD=1.44118
SLQB AVG=78.269 STD=1.10457
SLUB AVG=71.334 STD=1.16809

SLQB is up with SLAB for the first and last cases, and faster in
the second case. SLUB trails in each case. (Any ideas for better types
of netperf tests?)


Kbuild numbers don't seem to be significantly different. SLAB and SLQB
actually got exactly the same average over 10 runs. The user+sys times
tend to be almost identical between allocators, with elapsed time mainly
depending on how much time the CPU was not idle.


Intel's OLTP shows SLQB is "neutral" to SLAB. That is, literally within
their measurement confidence interval. If it comes down to it, I think we
could get them to do more runs to narrow that down, but we're talking a
couple of tenths of a percent already.


I haven't done any non-local network tests. Networking is the one of the
subsystems most heavily dependent on slab performance, so if anybody
cares to run their favourite tests, that would be really helpful.

Disclaimer
----------
Now remember this is just one specific HW configuration, and some
allocators for some reason give significantly (and sometimes perplexingly)
different results between different CPU and system architectures.

The other frustrating thing is that sometimes you happen to get a lucky
or unlucky cache or NUMA layout depending on the compile, the boot, etc.
So sometimes results get a little "skewed" in a way that isn't reflected
in the STDDEV. But I've tried to minimise that. Dropping caches and
restarting services etc. between individual runs.



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  6:46                 ` Nick Piggin
@ 2009-01-16  6:55                   ` Matthew Wilcox
  2009-01-16  7:06                     ` Nick Piggin
  2009-01-16  7:53                     ` Zhang, Yanmin
  2009-01-16  7:00                   ` Mainline kernel OLTP performance update Andrew Morton
  2009-01-16 18:11                   ` Rick Jones
  2 siblings, 2 replies; 122+ messages in thread
From: Matthew Wilcox @ 2009-01-16  6:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Zhang, Yanmin

On Fri, Jan 16, 2009 at 05:46:23PM +1100, Nick Piggin wrote:
> Intel's OLTP shows SLQB is "neutral" to SLAB. That is, literally within
> their measurement confidence interval. If it comes down to it, I think we
> could get them to do more runs to narrow that down, but we're talking a
> couple of tenths of a percent already.

I think I can speak with some measure of confidence for at least the
OLTP-testing part of my company when I say that I have no objection to
Nick's planned merge scheme.

I believe the kernel benchmark group have also done some testing with
SLQB and have generally positive things to say about it (Yanmin added to
the gargantuan cc).

Did slabtop get fixed to work with SLQB?

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  6:46                 ` Nick Piggin
  2009-01-16  6:55                   ` Matthew Wilcox
@ 2009-01-16  7:00                   ` Andrew Morton
  2009-01-16  7:25                     ` Nick Piggin
  2009-01-16  8:59                     ` Nick Piggin
  2009-01-16 18:11                   ` Rick Jones
  2 siblings, 2 replies; 122+ messages in thread
From: Andrew Morton @ 2009-01-16  7:00 UTC (permalink / raw)
  To: Nick Piggin
  Cc: netdev, sfr, matthew, matthew.r.wilcox, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Fri, 16 Jan 2009 17:46:23 +1100 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> On Friday 16 January 2009 15:12:10 Andrew Morton wrote:
> > On Fri, 16 Jan 2009 15:03:12 +1100 Nick Piggin <nickpiggin@yahoo.com.au> 
> wrote:
> > > I would like to see SLQB merged in mainline, made default, and wait for
> > > some number releases. Then we take what we know, and try to make an
> > > informed decision about the best one to take. I guess that is problematic
> > > in that the rest of the kernel is moving underneath us. Do you have
> > > another idea?
> >
> > Nope.  If it doesn't work out, we can remove it again I guess.
> 
> OK, I have these numbers to show I'm not completely off my rocker to suggest
> we merge SLQB :) Given these results, how about I ask to merge SLQB as default
> in linux-next, then if nothing catastrophic happens, merge it upstream in the
> next merge window, then a couple of releases after that, given some time to
> test and tweak SLQB, then we plan to bite the bullet and emerge with just one
> main slab allocator (plus SLOB).

That's a plan.

> SLQB tends to be the winner here.

Can you think of anything with which it will be the loser?


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  6:55                   ` Matthew Wilcox
@ 2009-01-16  7:06                     ` Nick Piggin
  2009-01-16  7:53                     ` Zhang, Yanmin
  1 sibling, 0 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-16  7:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Zhang, Yanmin

On Friday 16 January 2009 17:55:47 Matthew Wilcox wrote:
> On Fri, Jan 16, 2009 at 05:46:23PM +1100, Nick Piggin wrote:
> > Intel's OLTP shows SLQB is "neutral" to SLAB. That is, literally within
> > their measurement confidence interval. If it comes down to it, I think we
> > could get them to do more runs to narrow that down, but we're talking a
> > couple of tenths of a percent already.
>
> I think I can speak with some measure of confidence for at least the
> OLTP-testing part of my company when I say that I have no objection to
> Nick's planned merge scheme.
>
> I believe the kernel benchmark group have also done some testing with
> SLQB and have generally positive things to say about it (Yanmin added to
> the gargantuan cc).
>
> Did slabtop get fixed to work with SLQB?

Yes the old slabtop that works on /proc/slabinfo works with SLQB (ie. SLQB
implements /proc/slabinfo).

Lin Ming recently also ported the SLUB /sys/kernel/slab/ specific slabinfo
tool to SLQB. Basically it reports in-depth internal event counts etc. and
can operate on individual caches, making it very useful for performance
"observability" and tuning.

It is hard to come up with a single set of statistics that apply usefully
to all the allocators. FWIW, it would be a useful tool to port over to
SLAB too, if we end up deciding to go with SLAB.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  7:00                   ` Mainline kernel OLTP performance update Andrew Morton
@ 2009-01-16  7:25                     ` Nick Piggin
  2009-01-16  8:59                     ` Nick Piggin
  1 sibling, 0 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-16  7:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: netdev, sfr, matthew, matthew.r.wilcox, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Friday 16 January 2009 18:00:43 Andrew Morton wrote:
> On Fri, 16 Jan 2009 17:46:23 +1100 Nick Piggin <nickpiggin@yahoo.com.au> 
wrote:
> > On Friday 16 January 2009 15:12:10 Andrew Morton wrote:
> > > On Fri, 16 Jan 2009 15:03:12 +1100 Nick Piggin
> > > <nickpiggin@yahoo.com.au>
> >
> > wrote:
> > > > I would like to see SLQB merged in mainline, made default, and wait
> > > > for some number releases. Then we take what we know, and try to make
> > > > an informed decision about the best one to take. I guess that is
> > > > problematic in that the rest of the kernel is moving underneath us.
> > > > Do you have another idea?
> > >
> > > Nope.  If it doesn't work out, we can remove it again I guess.
> >
> > OK, I have these numbers to show I'm not completely off my rocker to
> > suggest we merge SLQB :) Given these results, how about I ask to merge
> > SLQB as default in linux-next, then if nothing catastrophic happens,
> > merge it upstream in the next merge window, then a couple of releases
> > after that, given some time to test and tweak SLQB, then we plan to bite
> > the bullet and emerge with just one main slab allocator (plus SLOB).
>
> That's a plan.
>
> > SLQB tends to be the winner here.
>
> Can you think of anything with which it will be the loser?

Well, that fio test showed it was behind SLAB. I just discovered that
yesterday during running these tests, so I'll take a look at that. The
Intel performance guys I think have one or two cases where it is slower.
They don't seem to be too serious, and tend to be specific to some
machines (eg. the same test with a different CPU architecture turns out
to be faster). So I'll be looking into these things, but I haven't seen
anything too serious yet. I'm mostly interested in macro benchmarks and
more real world workloads.

At a higher level, SLAB has some interesting features. It basically has
"crossbars" of queues, that basically provide queues for allocating and
freeing to and from different CPUs and nodes. This is what bloats up
the kmem_cache data structures to tens or hundreds of gigabytes each
on SGI size systems. But it is also has good properties. On smaller
multiprocessor and NUMA systems, it might be the case that SLAB does
better in workloads that involve objects being allocated on one CPU and
freed on another. I haven't actually observed problems here, but I don't
have a lot of good tests.

SLAB is also fundamentally different from SLUB and SLQB in that it uses
arrays to store pointers to objects in its queues, rather than having
a linked list using pointers embedded in the objects. This might in some
cases make it easier to prefetch objects in parallel with finding the
object itself. I haven't actually been able to attribute a particular
regression to this interesting difference, but it might turn up as an
issue.

These are two big differences between SLAB and SLQB.

The linked lists of objects were used in favour of arrays again because of
the memory overhead, and to have a better ability to tune the size of the
queues, and reduced overhead in copying around arrays of pointers (SLQB can
just copy the head of one the list to the tail of another in order to move
objects around), and eliminated the need to have additional metadata beyond
the struct page for each slab.

The crossbars of queues were removed because of the bloating and memory
overhead issues. The fact that we now have linked lists helps a little bit
with this, because moving lists of objects around gets a bit easier.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  6:55                   ` Matthew Wilcox
  2009-01-16  7:06                     ` Nick Piggin
@ 2009-01-16  7:53                     ` Zhang, Yanmin
  2009-01-16 10:20                       ` Andi Kleen
  1 sibling, 1 reply; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-16  7:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Nick Piggin, Andrew Morton, netdev, sfr, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

On Thu, 2009-01-15 at 23:55 -0700, Matthew Wilcox wrote:
> On Fri, Jan 16, 2009 at 05:46:23PM +1100, Nick Piggin wrote:
> > Intel's OLTP shows SLQB is "neutral" to SLAB. That is, literally within
> > their measurement confidence interval. If it comes down to it, I think we
> > could get them to do more runs to narrow that down, but we're talking a
> > couple of tenths of a percent already.
> 
> I think I can speak with some measure of confidence for at least the
> OLTP-testing part of my company when I say that I have no objection to
> Nick's planned merge scheme.
> 
> I believe the kernel benchmark group have also done some testing with
> SLQB and have generally positive things to say about it (Yanmin added to
> the gargantuan cc).
We did run lots of benchmarks with SLQB. Comparing with SLUB, one highlighting of
SLQB is with netperf UDP-U-4k. On my x86-64 machines, if I start 1 client and 1 server
process and bind them to different physical cpus, the result of SLQB is about 20% better
than SLUB's. If I start CPU_NUM clients and the same number of servers without binding,
the results of SLQB is about 100% better than SLUB's. I think that's because SLQB
doesn't pass through big object allocation to page allocator.
netperf UDP-U-1k has less improvement with SLQB.

The results of other benchmarks have variations. They are good on some machines,
but bad on other machines. However, the variation is small. For example, hackbench's result
with SLQB is about 1 second than with SLUB on 8-core stoakley. After we worked with
Nick to do small code changing, SLQB's result is a little better than SLUB's
with hackbench on stoakley.

We consider other variations as fluctuation.

All the testing use default SLUB and SLQB configuration.

> 
> Did slabtop get fixed to work with SLQB?
> 


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  7:00                   ` Mainline kernel OLTP performance update Andrew Morton
  2009-01-16  7:25                     ` Nick Piggin
@ 2009-01-16  8:59                     ` Nick Piggin
  1 sibling, 0 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-16  8:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: netdev, sfr, matthew, matthew.r.wilcox, chinang.ma, linux-kernel,
	sharad.c.tripathi, arjan, andi.kleen, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Friday 16 January 2009 18:00:43 Andrew Morton wrote:
> On Fri, 16 Jan 2009 17:46:23 +1100 Nick Piggin <nickpiggin@yahoo.com.au> 
> > SLQB tends to be the winner here.
>
> Can you think of anything with which it will be the loser?

Here are some more performance numbers with "slub_test" kernel module.
It's basically a really tiny microbenchmark, so I don't really consider
it gives too useful results, except it does show up some problems in
SLAB's scalability  that may start to bite as we continue to get more
threads per socket.

(I ran a few of these tests on one of Dave's 2 socket, 128 thread
systems, and slab gets really painful... these kinds of thread counts
may only be a couple of years away from x86).

All numbers are in CPU cycles.

Single thread testing
=====================
1. Kmalloc: Repeatedly allocate 10000 objs then free them
obj size  SLAB       SLQB      SLUB
8           77+ 128   69+ 47   61+ 77
16          69+ 104  116+ 70   77+ 80
32          66+ 101   82+ 81   71+ 89
64          82+ 116   95+ 81   94+105
128        100+ 148  106+ 94  114+163
256        153+ 136  134+ 98  124+186
512        209+ 161  170+186  134+276
1024       331+ 249  236+245  134+283
2048       608+ 443  380+386  172+312
4096      1109+ 624  678+661  239+372
8192      1166+1077  767+683  535+433
16384     1213+1160  914+731  577+682

We can see SLAB has a fair bit more overhead in this case. SLUB starts
doing higher order allocations I think around size 256, which reduces
costs there. Don't know what the SLQB artifact at 16 is caused by...


2. Kmalloc: alloc/free test (repeatedly allocate and free)
       SLAB  SLQB  SLUB
8       98   90     94
16      98   90     93
32      98   90     93
64      99   90     94
128    100   92     93
256    104   93     95
512    105   94     97
1024   106   93     97
2048   107   95     95
4096   111   92     97
8192   111   94    631
16384  114   92    741

Here we see SLUB's allocator passthrough (or is the the lack of queueing?).
Straight line speed at small sizes is probably due to instructions in the
fastpaths. It's pretty meaningless though because it probably changes if
there is any actual load on the CPU, or another CPU architecture. Doesn't
look bad for SLQB though :)


Concurrent allocs
=================
1. Like the first single thread test, lots of allocs, then lots of frees.
But running on all CPUs. Average over all CPUs.
       SLAB        SLQB         SLUB
8        251+ 322    73+  47   65+  76
16       240+ 331    84+  53   67+  82
32       235+ 316    94+  57   77+  92
64       338+ 303   120+  66  105+ 136
128      549+ 355   139+ 166  127+ 344
256     1129+ 456   189+ 178  236+ 404
512     2085+ 872   240+ 217  244+ 419
1024    3895+1373   347+ 333  251+ 440
2048    7725+2579   616+ 695  373+ 588
4096   15320+4534  1245+1442  689+1002

A problem with SLAB scalability starts showing up on this system with only
4 threads per socket. Again, SLUB sees a benefit from higher order
allocations.


2. Same as 2nd single threaded test, alloc then free, on all CPUs.
      SLAB  SLQB  SLUB
8      99   90    93
16     99   90    93
32     99   90    93
64    100   91    94
128   102   90    93
256   105   94    97
512   106   93    97
1024  108   93    97
2048  109   93    96
4096  110   93    96

No surprises. Objects always fit in queues (or unqueues, in the case of
SLUB), so there is no cross cache traffic.


Remote free test
================
1. Allocate N objects on CPUs 1-7, then free them all from CPU 0. Average cost
   of all kmalloc+kfree
      SLAB        SLQB     SLUB
8       191+ 142   53+ 64  56+99
16      180+ 141   82+ 69  60+117
32      173+ 142  100+ 71  78+151
64      240+ 147  131+ 73  117+216
128     441+ 162  158+114  114+251
256     833+ 181  179+119  185+263
512    1546+ 243  220+132  194+292
1024   2886+ 341  299+135  201+312
2048   5737+ 577  517+139  291+370
4096  11288+1201  976+153  528+482


2. All CPUs allocate on objects on CPU N, then freed by CPU N+1 % NR_CPUS
   (ie. CPU1 frees objects allocated by CPU0).
      SLAB        SLQB     SLUB
8       236+ 331   72+123   64+ 114
16      232+ 345   80+125   71+ 139
32      227+ 342   85+134   82+ 183
64      324+ 336  140+138  111+ 219
128     569+ 384  245+201  145+ 337
256    1111+ 448  243+222  238+ 447
512    2091+ 871  249+244  247+ 470
1024   3923+1593  254+256  254+ 503
2048   7700+2968  273+277  369+ 699
4096  15154+5061  310+323  693+1220

SLAB's concurrent allocation bottlnecks show up again in these tests.

Unfortunately these are not too realistic tests of remote freeing pattern,
because normally you would expect remote freeing and allocation happening
concurrently, rather than all allocations up front, then all frees. If
the test behaved like that, then object could probably fit in SLAB's
queues and it might see some good numbers.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15 13:52             ` Matthew Wilcox
  2009-01-15 14:42               ` Pekka Enberg
@ 2009-01-16 10:16               ` Pekka Enberg
  2009-01-16 10:21                 ` Nick Piggin
  1 sibling, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-16 10:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Nick Piggin, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Thu, Jan 15, 2009 at 11:46:09AM +0200, Pekka Enberg wrote:
>> It would also be nice if someone could do the performance analysis on
>> the SLUB bug. I ran sysbench in oltp mode here and the results look
>> like this:
>>
>>   [ number of transactions per second from 10 runs. ]
>>
>>                    min      max      avg      sd
>>   2.6.29-rc1-slab  833.77   852.32   845.10   4.72
>>   2.6.29-rc1-slub  823.61   851.94   836.74   8.57
>>
>> And no, the numbers are not flipped, SLUB beats SLAB here. :(

On Thu, Jan 15, 2009 at 3:52 PM, Matthew Wilcox <matthew@wil.cx> wrote:
> Um.  More transactions per second is good.  Your numbers show SLAB
> beating SLUB (even on your dual-CPU system).  And SLAB shows a lower
> standard deviation, which is also good.

I had lockdep enabled in my config so I ran the tests again with
x86-64 defconfig and I'm back to square one:

  [ number of transactions per second from 10 runs, bigger is better ]

                   min      max      avg      sd
  2.6.29-rc1-slab  802.02   805.37   803.93   0.97
  2.6.29-rc1-slub  807.78   811.20   809.86   1.05

                        Pekka

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  7:53                     ` Zhang, Yanmin
@ 2009-01-16 10:20                       ` Andi Kleen
  2009-01-20  5:16                         ` Zhang, Yanmin
  0 siblings, 1 reply; 122+ messages in thread
From: Andi Kleen @ 2009-01-16 10:20 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Matthew Wilcox, Nick Piggin, Andrew Morton, netdev, sfr,
	matthew.r.wilcox, chinang.ma, linux-kernel, sharad.c.tripathi,
	arjan, suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

"Zhang, Yanmin" <yanmin_zhang@linux.intel.com> writes:


> I think that's because SLQB
> doesn't pass through big object allocation to page allocator.
> netperf UDP-U-1k has less improvement with SLQB.

That sounds like just the page allocator needs to be improved.
That would help everyone. We talked a bit about this earlier,
some of the heuristics for hot/cold pages are quite outdated
and have been tuned for obsolete machines and also its fast path
is quite long. Unfortunately no code currently.

-Andi


-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 10:16               ` Pekka Enberg
@ 2009-01-16 10:21                 ` Nick Piggin
  2009-01-16 10:31                   ` Pekka Enberg
  0 siblings, 1 reply; 122+ messages in thread
From: Nick Piggin @ 2009-01-16 10:21 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Friday 16 January 2009 21:16:31 Pekka Enberg wrote:
> On Thu, Jan 15, 2009 at 11:46:09AM +0200, Pekka Enberg wrote:
> >> It would also be nice if someone could do the performance analysis on
> >> the SLUB bug. I ran sysbench in oltp mode here and the results look
> >> like this:
> >>
> >>   [ number of transactions per second from 10 runs. ]
> >>
> >>                    min      max      avg      sd
> >>   2.6.29-rc1-slab  833.77   852.32   845.10   4.72
> >>   2.6.29-rc1-slub  823.61   851.94   836.74   8.57

> I had lockdep enabled in my config so I ran the tests again with
> x86-64 defconfig and I'm back to square one:
>
>   [ number of transactions per second from 10 runs, bigger is better ]
>
>                    min      max      avg      sd
>   2.6.29-rc1-slab  802.02   805.37   803.93   0.97
>   2.6.29-rc1-slub  807.78   811.20   809.86   1.05

Hm, I wonder why it is going slower with lockdep disabled?
Did something else change?


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 10:21                 ` Nick Piggin
@ 2009-01-16 10:31                   ` Pekka Enberg
  2009-01-16 10:42                     ` Nick Piggin
  2009-01-16 20:59                     ` Christoph Lameter
  0 siblings, 2 replies; 122+ messages in thread
From: Pekka Enberg @ 2009-01-16 10:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Friday 16 January 2009 21:16:31 Pekka Enberg wrote:
>> I had lockdep enabled in my config so I ran the tests again with
>> x86-64 defconfig and I'm back to square one:
>>
>>   [ number of transactions per second from 10 runs, bigger is better ]
>>
>>                    min      max      avg      sd
>>   2.6.29-rc1-slab  802.02   805.37   803.93   0.97
>>   2.6.29-rc1-slub  807.78   811.20   809.86   1.05

On Fri, Jan 16, 2009 at 12:21 PM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Hm, I wonder why it is going slower with lockdep disabled?
> Did something else change?

I don't have the exact config for the previous tests but it's was just
my laptop regular config whereas the new tests are x86-64 defconfig.
So I think I'm just hitting some of the other OLTP regressions here,
aren't I? There's some scheduler related options such as
CONFIG_GROUP_SCHED and CONFIG_FAIR_GROUP_SCHED enabled in defconfig
that I didn't have in the original tests. I can try without them if
you want but I'm not sure it's relevant for SLAB vs SLUB tests.

                                Pekka

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 10:31                   ` Pekka Enberg
@ 2009-01-16 10:42                     ` Nick Piggin
  2009-01-16 10:55                       ` Pekka Enberg
  2009-01-16 20:59                     ` Christoph Lameter
  1 sibling, 1 reply; 122+ messages in thread
From: Nick Piggin @ 2009-01-16 10:42 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Friday 16 January 2009 21:31:03 Pekka Enberg wrote:
> On Friday 16 January 2009 21:16:31 Pekka Enberg wrote:
> >> I had lockdep enabled in my config so I ran the tests again with
> >> x86-64 defconfig and I'm back to square one:
> >>
> >>   [ number of transactions per second from 10 runs, bigger is better ]
> >>
> >>                    min      max      avg      sd
> >>   2.6.29-rc1-slab  802.02   805.37   803.93   0.97
> >>   2.6.29-rc1-slub  807.78   811.20   809.86   1.05
>
> On Fri, Jan 16, 2009 at 12:21 PM, Nick Piggin <nickpiggin@yahoo.com.au> 
wrote:
> > Hm, I wonder why it is going slower with lockdep disabled?
> > Did something else change?
>
> I don't have the exact config for the previous tests but it's was just
> my laptop regular config whereas the new tests are x86-64 defconfig.
> So I think I'm just hitting some of the other OLTP regressions here,
> aren't I? There's some scheduler related options such as
> CONFIG_GROUP_SCHED and CONFIG_FAIR_GROUP_SCHED enabled in defconfig
> that I didn't have in the original tests. I can try without them if
> you want but I'm not sure it's relevant for SLAB vs SLUB tests.

Oh no that's fine. It just looked like you repeated the test but
with lockdep disabled (and no other changes).



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 10:42                     ` Nick Piggin
@ 2009-01-16 10:55                       ` Pekka Enberg
  2009-01-19  7:13                         ` Nick Piggin
  0 siblings, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-16 10:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

Hi Nick,

On Fri, Jan 16, 2009 at 12:42 PM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>> I don't have the exact config for the previous tests but it's was just
>> my laptop regular config whereas the new tests are x86-64 defconfig.
>> So I think I'm just hitting some of the other OLTP regressions here,
>> aren't I? There's some scheduler related options such as
>> CONFIG_GROUP_SCHED and CONFIG_FAIR_GROUP_SCHED enabled in defconfig
>> that I didn't have in the original tests. I can try without them if
>> you want but I'm not sure it's relevant for SLAB vs SLUB tests.
>
> Oh no that's fine. It just looked like you repeated the test but
> with lockdep disabled (and no other changes).

Right. In any case, I am still unable to reproduce the OLTP issue and
I've seen SLUB beat SLAB on my machine in most of the benchmarks
you've posted. So I have very mixed feelings about SLQB. It's very
nice that it works for OLTP but we still don't have much insight (i.e.
numbers) on why it's better. I'm also bit worried if SLQB has gotten
enough attention from the NUMA and HPC folks that brought us SLUB.

The good news is that SLQB can replace SLAB so either way, we're not
going to end up with four allocators. Whether it can replace SLUB
remains to be seen.

                        Pekka

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16  6:46                 ` Nick Piggin
  2009-01-16  6:55                   ` Matthew Wilcox
  2009-01-16  7:00                   ` Mainline kernel OLTP performance update Andrew Morton
@ 2009-01-16 18:11                   ` Rick Jones
  2009-01-19  7:43                     ` Nick Piggin
  2 siblings, 1 reply; 122+ messages in thread
From: Rick Jones @ 2009-01-16 18:11 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, netdev, sfr, matthew, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

Nick Piggin wrote:
> OK, I have these numbers to show I'm not completely off my rocker to suggest
> we merge SLQB :) Given these results, how about I ask to merge SLQB as default
> in linux-next, then if nothing catastrophic happens, merge it upstream in the
> next merge window, then a couple of releases after that, given some time to
> test and tweak SLQB, then we plan to bite the bullet and emerge with just one
> main slab allocator (plus SLOB).
> 
> 
> System is a 2socket, 4 core AMD. 

Not exactly a large system :)  Barely NUMA even with just two sockets.

> All debug and stats options turned off for
> all the allocators; default parameters (ie. SLUB using higher order pages,
> and the others tend to be using order-0). SLQB is the version I recently
> posted, with some of the prefetching removed according to Pekka's review
> (probably a good idea to only add things like that in if/when they prove to
> be an improvement).
> 
> ...
 >
> Netperf UDP unidirectional send test (10 runs, higher better):
> 
> Server and client bound to same CPU
> SLAB AVG=60.111 STD=1.59382
> SLQB AVG=60.167 STD=0.685347
> SLUB AVG=58.277 STD=0.788328
> 
> Server and client bound to same socket, different CPUs
> SLAB AVG=85.938 STD=0.875794
> SLQB AVG=93.662 STD=2.07434
> SLUB AVG=81.983 STD=0.864362
> 
> Server and client bound to different sockets
> SLAB AVG=78.801 STD=1.44118
> SLQB AVG=78.269 STD=1.10457
> SLUB AVG=71.334 STD=1.16809
 > ...
> I haven't done any non-local network tests. Networking is the one of the
> subsystems most heavily dependent on slab performance, so if anybody
> cares to run their favourite tests, that would be really helpful.

I'm guessing, but then are these Mbit/s figures? Would that be the sending 
throughput or the receiving throughput?

I love to see netperf used, but why UDP and loopback?  Also, how about the 
service demands?

rick jones

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15 19:44                     ` Ma, Chinang
@ 2009-01-16 18:14                       ` Gregory Haskins
  2009-01-16 19:09                         ` Steven Rostedt
  2009-01-20 12:45                         ` Gregory Haskins
  0 siblings, 2 replies; 122+ messages in thread
From: Gregory Haskins @ 2009-01-16 18:14 UTC (permalink / raw)
  To: Ma, Chinang
  Cc: Wilcox, Matthew R, Steven Rostedt, Matthew Wilcox, Andrew Morton,
	James Bottomley, linux-kernel, Tripathi, Sharad C, arjan, Kleen,
	Andi, Siddha, Suresh B, Chilukuri, Harita, Styner, Douglas W,
	Wang, Peter Xihong, Nueckel, Hubert, chris.mason, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty


[-- Attachment #1.1: Type: text/plain, Size: 1964 bytes --]

Ma, Chinang wrote:
> Gregory. 
> I will test the resched-ipi instrumentation patch with our OLTP if you can post the patch and some instructions.
> Thanks,
> -Chinang
>   

Hi Chinang,
  Please find a patch attached which applies to linus.git as of today. 
You will also want to enable CONFIG_FUNCTION_TRACER as well as the trace
components.  Here is my system:

ghaskins@dev:~/sandbox/git/linux-2.6-rt> grep TRACE .config
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_X86_PTRACE_BTS=y
# CONFIG_ACPI_DEBUG_FUNC_TRACE is not set
CONFIG_NETFILTER_XT_TARGET_TRACE=m
CONFIG_SOUND_TRACEINIT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_STACKTRACE=y
# CONFIG_BACKTRACE_SELF_TEST is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_IRQSOFF_TRACER=y
CONFIG_SYSPROF_TRACER=y
CONFIG_SCHED_TRACER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
# CONFIG_BOOT_TRACER is not set
# CONFIG_TRACE_BRANCH_PROFILING is not set
CONFIG_POWER_TRACER=y
CONFIG_STACK_TRACER=y
CONFIG_HW_BRANCH_TRACER=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_KVM_TRACE is not set


Then on your booted system, do:

echo sched_switch > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_enabled
$run_oltp && echo 0 > /sys/kernel/debug/tracing/tracing_enabled

(where $run_oltp is your suite)

Then, email the contents of /sys/kernel/debug/tracing/trace to me

-Greg


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: instrumentation.patch --]
[-- Type: text/x-patch; name="instrumentation.patch", Size: 3263 bytes --]

ftrace instrumentation for RT tasks

From: Gregory Haskins <ghaskins@novell.com>

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
---

 arch/x86/kernel/smp.c |    2 ++
 include/linux/sched.h |    6 ++++++
 kernel/sched.c        |    3 +++
 kernel/sched_rt.c     |   10 ++++++++++
 4 files changed, 21 insertions(+), 0 deletions(-)


diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index e6faa33..468abeb 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -118,6 +118,7 @@ static void native_smp_send_reschedule(int cpu)
 		WARN_ON(1);
 		return;
 	}
+	ftrace_printk("cpu %d\n", cpu);
 	send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
 }
 
@@ -171,6 +172,7 @@ static void native_smp_send_stop(void)
  */
 void smp_reschedule_interrupt(struct pt_regs *regs)
 {
+	ftrace_printk("NEEDS_RESCHED\n");
 	ack_APIC_irq();
 	inc_irq_stat(irq_resched_count);
 }
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4cae9b8..a320692 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2094,8 +2094,14 @@ static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag)
 	return test_ti_thread_flag(task_thread_info(tsk), flag);
 }
 
+# define ftrace_printk(fmt...) __ftrace_printk(_THIS_IP_, fmt)
+extern int
+__ftrace_printk(unsigned long ip, const char *fmt, ...)
+	__attribute__ ((format (printf, 2, 3)));
+
 static inline void set_tsk_need_resched(struct task_struct *tsk)
 {
+	ftrace_printk("%s/%d\n", tsk->comm, tsk->pid);
 	set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
 }
 
diff --git a/kernel/sched.c b/kernel/sched.c
index 52bbf1c..d55fcf1 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1874,6 +1874,9 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 		      *new_cfsrq = cpu_cfs_rq(old_cfsrq, new_cpu);
 	u64 clock_offset;
 
+	ftrace_printk("migrate %s/%d [%d] -> [%d]\n",
+		      p->comm, p->pid, task_cpu(p), new_cpu);
+
 	clock_offset = old_rq->clock - new_rq->clock;
 
 	trace_sched_migrate_task(p, task_cpu(p), new_cpu);
diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 954e1a8..59cf64b 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -1102,6 +1102,8 @@ static int push_rt_task(struct rq *rq)
 	if (!next_task)
 		return 0;
 
+	ftrace_printk("attempting push\n");
+
  retry:
 	if (unlikely(next_task == rq->curr)) {
 		WARN_ON(1);
@@ -1139,6 +1141,8 @@ static int push_rt_task(struct rq *rq)
 		goto out;
 	}
 
+	ftrace_printk("%s/%d\n", next_task->comm, next_task->pid);
+
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
@@ -1180,6 +1184,8 @@ static int pull_rt_task(struct rq *this_rq)
 	if (likely(!rt_overloaded(this_rq)))
 		return 0;
 
+	ftrace_printk("attempting pull\n");
+
 	next = pick_next_task_rt(this_rq);
 
 	for_each_cpu(cpu, this_rq->rd->rto_mask) {
@@ -1234,6 +1240,10 @@ static int pull_rt_task(struct rq *this_rq)
 
 			ret = 1;
 
+			ftrace_printk("pull %s/%d [%d] -> [%d]\n",
+				      p->comm, p->pid,
+				      src_rq->cpu, this_rq->cpu);
+
 			deactivate_task(src_rq, p, 0);
 			set_task_cpu(p, this_cpu);
 			activate_task(this_rq, p, 0);

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 18:14                       ` Gregory Haskins
@ 2009-01-16 19:09                         ` Steven Rostedt
  2009-01-20 12:45                         ` Gregory Haskins
  1 sibling, 0 replies; 122+ messages in thread
From: Steven Rostedt @ 2009-01-16 19:09 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Ma, Chinang, Wilcox, Matthew R, Matthew Wilcox, Andrew Morton,
	James Bottomley, linux-kernel, Tripathi, Sharad C, arjan, Kleen,
	Andi, Siddha, Suresh B, Chilukuri, Harita, Styner, Douglas W,
	Wang, Peter Xihong, Nueckel, Hubert, chris.mason, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty


On Fri, 2009-01-16 at 13:14 -0500, Gregory Haskins wrote:
> Ma, Chinang wrote:
> > Gregory. 
> > I will test the resched-ipi instrumentation patch with our OLTP if you can post the patch and some instructions.
> > Thanks,
> > -Chinang
> >   
> 
> Hi Chinang,
>   Please find a patch attached which applies to linus.git as of today. 
> You will also want to enable CONFIG_FUNCTION_TRACER as well as the trace
> components.  Here is my system:
> 

I don't see why CONFIG_FUNCTION_TRACER is needed.

> ghaskins@dev:~/sandbox/git/linux-2.6-rt> grep TRACE .config
> CONFIG_STACKTRACE_SUPPORT=y
> CONFIG_TRACEPOINTS=y
> CONFIG_HAVE_ARCH_TRACEHOOK=y
> CONFIG_BLK_DEV_IO_TRACE=y
> # CONFIG_TREE_RCU_TRACE is not set
> # CONFIG_PREEMPT_RCU_TRACE is not set
> CONFIG_X86_PTRACE_BTS=y
> # CONFIG_ACPI_DEBUG_FUNC_TRACE is not set
> CONFIG_NETFILTER_XT_TARGET_TRACE=m
> CONFIG_SOUND_TRACEINIT=y
> CONFIG_TRACE_IRQFLAGS_SUPPORT=y
> CONFIG_TRACE_IRQFLAGS=y
> CONFIG_STACKTRACE=y
> # CONFIG_BACKTRACE_SELF_TEST is not set
> CONFIG_USER_STACKTRACE_SUPPORT=y
> CONFIG_NOP_TRACER=y
> CONFIG_HAVE_FUNCTION_TRACER=y
> CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_HAVE_HW_BRANCH_TRACER=y
> CONFIG_TRACER_MAX_TRACE=y
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_IRQSOFF_TRACER=y
> CONFIG_SYSPROF_TRACER=y
> CONFIG_SCHED_TRACER=y

This CONFIG_SCHED_TRACER should be enough.

-- Steve

> CONFIG_CONTEXT_SWITCH_TRACER=y
> # CONFIG_BOOT_TRACER is not set
> # CONFIG_TRACE_BRANCH_PROFILING is not set
> CONFIG_POWER_TRACER=y
> CONFIG_STACK_TRACER=y
> CONFIG_HW_BRANCH_TRACER=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
> # CONFIG_FTRACE_STARTUP_TEST is not set
> # CONFIG_MMIOTRACE is not set
> # CONFIG_KVM_TRACE is not set
> 
> 
> Then on your booted system, do:
> 
> echo sched_switch > /sys/kernel/debug/tracing/current_tracer
> echo 1 > /sys/kernel/debug/tracing/tracing_enabled
> $run_oltp && echo 0 > /sys/kernel/debug/tracing/tracing_enabled
> 
> (where $run_oltp is your suite)
> 
> Then, email the contents of /sys/kernel/debug/tracing/trace to me
> 
> -Greg
> 


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 10:31                   ` Pekka Enberg
  2009-01-16 10:42                     ` Nick Piggin
@ 2009-01-16 20:59                     ` Christoph Lameter
  1 sibling, 0 replies; 122+ messages in thread
From: Christoph Lameter @ 2009-01-16 20:59 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nick Piggin, Matthew Wilcox, Andrew Morton, Wilcox, Matthew R,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty

On Fri, 16 Jan 2009, Pekka Enberg wrote:

> aren't I? There's some scheduler related options such as
> CONFIG_GROUP_SCHED and CONFIG_FAIR_GROUP_SCHED enabled in defconfig
> that I didn't have in the original tests. I can try without them if
> you want but I'm not sure it's relevant for SLAB vs SLUB tests.

I have seen CONFIG_GROUP_SCHED to affect latency tests in significant
ways.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 10:55                       ` Pekka Enberg
@ 2009-01-19  7:13                         ` Nick Piggin
  2009-01-19  8:05                           ` Pekka Enberg
  0 siblings, 1 reply; 122+ messages in thread
From: Nick Piggin @ 2009-01-19  7:13 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Friday 16 January 2009 21:55:30 Pekka Enberg wrote:
> Hi Nick,
>
> On Fri, Jan 16, 2009 at 12:42 PM, Nick Piggin <nickpiggin@yahoo.com.au> 
wrote:
> >> I don't have the exact config for the previous tests but it's was just
> >> my laptop regular config whereas the new tests are x86-64 defconfig.
> >> So I think I'm just hitting some of the other OLTP regressions here,
> >> aren't I? There's some scheduler related options such as
> >> CONFIG_GROUP_SCHED and CONFIG_FAIR_GROUP_SCHED enabled in defconfig
> >> that I didn't have in the original tests. I can try without them if
> >> you want but I'm not sure it's relevant for SLAB vs SLUB tests.
> >
> > Oh no that's fine. It just looked like you repeated the test but
> > with lockdep disabled (and no other changes).
>
> Right. In any case, I am still unable to reproduce the OLTP issue and
> I've seen SLUB beat SLAB on my machine in most of the benchmarks
> you've posted.

SLUB was distinctly slower on the tbench, netperf, and hackbench
tests that I ran. These were faster with SLUB on your machine?
What kind of system is it?


> So I have very mixed feelings about SLQB. It's very
> nice that it works for OLTP but we still don't have much insight (i.e.
> numbers) on why it's better.

According to estimates in this thread, I think Matthew said SLUB would
be around 6% slower? SLQB is within measurement error of SLAB.

Fair point about personally reproducing the OLTP problem yourself. But
the fact is that we will get problem reports that cannot be reproduced.
That does not make them less relevant. I can't reproduce the OLTP
benchmark myself. And I'm fully expecting to get problem reports for
SLQB against insanely sized SGI systems, which I will take very seriously
and try to fix them.


> I'm also bit worried if SLQB has gotten
> enough attention from the NUMA and HPC folks that brought us SLUB.

It hasn't, but that's the problem we're hoping to solve by getting it
merged. People can give it more attention, and we can try to fix any
problems. SLUB has been default for quite a while now and not able to
solve all problems it has had reported against it. So I hope SLQB will
be able to unblock this situation.


> The good news is that SLQB can replace SLAB so either way, we're not
> going to end up with four allocators. Whether it can replace SLUB
> remains to be seen.

Well I think being able to simply replace SLAB is not ideal. The plan
I'm hoping is to have four allocators for a few releases, and then
go back to having two. That is going to mean some groups might not
have their ideal allocator merged... but I think it is crazy to settle
with more than one main compile-time allocator for the long term.

I don't know what the next redhat enterprise release is going to do,
but if they go with SLAB, then I think that means no SGI systems would
run in production with SLUB anyway, so what would be the purpose of
having a special "HPC/huge system" allocator? Or... what other reasons
should users select SLUB vs SLAB? (in terms of core allocator behaviour,
versus extras that can be ported from one to the other) If we can't even
make up our own minds, then will others be able to?


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 18:11                   ` Rick Jones
@ 2009-01-19  7:43                     ` Nick Piggin
  2009-01-19 22:19                       ` Rick Jones
  0 siblings, 1 reply; 122+ messages in thread
From: Nick Piggin @ 2009-01-19  7:43 UTC (permalink / raw)
  To: Rick Jones
  Cc: Andrew Morton, netdev, sfr, matthew, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

On Saturday 17 January 2009 05:11:02 Rick Jones wrote:
> Nick Piggin wrote:
> > OK, I have these numbers to show I'm not completely off my rocker to
> > suggest we merge SLQB :) Given these results, how about I ask to merge
> > SLQB as default in linux-next, then if nothing catastrophic happens,
> > merge it upstream in the next merge window, then a couple of releases
> > after that, given some time to test and tweak SLQB, then we plan to bite
> > the bullet and emerge with just one main slab allocator (plus SLOB).
> >
> >
> > System is a 2socket, 4 core AMD.
>
> Not exactly a large system :)  Barely NUMA even with just two sockets.

You're right ;)

But at least it is exercising the NUMA paths in the allocator, and
represents a pretty common size of system...

I can run some tests on bigger systems at SUSE, but it is not always
easy to set up "real" meaningful workloads on them or configure
significant IO for them.


> > Netperf UDP unidirectional send test (10 runs, higher better):
> >
> > Server and client bound to same CPU
> > SLAB AVG=60.111 STD=1.59382
> > SLQB AVG=60.167 STD=0.685347
> > SLUB AVG=58.277 STD=0.788328
> >
> > Server and client bound to same socket, different CPUs
> > SLAB AVG=85.938 STD=0.875794
> > SLQB AVG=93.662 STD=2.07434
> > SLUB AVG=81.983 STD=0.864362
> >
> > Server and client bound to different sockets
> > SLAB AVG=78.801 STD=1.44118
> > SLQB AVG=78.269 STD=1.10457
> > SLUB AVG=71.334 STD=1.16809
> >
>  > ...
> >
> > I haven't done any non-local network tests. Networking is the one of the
> > subsystems most heavily dependent on slab performance, so if anybody
> > cares to run their favourite tests, that would be really helpful.
>
> I'm guessing, but then are these Mbit/s figures? Would that be the sending
> throughput or the receiving throughput?

Yes, Mbit/s. They were... hmm, sending throughput I think, but each pair
of numbers seemed to be identical IIRC?


> I love to see netperf used, but why UDP and loopback?

No really good reason. I guess I was hoping to keep other variables as
small as possible. But I guess a real remote test would be a lot more
realistic as a networking test. Hmm, but I could probably set up a test
over a simple GbE link here.  I'll try that.


> Also, how about the
> service demands?

Well, over loopback and using CPU binding, I was hoping it wouldn't
change much... but I see netperf does some measurements for you. I
will consider those in future too.

BTW. is it possible to do parallel netperf tests?



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  7:13                         ` Nick Piggin
@ 2009-01-19  8:05                           ` Pekka Enberg
  2009-01-19  8:33                             ` Nick Piggin
  0 siblings, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-19  8:05 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

Hi Nick,

On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> SLUB was distinctly slower on the tbench, netperf, and hackbench
> tests that I ran. These were faster with SLUB on your machine?

I was trying to bisect a somewhat recent SLAB vs. SLUB regression in
tbench that seems to be triggered by CONFIG_SLUB as suggested by Evgeniy
Polyakov performance tests. Unfortunately I bisected it down to a bogus
commit so while I saw SLUB beating SLAB, I also saw the reverse in
nearby commits which didn't touch anything interesting. So for tbench,
SLUB _used to_ dominate SLAB on my machine but the current situation is
not as clear with all the tbench regressions in other subsystems.

SLUB has been a consistent winner for hackbench after Christoph fixed
the regression reported by Ingo Molnar two years (?) ago. I don't think
I've ran netperf, but for the fio test you mentioned, SLUB is beating
SLAB here.

On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> What kind of system is it?

2-way Core2. I posted my /proc/cpuinfo in this thread if you're
interested.

On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > So I have very mixed feelings about SLQB. It's very
> > nice that it works for OLTP but we still don't have much insight (i.e.
> > numbers) on why it's better.

On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> According to estimates in this thread, I think Matthew said SLUB would
> be around 6% slower? SLQB is within measurement error of SLAB.

Yeah but I say that we don't know _why_ it's better. There's the
kmalloc()/kfree() CPU ping-pong hypothesis but it could also be due to
page allocator interaction or just a plain bug in SLUB. And lets not
forget bad interaction with some random subsystem (SCSI, for example).

On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> Fair point about personally reproducing the OLTP problem yourself. But
> the fact is that we will get problem reports that cannot be reproduced.
> That does not make them less relevant. I can't reproduce the OLTP
> benchmark myself. And I'm fully expecting to get problem reports for
> SLQB against insanely sized SGI systems, which I will take very seriously
> and try to fix them.

Again, it's not that I don't take the OLTP regression seriously (I do)
but as a "part-time maintainer" I simply don't have the time and
resources to attempt to fix it without either (a) being able to
reproduce the problem or (b) have someone who can reproduce it who is
willing to do oprofile and so on.

So as much as I would have preferred that you had at least attempted to
fix SLUB, I'm more than happy that we have a very active developer
working on the problem now. I mean, I don't really care which allocator
we decide to go forward with, if all the relevant regressions are dealt
with.

All I am saying is that I don't like how we're fixing a performance bug
with a shiny new allocator without a credible explanation why the
current approach is not fixable.

On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > The good news is that SLQB can replace SLAB so either way, we're not
> > going to end up with four allocators. Whether it can replace SLUB
> > remains to be seen.
> 
> Well I think being able to simply replace SLAB is not ideal. The plan
> I'm hoping is to have four allocators for a few releases, and then
> go back to having two. That is going to mean some groups might not
> have their ideal allocator merged... but I think it is crazy to settle
> with more than one main compile-time allocator for the long term.

So now the HPC folk will be screwed over by the OLTP folk? I guess
that's okay as the latter have been treated rather badly for the past
two years.... ;-)

			Pekka


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  8:05                           ` Pekka Enberg
@ 2009-01-19  8:33                             ` Nick Piggin
  2009-01-19  8:42                               ` Nick Piggin
  2009-01-19  9:48                               ` Pekka Enberg
  0 siblings, 2 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-19  8:33 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Monday 19 January 2009 19:05:03 Pekka Enberg wrote:
> Hi Nick,
>
> On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > SLUB was distinctly slower on the tbench, netperf, and hackbench
> > tests that I ran. These were faster with SLUB on your machine?
>
> I was trying to bisect a somewhat recent SLAB vs. SLUB regression in
> tbench that seems to be triggered by CONFIG_SLUB as suggested by Evgeniy
> Polyakov performance tests. Unfortunately I bisected it down to a bogus
> commit so while I saw SLUB beating SLAB, I also saw the reverse in
> nearby commits which didn't touch anything interesting. So for tbench,
> SLUB _used to_ dominate SLAB on my machine but the current situation is
> not as clear with all the tbench regressions in other subsystems.

OK.


> SLUB has been a consistent winner for hackbench after Christoph fixed
> the regression reported by Ingo Molnar two years (?) ago. I don't think
> I've ran netperf, but for the fio test you mentioned, SLUB is beating
> SLAB here.

Hmm, netperf, hackbench, and fio are all faster with SLAB than SLUB.


> On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > What kind of system is it?
>
> 2-way Core2. I posted my /proc/cpuinfo in this thread if you're
> interested.

Thanks. I guess one of three obvious differences, mine is a K10, is
NUMA, and has significantly more cores. I can try setting it to
interleave cachelines over nodes or use fewer cores to see if the
picture changes...


> On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > > So I have very mixed feelings about SLQB. It's very
> > > nice that it works for OLTP but we still don't have much insight (i.e.
> > > numbers) on why it's better.
>
> On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > According to estimates in this thread, I think Matthew said SLUB would
> > be around 6% slower? SLQB is within measurement error of SLAB.
>
> Yeah but I say that we don't know _why_ it's better. There's the
> kmalloc()/kfree() CPU ping-pong hypothesis but it could also be due to
> page allocator interaction or just a plain bug in SLUB. And lets not
> forget bad interaction with some random subsystem (SCSI, for example).
>
> On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > Fair point about personally reproducing the OLTP problem yourself. But
> > the fact is that we will get problem reports that cannot be reproduced.
> > That does not make them less relevant. I can't reproduce the OLTP
> > benchmark myself. And I'm fully expecting to get problem reports for
> > SLQB against insanely sized SGI systems, which I will take very seriously
> > and try to fix them.
>
> Again, it's not that I don't take the OLTP regression seriously (I do)
> but as a "part-time maintainer" I simply don't have the time and
> resources to attempt to fix it without either (a) being able to
> reproduce the problem or (b) have someone who can reproduce it who is
> willing to do oprofile and so on.
>
> So as much as I would have preferred that you had at least attempted to
> fix SLUB, I'm more than happy that we have a very active developer
> working on the problem now. I mean, I don't really care which allocator
> we decide to go forward with, if all the relevant regressions are dealt
> with.

OK, good to know.


> All I am saying is that I don't like how we're fixing a performance bug
> with a shiny new allocator without a credible explanation why the
> current approach is not fixable.

To be honest, my biggest concern with SLUB is the higher order pages
thing. But Christoph always poo poos me when I raise that concern, and
it's hard to get concrete numbers showing real fragmentation problems
when it can take days or months to start biting.

It really stems from queueing versus not queueing I guess. And I think
SLUB is flawed due to its avoidance of queueing.


> On Mon, 2009-01-19 at 18:13 +1100, Nick Piggin wrote:
> > > The good news is that SLQB can replace SLAB so either way, we're not
> > > going to end up with four allocators. Whether it can replace SLUB
> > > remains to be seen.
> >
> > Well I think being able to simply replace SLAB is not ideal. The plan
> > I'm hoping is to have four allocators for a few releases, and then
> > go back to having two. That is going to mean some groups might not
> > have their ideal allocator merged... but I think it is crazy to settle
> > with more than one main compile-time allocator for the long term.
>
> So now the HPC folk will be screwed over by the OLTP folk?

No. I'm imagining there will be a discussion of the 3, and at some
point an executive decision will be made if an agreement can't be
reached. At this point, I think that is a better and fairer option
than just asserting one allocator is better than another and making
it the default.

And... we have no indication that SLQB will be worse for HPC than
SLUB ;)


> I guess
> that's okay as the latter have been treated rather badly for the past
> two years.... ;-)

I don't know if that is meant to be sarcastic, but the OLTP performance
numbers almost never get better from one kernel to the next. Actually
the trend is downward. Mainly due to bloat or new features being added.

I think that at some level, controlled addition of features that may
add some cycles to these paths is not a bad idea (what good is Moore's
Law if we can't have shiny new features? :) But on the other hand, this
OLTP test is incredibly valuable to monitor the general performance-
health of this area of the kernel.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  8:33                             ` Nick Piggin
@ 2009-01-19  8:42                               ` Nick Piggin
  2009-01-19  8:47                                 ` Pekka Enberg
  2009-01-19  9:48                               ` Pekka Enberg
  1 sibling, 1 reply; 122+ messages in thread
From: Nick Piggin @ 2009-01-19  8:42 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Monday 19 January 2009 19:33:27 Nick Piggin wrote:
> On Monday 19 January 2009 19:05:03 Pekka Enberg wrote:

> > All I am saying is that I don't like how we're fixing a performance bug
> > with a shiny new allocator without a credible explanation why the
> > current approach is not fixable.
>
> To be honest, my biggest concern with SLUB is the higher order pages
> thing. But Christoph always poo poos me when I raise that concern, and
> it's hard to get concrete numbers showing real fragmentation problems
> when it can take days or months to start biting.
>
> It really stems from queueing versus not queueing I guess. And I think
> SLUB is flawed due to its avoidance of queueing.

And FWIW, Christoph was also not able to fix the OLTP problem although
I think it has been known for nearly two years ago now (I remember we
talked about it at 2007 KS, although I wasn't following slab development
very keenly back then).

At this point I feel spending time working on SLUB isn't a good idea if
a) Christoph himself hadn't fixed this problem; and b) we disagree about
fundamental design choices (see the "SLQB slab allocator" thread).

Anyway, nobody has disagreed with my proposal to merge SLQB, so in the
worst case I don't think it will cause too much harm, and in the best
case it might turn out to make the best tradeoffs and who knows, it
might actually not be catastrophic for HPC ;)


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  8:42                               ` Nick Piggin
@ 2009-01-19  8:47                                 ` Pekka Enberg
  2009-01-19  8:57                                   ` Nick Piggin
  0 siblings, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-19  8:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Mon, Jan 19, 2009 at 10:42 AM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Anyway, nobody has disagreed with my proposal to merge SLQB, so in the
> worst case I don't think it will cause too much harm, and in the best
> case it might turn out to make the best tradeoffs and who knows, it
> might actually not be catastrophic for HPC ;)

Yeah. If Andrew/Linus doesn't want to merge SLQB to 2.6.29, we can
stick it in linux-next through slab.git if you want.

                                Pekka

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  8:47                                 ` Pekka Enberg
@ 2009-01-19  8:57                                   ` Nick Piggin
  0 siblings, 0 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-19  8:57 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Monday 19 January 2009 19:47:24 Pekka Enberg wrote:
> On Mon, Jan 19, 2009 at 10:42 AM, Nick Piggin <nickpiggin@yahoo.com.au> 
wrote:
> > Anyway, nobody has disagreed with my proposal to merge SLQB, so in the
> > worst case I don't think it will cause too much harm, and in the best
> > case it might turn out to make the best tradeoffs and who knows, it
> > might actually not be catastrophic for HPC ;)
>
> Yeah. If Andrew/Linus doesn't want to merge SLQB to 2.6.29, we can

I would prefer not. Apart from not practicing what I preach about
merging, if it has stupid bugs on some systems or obvious performance
problems, it will not be a good start ;)

> stick it in linux-next through slab.git if you want.

That would be appreciated. It's not quite ready yet...

Thanks.
Nick


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  8:33                             ` Nick Piggin
  2009-01-19  8:42                               ` Nick Piggin
@ 2009-01-19  9:48                               ` Pekka Enberg
  2009-01-19 10:03                                 ` Nick Piggin
  1 sibling, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-19  9:48 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

Hi Nick,

On Mon, Jan 19, 2009 at 10:33 AM, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>> All I am saying is that I don't like how we're fixing a performance bug
>> with a shiny new allocator without a credible explanation why the
>> current approach is not fixable.
>
> To be honest, my biggest concern with SLUB is the higher order pages
> thing. But Christoph always poo poos me when I raise that concern, and
> it's hard to get concrete numbers showing real fragmentation problems
> when it can take days or months to start biting.

To be fair to SLUB, we do have the pending slab defragmentation
patches in my tree. Not that we have any numbers on if defragmentation
helps and how much. IIRC, Christoph said one of the reasons for
avoiding queues in SLUB is to be able to do defragmentation. But I
suppose with SLQB we can do the same thing as long as we flush the
queues before attempting to defrag.

                                Pekka

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  9:48                               ` Pekka Enberg
@ 2009-01-19 10:03                                 ` Nick Piggin
  0 siblings, 0 replies; 122+ messages in thread
From: Nick Piggin @ 2009-01-19 10:03 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Matthew Wilcox, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, Andrew Vasquez, Anirban Chakraborty,
	Christoph Lameter

On Monday 19 January 2009 20:48:52 Pekka Enberg wrote:
> Hi Nick,
>
> On Mon, Jan 19, 2009 at 10:33 AM, Nick Piggin <nickpiggin@yahoo.com.au> 
wrote:
> >> All I am saying is that I don't like how we're fixing a performance bug
> >> with a shiny new allocator without a credible explanation why the
> >> current approach is not fixable.
> >
> > To be honest, my biggest concern with SLUB is the higher order pages
> > thing. But Christoph always poo poos me when I raise that concern, and
> > it's hard to get concrete numbers showing real fragmentation problems
> > when it can take days or months to start biting.
>
> To be fair to SLUB, we do have the pending slab defragmentation
> patches in my tree. Not that we have any numbers on if defragmentation
> helps and how much. IIRC, Christoph said one of the reasons for
> avoiding queues in SLUB is to be able to do defragmentation. But I
> suppose with SLQB we can do the same thing as long as we flush the
> queues before attempting to defrag.

I have had a look at them, (and I raised some concerns about races with
the bufferhead "defragmentation" patch which I didn't get a reply to,
but now's not the time to get into that).

Christoph's design AFAIKS is not impossible with queued slab allocators,
but they would just need to do either some kind of per-cpu processing,
at least a way to flush queues of objects. This should not be impossible.

But in my reply, I also outlined an idea for a possibly better design for
targetted slab reclaim that could have fewer of the locking complexitiesin 
other subsystems like the slub defrag patches do. I plan to look at this
at some point, but I think we need to sort out the basics first.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-15  7:11           ` Ma, Chinang
@ 2009-01-19 18:04             ` Chris Mason
  2009-01-19 18:37               ` Steven Rostedt
  0 siblings, 1 reply; 122+ messages in thread
From: Chris Mason @ 2009-01-19 18:04 UTC (permalink / raw)
  To: Ma, Chinang
  Cc: Steven Rostedt, Andrew Morton, Matthew Wilcox, Wilcox, Matthew R,
	linux-kernel, Tripathi, Sharad C, arjan, Kleen, Andi, Siddha,
	Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Gregory Haskins

On Thu, 2009-01-15 at 00:11 -0700, Ma, Chinang wrote:
> >> > > > >
> >> > > > > Linux OLTP Performance summary
> >> > > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%
> >iowait%
> >> > > > > 2.6.24.2                1.000   21969   43425   76   24     0
> >0
> >> > > > > 2.6.27.2                0.973   30402   43523   74   25     0
> >1
> >> > > > > 2.6.29-rc1              0.965   30331   41970   74   26     0
> >0
> >> >
> >> > > But the interrupt rate went through the roof.
> >> >
> >> > Yes.  I forget why that was; I'll have to dig through my archives for
> >> > that.
> >>
> >> Oh.  I'd have thought that this alone could account for 3.5%.

A later email indicated the reschedule interrupt count doubled since
2.6.24, and so I poked around a bit at the causes of resched_task.

I think the -rt version of check_preempt_equal_prio has gotten much more
expensive since 2.6.24.

I'm sure these changes were made for good reasons, and this workload may
not be a good reason to change it back.  But, what does the patch below
do to performance on 2.6.29-rcX?

-chris

diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
index 954e1a8..bbe3492 100644
--- a/kernel/sched_rt.c
+++ b/kernel/sched_rt.c
@@ -842,6 +842,7 @@ static void check_preempt_curr_rt(struct rq *rq,
struct task_struct *p, int sync
 		resched_task(rq->curr);
 		return;
 	}
+	return;
 
 #ifdef CONFIG_SMP
 	/*





^ permalink raw reply related	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-19 18:04             ` Chris Mason
@ 2009-01-19 18:37               ` Steven Rostedt
  2009-01-19 18:55                 ` Chris Mason
  2009-01-19 23:40                 ` Ingo Molnar
  0 siblings, 2 replies; 122+ messages in thread
From: Steven Rostedt @ 2009-01-19 18:37 UTC (permalink / raw)
  To: Chris Mason
  Cc: Ma, Chinang, Andrew Morton, Matthew Wilcox, Wilcox, Matthew R,
	linux-kernel, Tripathi, Sharad C, arjan, Kleen, Andi, Siddha,
	Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Gregory Haskins, Rusty Russell

(added Rusty)

On Mon, 2009-01-19 at 13:04 -0500, Chris Mason wrote:
> On Thu, 2009-01-15 at 00:11 -0700, Ma, Chinang wrote:
> > >> > > > >
> > >> > > > > Linux OLTP Performance summary
> > >> > > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%
> > >iowait%
> > >> > > > > 2.6.24.2                1.000   21969   43425   76   24     0
> > >0
> > >> > > > > 2.6.27.2                0.973   30402   43523   74   25     0
> > >1
> > >> > > > > 2.6.29-rc1              0.965   30331   41970   74   26     0
> > >0
> > >> >
> > >> > > But the interrupt rate went through the roof.
> > >> >
> > >> > Yes.  I forget why that was; I'll have to dig through my archives for
> > >> > that.
> > >>
> > >> Oh.  I'd have thought that this alone could account for 3.5%.
> 
> A later email indicated the reschedule interrupt count doubled since
> 2.6.24, and so I poked around a bit at the causes of resched_task.
> 
> I think the -rt version of check_preempt_equal_prio has gotten much more
> expensive since 2.6.24.
> 
> I'm sure these changes were made for good reasons, and this workload may
> not be a good reason to change it back.  But, what does the patch below
> do to performance on 2.6.29-rcX?
> 
> -chris
> 
> diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> index 954e1a8..bbe3492 100644
> --- a/kernel/sched_rt.c
> +++ b/kernel/sched_rt.c
> @@ -842,6 +842,7 @@ static void check_preempt_curr_rt(struct rq *rq,
> struct task_struct *p, int sync
>  		resched_task(rq->curr);
>  		return;
>  	}
> +	return;
>  
>  #ifdef CONFIG_SMP
>  	/*

That should not cause much of a problem if the scheduling task is not
pinned to an CPU. But!!!!!

A recent change makes it expensive:

commit 24600ce89a819a8f2fb4fd69fd777218a82ade20
Author: Rusty Russell <rusty@rustcorp.com.au>
Date:   Tue Nov 25 02:35:13 2008 +1030

    sched: convert check_preempt_equal_prio to cpumask_var_t.
    
    Impact: stack reduction for large NR_CPUS



which has:

 static void check_preempt_equal_prio(struct rq *rq, struct task_struct
*p)
 {
-       cpumask_t mask;
+       cpumask_var_t mask;
 
        if (rq->curr->rt.nr_cpus_allowed == 1)
                return;
 
-       if (p->rt.nr_cpus_allowed != 1
-           && cpupri_find(&rq->rd->cpupri, p, &mask))
+       if (!alloc_cpumask_var(&mask, GFP_ATOMIC))
                return;




check_preempt_equal_prio is in a scheduling hot path!!!!!

WTF are we allocating there for?

-- Steve




^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-19 18:37               ` Steven Rostedt
@ 2009-01-19 18:55                 ` Chris Mason
  2009-01-19 19:07                   ` Steven Rostedt
  2009-01-19 23:40                 ` Ingo Molnar
  1 sibling, 1 reply; 122+ messages in thread
From: Chris Mason @ 2009-01-19 18:55 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Ma, Chinang, Andrew Morton, Matthew Wilcox, Wilcox, Matthew R,
	linux-kernel, Tripathi, Sharad C, arjan, Kleen, Andi, Siddha,
	Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Gregory Haskins, Rusty Russell

On Mon, 2009-01-19 at 13:37 -0500, Steven Rostedt wrote:
> (added Rusty)
> 
> On Mon, 2009-01-19 at 13:04 -0500, Chris Mason wrote:
> > 
> > I think the -rt version of check_preempt_equal_prio has gotten much more
> > expensive since 2.6.24.
> > 
> > I'm sure these changes were made for good reasons, and this workload may
> > not be a good reason to change it back.  But, what does the patch below
> > do to performance on 2.6.29-rcX?
> > 
> > -chris
> > 
> > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > index 954e1a8..bbe3492 100644
> > --- a/kernel/sched_rt.c
> > +++ b/kernel/sched_rt.c
> > @@ -842,6 +842,7 @@ static void check_preempt_curr_rt(struct rq *rq,
> > struct task_struct *p, int sync
> >  		resched_task(rq->curr);
> >  		return;
> >  	}
> > +	return;
> >  
> >  #ifdef CONFIG_SMP
> >  	/*
> 
> That should not cause much of a problem if the scheduling task is not
> pinned to an CPU. But!!!!!
> 
> A recent change makes it expensive:


> +       if (!alloc_cpumask_var(&mask, GFP_ATOMIC))
>                 return;

> check_preempt_equal_prio is in a scheduling hot path!!!!!
> 
> WTF are we allocating there for?

I wasn't actually looking at the cost of the checks, even though they do
look higher (if they are using CONFIG_CPUMASK_OFFSTACK anyway).

The 2.6.24 code would trigger a rescheduling interrupt only when the
prio of the inbound task was higher than the running task.

This workload has a large number of equal priority rt tasks that are not
bound to a single CPU, and so I think it should trigger more
preempts/reschedules with the today's check_preempt_equal_prio().

-chris



^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-01-19 18:55                 ` Chris Mason
@ 2009-01-19 19:07                   ` Steven Rostedt
  0 siblings, 0 replies; 122+ messages in thread
From: Steven Rostedt @ 2009-01-19 19:07 UTC (permalink / raw)
  To: Chris Mason
  Cc: Ma, Chinang, Andrew Morton, Matthew Wilcox, Wilcox, Matthew R,
	linux-kernel, Tripathi, Sharad C, arjan, Kleen, Andi, Siddha,
	Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Gregory Haskins, Rusty Russell


On Mon, 2009-01-19 at 13:55 -0500, Chris Mason wrote:

> I wasn't actually looking at the cost of the checks, even though they do
> look higher (if they are using CONFIG_CPUMASK_OFFSTACK anyway).
> 
> The 2.6.24 code would trigger a rescheduling interrupt only when the
> prio of the inbound task was higher than the running task.
> 
> This workload has a large number of equal priority rt tasks that are not
> bound to a single CPU, and so I think it should trigger more
> preempts/reschedules with the today's check_preempt_equal_prio().

Ah yeah. This is one of the things that shows RT being more "responsive"
but less on performance. An RT task wants to run ASAP even if that means
there's a chance of more interrupts and higher cache misses.

The old way would be much faster in general through put, but I measured
RT tasks taking up to tens of milliseconds to get scheduled. This is
unacceptable for an RT task.

-- Steve



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19  7:43                     ` Nick Piggin
@ 2009-01-19 22:19                       ` Rick Jones
  0 siblings, 0 replies; 122+ messages in thread
From: Rick Jones @ 2009-01-19 22:19 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, netdev, sfr, matthew, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan, andi.kleen,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

>>>System is a 2socket, 4 core AMD.
>>
>>Not exactly a large system :)  Barely NUMA even with just two sockets.
> 
> 
> You're right ;)
> 
> But at least it is exercising the NUMA paths in the allocator, and
> represents a pretty common size of system...
> 
> I can run some tests on bigger systems at SUSE, but it is not always
> easy to set up "real" meaningful workloads on them or configure
> significant IO for them.

Not sure if I know enough git to pull your trees, or if this cobbler's child will 
have much in the way of bigger systems, but there is a chance I might - contact 
me offline with some pointers on how to pull and build the bits and such.

>>>Netperf UDP unidirectional send test (10 runs, higher better):
>>>
>>>Server and client bound to same CPU
>>>SLAB AVG=60.111 STD=1.59382
>>>SLQB AVG=60.167 STD=0.685347
>>>SLUB AVG=58.277 STD=0.788328
>>>
>>>Server and client bound to same socket, different CPUs
>>>SLAB AVG=85.938 STD=0.875794
>>>SLQB AVG=93.662 STD=2.07434
>>>SLUB AVG=81.983 STD=0.864362
>>>
>>>Server and client bound to different sockets
>>>SLAB AVG=78.801 STD=1.44118
>>>SLQB AVG=78.269 STD=1.10457
>>>SLUB AVG=71.334 STD=1.16809
>>>
>>
>> > ...
>>
>>>I haven't done any non-local network tests. Networking is the one of the
>>>subsystems most heavily dependent on slab performance, so if anybody
>>>cares to run their favourite tests, that would be really helpful.
>>
>>I'm guessing, but then are these Mbit/s figures? Would that be the sending
>>throughput or the receiving throughput?
> 
> 
> Yes, Mbit/s. They were... hmm, sending throughput I think, but each pair
> of numbers seemed to be identical IIRC?

Mega *bits* per second?  And those were 4K sends right?  That seems rather low 
for loopback - I would have expected nearly two orders of magnitude more.  I 
wonder if the intra-stack flow control kicked-in?  You might try adding test 
specific -S and -s options to set much larger socket buffers to try to avoid 
that.  Or simply use TCP.

netperf -H <foo> ... -- -s 1M -S 1M -m 4K

>>I love to see netperf used, but why UDP and loopback?
> 
> 
> No really good reason. I guess I was hoping to keep other variables as
> small as possible. But I guess a real remote test would be a lot more
> realistic as a networking test. Hmm, but I could probably set up a test
> over a simple GbE link here.  I'll try that.

If bandwidth is an issue, that is to say one saturates the link before much of 
anything "interesting" happens in the host you can use something like aggregate 
TCP_RR - ./configure with --enable_burst and then something like

netperf -H <remote> -t TCP_RR -- -D -b 32

and it will have as many as 33 discrete transactions in flight at one time on the 
one connection.  The -D is there to set TCP_NODELAY to preclude TCP chunking the 
single-byte (default, take your pick of a more reasonable size) transactions into 
one segment.

>>Also, how about the service demands?
> 
> 
> Well, over loopback and using CPU binding, I was hoping it wouldn't
> change much... 

Hope... but verify :)

> but I see netperf does some measurements for you. I
> will consider those in future too.
> 
> BTW. is it possible to do parallel netperf tests?

Yes, by (ab)using the confidence intervals code.  Poke around in 
http://www.netperf.org/svn/netperf2/doc/netperf.html in the "Aggregates" section, 
and I can go into further details offline (or here if folks want to see the 
discussion).

rick jones

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-19 18:37               ` Steven Rostedt
  2009-01-19 18:55                 ` Chris Mason
@ 2009-01-19 23:40                 ` Ingo Molnar
  1 sibling, 0 replies; 122+ messages in thread
From: Ingo Molnar @ 2009-01-19 23:40 UTC (permalink / raw)
  To: Steven Rostedt, Mike Travis, Rusty Russell
  Cc: Chris Mason, Ma, Chinang, Andrew Morton, Matthew Wilcox, Wilcox,
	Matthew R, linux-kernel, Tripathi, Sharad C, arjan, Kleen, Andi,
	Siddha, Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty, Ingo Molnar, Thomas Gleixner,
	Peter Zijlstra, Gregory Haskins, Rusty Russell


* Steven Rostedt <srostedt@redhat.com> wrote:

> (added Rusty)
> 
> On Mon, 2009-01-19 at 13:04 -0500, Chris Mason wrote:
> > On Thu, 2009-01-15 at 00:11 -0700, Ma, Chinang wrote:
> > > >> > > > >
> > > >> > > > > Linux OLTP Performance summary
> > > >> > > > > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%   idle%
> > > >iowait%
> > > >> > > > > 2.6.24.2                1.000   21969   43425   76   24     0
> > > >0
> > > >> > > > > 2.6.27.2                0.973   30402   43523   74   25     0
> > > >1
> > > >> > > > > 2.6.29-rc1              0.965   30331   41970   74   26     0
> > > >0
> > > >> >
> > > >> > > But the interrupt rate went through the roof.
> > > >> >
> > > >> > Yes.  I forget why that was; I'll have to dig through my archives for
> > > >> > that.
> > > >>
> > > >> Oh.  I'd have thought that this alone could account for 3.5%.
> > 
> > A later email indicated the reschedule interrupt count doubled since
> > 2.6.24, and so I poked around a bit at the causes of resched_task.
> > 
> > I think the -rt version of check_preempt_equal_prio has gotten much more
> > expensive since 2.6.24.
> > 
> > I'm sure these changes were made for good reasons, and this workload may
> > not be a good reason to change it back.  But, what does the patch below
> > do to performance on 2.6.29-rcX?
> > 
> > -chris
> > 
> > diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
> > index 954e1a8..bbe3492 100644
> > --- a/kernel/sched_rt.c
> > +++ b/kernel/sched_rt.c
> > @@ -842,6 +842,7 @@ static void check_preempt_curr_rt(struct rq *rq,
> > struct task_struct *p, int sync
> >  		resched_task(rq->curr);
> >  		return;
> >  	}
> > +	return;
> >  
> >  #ifdef CONFIG_SMP
> >  	/*
> 
> That should not cause much of a problem if the scheduling task is not
> pinned to an CPU. But!!!!!
> 
> A recent change makes it expensive:
> 
> commit 24600ce89a819a8f2fb4fd69fd777218a82ade20
> Author: Rusty Russell <rusty@rustcorp.com.au>
> Date:   Tue Nov 25 02:35:13 2008 +1030
> 
>     sched: convert check_preempt_equal_prio to cpumask_var_t.
>     
>     Impact: stack reduction for large NR_CPUS
> 
> 
> 
> which has:
> 
>  static void check_preempt_equal_prio(struct rq *rq, struct task_struct
> *p)
>  {
> -       cpumask_t mask;
> +       cpumask_var_t mask;
>  
>         if (rq->curr->rt.nr_cpus_allowed == 1)
>                 return;
>  
> -       if (p->rt.nr_cpus_allowed != 1
> -           && cpupri_find(&rq->rd->cpupri, p, &mask))
> +       if (!alloc_cpumask_var(&mask, GFP_ATOMIC))
>                 return;
> 
> 
> 
> 
> check_preempt_equal_prio is in a scheduling hot path!!!!!
> 
> WTF are we allocating there for?

Agreed - this needs to be fixed. Since this runs under the runqueue lock 
we can have a temporary cpumask in the runqueue itself, not on the stack.

	Ingo

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 10:20                       ` Andi Kleen
@ 2009-01-20  5:16                         ` Zhang, Yanmin
  2009-01-21 23:58                           ` Christoph Lameter
  0 siblings, 1 reply; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-20  5:16 UTC (permalink / raw)
  To: Andi Kleen, Christoph Lameter, Pekka Enberg
  Cc: Matthew Wilcox, Nick Piggin, Andrew Morton, netdev, sfr,
	matthew.r.wilcox, chinang.ma, linux-kernel, sharad.c.tripathi,
	arjan, suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

On Fri, 2009-01-16 at 11:20 +0100, Andi Kleen wrote:
> "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> writes:
> 
> 
> > I think that's because SLQB
> > doesn't pass through big object allocation to page allocator.
> > netperf UDP-U-1k has less improvement with SLQB.
> 
> That sounds like just the page allocator needs to be improved.
> That would help everyone. We talked a bit about this earlier,
> some of the heuristics for hot/cold pages are quite outdated
> and have been tuned for obsolete machines and also its fast path
> is quite long. Unfortunately no code currently.
Andi,

Thanks for your kind information. I did more investigation with SLUB
on netperf UDP-U-4k issue.

oprofile shows:
328058   30.1342  linux-2.6.29-rc2         copy_user_generic_string
134666   12.3699  linux-2.6.29-rc2         __free_pages_ok
125447   11.5231  linux-2.6.29-rc2         get_page_from_freelist
22611     2.0770  linux-2.6.29-rc2         __sk_mem_reclaim
21442     1.9696  linux-2.6.29-rc2         list_del
21187     1.9462  linux-2.6.29-rc2         __ip_route_output_key

So __free_pages_ok and get_page_from_freelist consume too much cpu time.
With SLQB, these 2 functions almost don't consume time.

Command 'slabinfo -AD' shows:
Name                   Objects    Alloc     Free   %Fast
:0000256                  1685 29611065 29609548  99  99
:0000168                  2987   164689   161859  94  39
:0004096                  1471   114918   113490  99  97

So kmem_cache :0000256 is very active.

Kernel stack dump in __free_pages_ok shows
 [<ffffffff8027010f>] __free_pages_ok+0x109/0x2e0
 [<ffffffff8024bb34>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8060f387>] __kfree_skb+0x9/0x6f
 [<ffffffff8061204b>] skb_free_datagram+0xc/0x31
 [<ffffffff8064b528>] udp_recvmsg+0x1e7/0x26f
 [<ffffffff8060b509>] sock_common_recvmsg+0x30/0x45
 [<ffffffff80609acd>] sock_recvmsg+0xd5/0xed

The callchain is:
__kfree_skb =>
	kfree_skbmem =>
		kmem_cache_free(skbuff_head_cache, skb);

kmem_cache skbuff_head_cache's object size is just 256, so it shares the kmem_cache
with :0000256. Their order is 1 which means every slab consists of 2 physical pages.

netperf UDP-U-4k is a UDP stream testing. client process keeps sending 4k-size packets
to server process and server process just receives the packets one by one.

If we start CPU_NUM clients and the same number of servers, every client will send lots
of packets within one sched slice, then process scheduler schedules the server to receive
many packets within one sched slice; then client resends again. So there are many packets
in the queue. When server receive the packets, it frees skbuff_head_cache. When the slab's
objects are all free, the slab will be released by calling __free_pages. Such batch
sending/receiving creates lots of slab free activity.

Page allocator has an array at zone_pcp(zone, cpu)->pcp to keep a page buffer for page order 0.
But here skbuff_head_cache's order is 1, so UDP-U-4k couldn't benefit from the page buffer.

SLQB has no such issue, because:
1) SLQB has a percpu freelist. Free objects are put to the list firstly and can be picked up
later on quickly without lock. A batch parameter to control the free object recollection is mostly
1024.
2) SLQB slab order mostly is 0, so although sometimes it calls alloc_pages/free_pages, it can
benefit from zone_pcp(zone, cpu)->pcp page buffer.

So SLUB need resolve such issues that one process allocates a batch of objects and another process
frees them batchly.

yanmin



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-16 18:14                       ` Gregory Haskins
  2009-01-16 19:09                         ` Steven Rostedt
@ 2009-01-20 12:45                         ` Gregory Haskins
  1 sibling, 0 replies; 122+ messages in thread
From: Gregory Haskins @ 2009-01-20 12:45 UTC (permalink / raw)
  To: Ma, Chinang
  Cc: Wilcox, Matthew R, Steven Rostedt, Matthew Wilcox, Andrew Morton,
	James Bottomley, linux-kernel, Tripathi, Sharad C, arjan, Kleen,
	Andi, Siddha, Suresh B, Chilukuri, Harita, Styner, Douglas W,
	Wang, Peter Xihong, Nueckel, Hubert, chris.mason, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

[-- Attachment #1: Type: text/plain, Size: 1535 bytes --]

Gregory Haskins wrote:
>
> Then, email the contents of /sys/kernel/debug/tracing/trace to me
>
>
>   

[ Chinang has performed the trace as requested, but replied with a
reduced CC to avoid spamming people with a large file.  This is
restoring the original list]


Ma, Chinang wrote:
> Hi Gregory,
> Trace in attachment. I trim down the distribution list. As the attachment is quite big.
>
> Thanks,
> -Chinang
>   
Hi Chinang,

  Thank you very much for taking the time to do this.  I have analyzed
the trace: I do not see any smoking gun w.r.t. the theory that we are
over IPI'ing the system.  There were holes in the data due to trace
limitations that rendered some of the data inconclusive.  However, the
places where we did not run into trace limitations looked like
everything was functioning as designed.

That being said, I do see that you have a ton of prio 48(ish) threads
that are over-straining the RT push logic.  The interesting thing here
is I recently pushed some patches to tip that have potential to help you
here.  Could you try your test using the sched/rt branch from -tip? 
Here is a clone link, for your convenience:

git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip.git sched/rt

For this run, do _not_ use the trace patch/config.  I just want to see
if you observe performance improvements with OLTP configured for RT prio
when compared to historic rt-push/pull based kernels (including HEAD on
linus.git, as tested in the last run)

Thanks!
-Greg



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 257 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-15  2:47           ` Matthew Wilcox
  2009-01-15  3:36             ` Andi Kleen
@ 2009-01-20 13:27             ` Jens Axboe
       [not found]               ` <588992150B702C48B3312184F1B810AD03A497632C@azsmsx501.amr.corp.intel.com>
  1 sibling, 1 reply; 122+ messages in thread
From: Jens Axboe @ 2009-01-20 13:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andi Kleen, Andrew Morton, Wilcox, Matthew R, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	Andrew Vasquez, Anirban Chakraborty

On Wed, Jan 14 2009, Matthew Wilcox wrote:
> On Thu, Jan 15, 2009 at 03:39:05AM +0100, Andi Kleen wrote:
> > Andrew Morton <akpm@linux-foundation.org> writes:
> > >>    some of that back, but not as much as taking them out (even when
> > >>    the sysctl'd variable is in a __read_mostly section).  We tried a
> > >>    patch from Jens to speed up the search for a new partition, but it
> > >>    had no effect.
> > >
> > > I find this surprising.
> > 
> > The test system has thousands of disks/LUNs which it writes to
> > all the time, in addition to a workload which is a real cache pig. 
> > So any increase in the per LUN overhead directly leads to a lot
> > more cache misses in the kernel because it increases the working set
> > there sigificantly.
> 
> This particular system has 450 spindles, but they're amalgamated into
> 30 logical volumes by the hardware or firmware.  Linux sees 30 LUNs.
> Each one, though, has fifteen partitions on it, so that brings us back
> up to 450 partitions.
> 
> This system, btw, is a scale model of the full system that would be used
> to get published results.  If I remember correctly, a 1% performance
> regression on this system is likely to translate to a 2% regression on
> the full-scale system.

Matthew, lets see if we can get this a little closer to disappearing. I
don't see lookup problems in the current kernel with the one-hit cache,
but perhaps it's either not getting enough hits in this bigger test case
or perhaps it's simply the rcu locking and preempt disables that build
up enough to cause a slowdown.

First things first, can you get a run of 2.6.29-rc2 with this patch?
It'll enable you to turn off per-partition stats in sysfs. I'd suggest
doing a run with a 2.6.29-rc2 booted with this patch, and then another
run with part_stats set to 0 for every exposed spindle. Then post those
profiles!

diff --git a/block/blk-core.c b/block/blk-core.c
index a824e49..6f693ae 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -600,7 +600,8 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 	q->prep_rq_fn		= NULL;
 	q->unplug_fn		= generic_unplug_device;
 	q->queue_flags		= (1 << QUEUE_FLAG_CLUSTER |
-				   1 << QUEUE_FLAG_STACKABLE);
+				   1 << QUEUE_FLAG_STACKABLE |
+				   1 << QUEUE_FLAG_PART_STAT);
 	q->queue_lock		= lock;
 
 	blk_queue_segment_boundary(q, BLK_SEG_BOUNDARY_MASK);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index a29cb78..a6ec2e3 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -158,6 +158,29 @@ static ssize_t queue_rq_affinity_show(struct request_queue *q, char *page)
 	return queue_var_show(set != 0, page);
 }
 
+static ssize_t queue_part_stat_store(struct request_queue *q, const char *page,
+				     size_t count)
+{
+	unsigned long nm;
+	ssize_t ret = queue_var_store(&nm, page, count);
+
+	spin_lock_irq(q->queue_lock);
+	if (nm)
+		queue_flag_set(QUEUE_FLAG_PART_STAT, q);
+	else
+		queue_flag_clear(QUEUE_FLAG_PART_STAT, q);
+
+	spin_unlock_irq(q->queue_lock);
+	return ret;
+}
+
+static ssize_t queue_part_stat_show(struct request_queue *q, char *page)
+{
+	unsigned int set = test_bit(QUEUE_FLAG_PART_STAT, &q->queue_flags);
+
+	return queue_var_show(set != 0, page);
+}
+
 static ssize_t
 queue_rq_affinity_store(struct request_queue *q, const char *page, size_t count)
 {
@@ -222,6 +245,12 @@ static struct queue_sysfs_entry queue_rq_affinity_entry = {
 	.store = queue_rq_affinity_store,
 };
 
+static struct queue_sysfs_entry queue_part_stat_entry = {
+	.attr = {.name = "part_stats", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_part_stat_show,
+	.store = queue_part_stat_store,
+};
+
 static struct attribute *default_attrs[] = {
 	&queue_requests_entry.attr,
 	&queue_ra_entry.attr,
@@ -231,6 +260,7 @@ static struct attribute *default_attrs[] = {
 	&queue_hw_sector_size_entry.attr,
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
+	&queue_part_stat_entry.attr,
 	NULL,
 };
 
diff --git a/block/genhd.c b/block/genhd.c
index 397960c..09cbac2 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -208,6 +208,9 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
 	struct hd_struct *part;
 	int i;
 
+	if (!blk_queue_part_stat(disk->queue))
+		goto part0;
+
 	ptbl = rcu_dereference(disk->part_tbl);
 
 	part = rcu_dereference(ptbl->last_lookup);
@@ -222,6 +225,7 @@ struct hd_struct *disk_map_sector_rcu(struct gendisk *disk, sector_t sector)
 			return part;
 		}
 	}
+part0:
 	return &disk->part0;
 }
 EXPORT_SYMBOL_GPL(disk_map_sector_rcu);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 044467e..4d45842 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -449,6 +449,7 @@ struct request_queue
 #define QUEUE_FLAG_STACKABLE   13	/* supports request stacking */
 #define QUEUE_FLAG_NONROT      14	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT        QUEUE_FLAG_NONROT /* paravirt device */
+#define QUEUE_FLAG_PART_STAT   15	/* per-partition stats enabled */
 
 static inline int queue_is_locked(struct request_queue *q)
 {
@@ -568,6 +569,8 @@ enum {
 #define blk_queue_flushing(q)	((q)->ordseq)
 #define blk_queue_stackable(q)	\
 	test_bit(QUEUE_FLAG_STACKABLE, &(q)->queue_flags)
+#define blk_queue_part_stat(q)	\
+	test_bit(QUEUE_FLAG_PART_STAT, &(q)->queue_flags)
 
 #define blk_fs_request(rq)	((rq)->cmd_type == REQ_TYPE_FS)
 #define blk_pc_request(rq)	((rq)->cmd_type == REQ_TYPE_BLOCK_PC)

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-20  5:16                         ` Zhang, Yanmin
@ 2009-01-21 23:58                           ` Christoph Lameter
  2009-01-22  8:36                             ` Zhang, Yanmin
  0 siblings, 1 reply; 122+ messages in thread
From: Christoph Lameter @ 2009-01-21 23:58 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Andi Kleen, Pekka Enberg, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1708 bytes --]

On Tue, 20 Jan 2009, Zhang, Yanmin wrote:

> kmem_cache skbuff_head_cache's object size is just 256, so it shares the kmem_cache
> with :0000256. Their order is 1 which means every slab consists of 2 physical pages.

That order can be changed. Try specifying slub_max_order=0 on the kernel
command line to force an order 0 alloc.

The queues of the page allocator are of limited use due to their overhead.
Order-1 allocations can actually be 5% faster than order-0. order-0 makes
sense if pages are pushed rapidly to the page allocator and are then
reissues elsewhere. If there is a linear consumption then the page
allocator queues are just overhead.

> Page allocator has an array at zone_pcp(zone, cpu)->pcp to keep a page buffer for page order 0.
> But here skbuff_head_cache's order is 1, so UDP-U-4k couldn't benefit from the page buffer.

That usually does not matter because of partial list avoiding page
allocator actions.

> SLQB has no such issue, because:
> 1) SLQB has a percpu freelist. Free objects are put to the list firstly and can be picked up
> later on quickly without lock. A batch parameter to control the free object recollection is mostly
> 1024.
> 2) SLQB slab order mostly is 0, so although sometimes it calls alloc_pages/free_pages, it can
> benefit from zone_pcp(zone, cpu)->pcp page buffer.
>
> So SLUB need resolve such issues that one process allocates a batch of objects and another process
> frees them batchly.

SLUB has a percpu freelist but its bounded by the basic allocation unit.
You can increase that by modifying the allocation order. Writing a 3 or 5
into the order value in /sys/kernel/slab/xxx/order would do the trick.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-21 23:58                           ` Christoph Lameter
@ 2009-01-22  8:36                             ` Zhang, Yanmin
  2009-01-22  9:15                               ` Pekka Enberg
  0 siblings, 1 reply; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-22  8:36 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Pekka Enberg, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Wed, 2009-01-21 at 18:58 -0500, Christoph Lameter wrote:
> On Tue, 20 Jan 2009, Zhang, Yanmin wrote:
> 
> > kmem_cache skbuff_head_cache's object size is just 256, so it shares the kmem_cache
> > with :0000256. Their order is 1 which means every slab consists of 2 physical pages.
> 
> That order can be changed. Try specifying slub_max_order=0 on the kernel
> command line to force an order 0 alloc.
I tried slub_max_order=0 and there is no improvement on this UDP-U-4k issue.
Both get_page_from_freelist and __free_pages_ok's cpu time are still very high.

I checked my instrumentation in kernel and found it's caused by large object allocation/free
whose size is more than PAGE_SIZE. Here its order is 1.

The right free callchain is __kfree_skb => skb_release_all => skb_release_data.

So this case isn't the issue that batch of allocation/free might erase partial page
functionality.

'#slaninfo -AD' couldn't show statistics of large object allocation/free. Can we add
such info? That will be more helpful.

In addition, I didn't find such issue wih TCP stream testing.

> 
> The queues of the page allocator are of limited use due to their overhead.
> Order-1 allocations can actually be 5% faster than order-0. order-0 makes
> sense if pages are pushed rapidly to the page allocator and are then
> reissues elsewhere. If there is a linear consumption then the page
> allocator queues are just overhead.
> 
> > Page allocator has an array at zone_pcp(zone, cpu)->pcp to keep a page buffer for page order 0.
> > But here skbuff_head_cache's order is 1, so UDP-U-4k couldn't benefit from the page buffer.
> 
> That usually does not matter because of partial list avoiding page
> allocator actions.

> 
> > SLQB has no such issue, because:
> > 1) SLQB has a percpu freelist. Free objects are put to the list firstly and can be picked up
> > later on quickly without lock. A batch parameter to control the free object recollection is mostly
> > 1024.
> > 2) SLQB slab order mostly is 0, so although sometimes it calls alloc_pages/free_pages, it can
> > benefit from zone_pcp(zone, cpu)->pcp page buffer.
> >
> > So SLUB need resolve such issues that one process allocates a batch of objects and another process
> > frees them batchly.
> 
> SLUB has a percpu freelist but its bounded by the basic allocation unit.
> You can increase that by modifying the allocation order. Writing a 3 or 5
> into the order value in /sys/kernel/slab/xxx/order would do the trick.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-22  8:36                             ` Zhang, Yanmin
@ 2009-01-22  9:15                               ` Pekka Enberg
  2009-01-22  9:28                                 ` Zhang, Yanmin
  0 siblings, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-22  9:15 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Thu, 2009-01-22 at 16:36 +0800, Zhang, Yanmin wrote:
> On Wed, 2009-01-21 at 18:58 -0500, Christoph Lameter wrote:
> > On Tue, 20 Jan 2009, Zhang, Yanmin wrote:
> > 
> > > kmem_cache skbuff_head_cache's object size is just 256, so it shares the kmem_cache
> > > with :0000256. Their order is 1 which means every slab consists of 2 physical pages.
> > 
> > That order can be changed. Try specifying slub_max_order=0 on the kernel
> > command line to force an order 0 alloc.
> I tried slub_max_order=0 and there is no improvement on this UDP-U-4k issue.
> Both get_page_from_freelist and __free_pages_ok's cpu time are still very high.
> 
> I checked my instrumentation in kernel and found it's caused by large object allocation/free
> whose size is more than PAGE_SIZE. Here its order is 1.
> 
> The right free callchain is __kfree_skb => skb_release_all => skb_release_data.
> 
> So this case isn't the issue that batch of allocation/free might erase partial page
> functionality.

So is this the kfree(skb->head) in skb_release_data() or the put_page()
calls in the same function in a loop?

If it's the former, with big enough size passed to __alloc_skb(), the
networking code might be taking a hit from the SLUB page allocator
pass-through.

		Pekka


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-22  9:15                               ` Pekka Enberg
@ 2009-01-22  9:28                                 ` Zhang, Yanmin
  2009-01-22  9:47                                   ` Pekka Enberg
  0 siblings, 1 reply; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-22  9:28 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Thu, 2009-01-22 at 11:15 +0200, Pekka Enberg wrote:
> On Thu, 2009-01-22 at 16:36 +0800, Zhang, Yanmin wrote:
> > On Wed, 2009-01-21 at 18:58 -0500, Christoph Lameter wrote:
> > > On Tue, 20 Jan 2009, Zhang, Yanmin wrote:
> > > 
> > > > kmem_cache skbuff_head_cache's object size is just 256, so it shares the kmem_cache
> > > > with :0000256. Their order is 1 which means every slab consists of 2 physical pages.
> > > 
> > > That order can be changed. Try specifying slub_max_order=0 on the kernel
> > > command line to force an order 0 alloc.
> > I tried slub_max_order=0 and there is no improvement on this UDP-U-4k issue.
> > Both get_page_from_freelist and __free_pages_ok's cpu time are still very high.
> > 
> > I checked my instrumentation in kernel and found it's caused by large object allocation/free
> > whose size is more than PAGE_SIZE. Here its order is 1.
> > 
> > The right free callchain is __kfree_skb => skb_release_all => skb_release_data.
> > 
> > So this case isn't the issue that batch of allocation/free might erase partial page
> > functionality.
> 
> So is this the kfree(skb->head) in skb_release_data() or the put_page()
> calls in the same function in a loop?
It's kfree(skb->head).

> 
> If it's the former, with big enough size passed to __alloc_skb(), the
> networking code might be taking a hit from the SLUB page allocator
> pass-through.



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-22  9:28                                 ` Zhang, Yanmin
@ 2009-01-22  9:47                                   ` Pekka Enberg
  2009-01-23  3:02                                     ` Zhang, Yanmin
  0 siblings, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-22  9:47 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Thu, 2009-01-22 at 17:28 +0800, Zhang, Yanmin wrote:
> On Thu, 2009-01-22 at 11:15 +0200, Pekka Enberg wrote:
> > On Thu, 2009-01-22 at 16:36 +0800, Zhang, Yanmin wrote:
> > > On Wed, 2009-01-21 at 18:58 -0500, Christoph Lameter wrote:
> > > > On Tue, 20 Jan 2009, Zhang, Yanmin wrote:
> > > > 
> > > > > kmem_cache skbuff_head_cache's object size is just 256, so it shares the kmem_cache
> > > > > with :0000256. Their order is 1 which means every slab consists of 2 physical pages.
> > > > 
> > > > That order can be changed. Try specifying slub_max_order=0 on the kernel
> > > > command line to force an order 0 alloc.
> > > I tried slub_max_order=0 and there is no improvement on this UDP-U-4k issue.
> > > Both get_page_from_freelist and __free_pages_ok's cpu time are still very high.
> > > 
> > > I checked my instrumentation in kernel and found it's caused by large object allocation/free
> > > whose size is more than PAGE_SIZE. Here its order is 1.
> > > 
> > > The right free callchain is __kfree_skb => skb_release_all => skb_release_data.
> > > 
> > > So this case isn't the issue that batch of allocation/free might erase partial page
> > > functionality.
> > 
> > So is this the kfree(skb->head) in skb_release_data() or the put_page()
> > calls in the same function in a loop?
> It's kfree(skb->head).
> 
> > 
> > If it's the former, with big enough size passed to __alloc_skb(), the
> > networking code might be taking a hit from the SLUB page allocator
> > pass-through.

Do we know what kind of size is being passed to __alloc_skb() in this
case? Maybe we want to do something like this.

		Pekka

SLUB: revert page allocator pass-through

This is a revert of commit aadb4bc4a1f9108c1d0fbd121827c936c2ed4217 ("SLUB:
direct pass through of page size or higher kmalloc requests").
---

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 2f5c16b..3bd3662 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -124,7 +124,7 @@ struct kmem_cache {
  * We keep the general caches in an array of slab caches that are used for
  * 2^x bytes of allocations.
  */
-extern struct kmem_cache kmalloc_caches[PAGE_SHIFT + 1];
+extern struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
 
 /*
  * Sorry that the following has to be that ugly but some versions of GCC
@@ -135,6 +135,9 @@ static __always_inline int kmalloc_index(size_t size)
 	if (!size)
 		return 0;
 
+	if (size > KMALLOC_MAX_SIZE)
+		return -1;
+
 	if (size <= KMALLOC_MIN_SIZE)
 		return KMALLOC_SHIFT_LOW;
 
@@ -154,10 +157,6 @@ static __always_inline int kmalloc_index(size_t size)
 	if (size <=       1024) return 10;
 	if (size <=   2 * 1024) return 11;
 	if (size <=   4 * 1024) return 12;
-/*
- * The following is only needed to support architectures with a larger page
- * size than 4k.
- */
 	if (size <=   8 * 1024) return 13;
 	if (size <=  16 * 1024) return 14;
 	if (size <=  32 * 1024) return 15;
@@ -167,6 +166,10 @@ static __always_inline int kmalloc_index(size_t size)
 	if (size <= 512 * 1024) return 19;
 	if (size <= 1024 * 1024) return 20;
 	if (size <=  2 * 1024 * 1024) return 21;
+	if (size <=  4 * 1024 * 1024) return 22;
+	if (size <=  8 * 1024 * 1024) return 23;
+	if (size <= 16 * 1024 * 1024) return 24;
+	if (size <= 32 * 1024 * 1024) return 25;
 	return -1;
 
 /*
@@ -191,6 +194,19 @@ static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
 	if (index == 0)
 		return NULL;
 
+	/*
+	 * This function only gets expanded if __builtin_constant_p(size), so
+	 * testing it here shouldn't be needed.  But some versions of gcc need
+	 * help.
+	 */
+	if (__builtin_constant_p(size) && index < 0) {
+		/*
+		 * Generate a link failure. Would be great if we could
+		 * do something to stop the compile here.
+		 */
+		extern void __kmalloc_size_too_large(void);
+		__kmalloc_size_too_large();
+	}
 	return &kmalloc_caches[index];
 }
 
@@ -204,17 +220,9 @@ static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
 void *kmem_cache_alloc(struct kmem_cache *, gfp_t);
 void *__kmalloc(size_t size, gfp_t flags);
 
-static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
-{
-	return (void *)__get_free_pages(flags | __GFP_COMP, get_order(size));
-}
-
 static __always_inline void *kmalloc(size_t size, gfp_t flags)
 {
 	if (__builtin_constant_p(size)) {
-		if (size > PAGE_SIZE)
-			return kmalloc_large(size, flags);
-
 		if (!(flags & SLUB_DMA)) {
 			struct kmem_cache *s = kmalloc_slab(size);
 
diff --git a/mm/slub.c b/mm/slub.c
index 6392ae5..8fad23f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2475,7 +2475,7 @@ EXPORT_SYMBOL(kmem_cache_destroy);
  *		Kmalloc subsystem
  *******************************************************************/
 
-struct kmem_cache kmalloc_caches[PAGE_SHIFT + 1] __cacheline_aligned;
+struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1] __cacheline_aligned;
 EXPORT_SYMBOL(kmalloc_caches);
 
 static int __init setup_slub_min_order(char *str)
@@ -2537,7 +2537,7 @@ panic:
 }
 
 #ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_caches_dma[PAGE_SHIFT + 1];
+static struct kmem_cache *kmalloc_caches_dma[KMALLOC_SHIFT_HIGH + 1];
 
 static void sysfs_add_func(struct work_struct *w)
 {
@@ -2643,8 +2643,12 @@ static struct kmem_cache *get_slab(size_t size, gfp_t flags)
 			return ZERO_SIZE_PTR;
 
 		index = size_index[(size - 1) / 8];
-	} else
+	} else {
+		if (size > KMALLOC_MAX_SIZE)
+			return NULL;
+
 		index = fls(size - 1);
+	}
 
 #ifdef CONFIG_ZONE_DMA
 	if (unlikely((flags & SLUB_DMA)))
@@ -2658,9 +2662,6 @@ void *__kmalloc(size_t size, gfp_t flags)
 {
 	struct kmem_cache *s;
 
-	if (unlikely(size > PAGE_SIZE))
-		return kmalloc_large(size, flags);
-
 	s = get_slab(size, flags);
 
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
@@ -2670,25 +2671,11 @@ void *__kmalloc(size_t size, gfp_t flags)
 }
 EXPORT_SYMBOL(__kmalloc);
 
-static void *kmalloc_large_node(size_t size, gfp_t flags, int node)
-{
-	struct page *page = alloc_pages_node(node, flags | __GFP_COMP,
-						get_order(size));
-
-	if (page)
-		return page_address(page);
-	else
-		return NULL;
-}
-
 #ifdef CONFIG_NUMA
 void *__kmalloc_node(size_t size, gfp_t flags, int node)
 {
 	struct kmem_cache *s;
 
-	if (unlikely(size > PAGE_SIZE))
-		return kmalloc_large_node(size, flags, node);
-
 	s = get_slab(size, flags);
 
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
@@ -2746,11 +2733,8 @@ void kfree(const void *x)
 		return;
 
 	page = virt_to_head_page(x);
-	if (unlikely(!PageSlab(page))) {
-		BUG_ON(!PageCompound(page));
-		put_page(page);
+	if (unlikely(WARN_ON(!PageSlab(page)))) /* XXX */
 		return;
-	}
 	slab_free(page->slab, page, object, _RET_IP_);
 }
 EXPORT_SYMBOL(kfree);
@@ -2985,7 +2969,7 @@ void __init kmem_cache_init(void)
 		caches++;
 	}
 
-	for (i = KMALLOC_SHIFT_LOW; i <= PAGE_SHIFT; i++) {
+	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
 		create_kmalloc_cache(&kmalloc_caches[i],
 			"kmalloc", 1 << i, GFP_KERNEL);
 		caches++;
@@ -3022,7 +3006,7 @@ void __init kmem_cache_init(void)
 	slab_state = UP;
 
 	/* Provide the correct kmalloc names now that the caches are up */
-	for (i = KMALLOC_SHIFT_LOW; i <= PAGE_SHIFT; i++)
+	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
 		kmalloc_caches[i]. name =
 			kasprintf(GFP_KERNEL, "kmalloc-%d", 1 << i);
 
@@ -3222,9 +3206,6 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
 {
 	struct kmem_cache *s;
 
-	if (unlikely(size > PAGE_SIZE))
-		return kmalloc_large(size, gfpflags);
-
 	s = get_slab(size, gfpflags);
 
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
@@ -3238,9 +3219,6 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
 {
 	struct kmem_cache *s;
 
-	if (unlikely(size > PAGE_SIZE))
-		return kmalloc_large_node(size, gfpflags, node);
-
 	s = get_slab(size, gfpflags);
 
 	if (unlikely(ZERO_OR_NULL_PTR(s)))



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
       [not found]               ` <588992150B702C48B3312184F1B810AD03A497632C@azsmsx501.amr.corp.intel.com>
@ 2009-01-22 11:29                 ` Jens Axboe
       [not found]                   ` <588992150B702C48B3312184F1B810AD03A4F59632@azsmsx501.amr.corp.intel.com>
  0 siblings, 1 reply; 122+ messages in thread
From: Jens Axboe @ 2009-01-22 11:29 UTC (permalink / raw)
  To: Chilukuri, Harita
  Cc: Matthew Wilcox, Andi Kleen, Andrew Morton, Wilcox, Matthew R, Ma,
	Chinang, linux-kernel, Tripathi, Sharad C, arjan, Siddha,
	Suresh B, Styner, Douglas W, Wang, Peter Xihong, Nueckel, Hubert,
	chris.mason, srostedt, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty

On Wed, Jan 21 2009, Chilukuri, Harita wrote:
> Jen, we work with Matthew on the OLTP workload and have tested the part_stats patch on 2.6.29-rc2. Below are the details:
> 
> Disabling the part_stats has positive impact on the OLTP workload.
> 
> Linux OLTP Performance summary
> Kernel#                           Speedup(x) Intr/s  CtxSw/s us%  sys% idle% iowait%
> 2.6.29-rc2-part_stats                1.000   30329   41716   74    26   0       0
> 2.6.29-rc2-disable-part_stats        1.006   30413   42582   74    25   0       0
> 
> Server configurations:
> Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
> 
> 
> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.29-rc2-part_stats      Cycles% 2.6.29-rc2-disable-part_stats
> 0.9634 qla24xx_intr_handler        1.0372 qla24xx_intr_handler
> 0.9057 copy_user_generic_string    0.7461 qla24xx_wrt_req_reg
> 0.7583 unmap_vmas                  0.7130 kmem_cache_alloc
> 0.6280 qla24xx_wrt_req_reg         0.6876 copy_user_generic_string
> 0.6088 kmem_cache_alloc            0.5656 qla24xx_start_scsi
> 0.5468 clear_page_c                0.4881 __blockdev_direct_IO
> 0.5191 qla24xx_start_scsi          0.4728 try_to_wake_up
> 0.4892 try_to_wake_up              0.4588 unmap_vmas
> 0.4870 __blockdev_direct_IO        0.4360 scsi_request_fn
> 0.4187 scsi_request_fn             0.3711 __switch_to
> 0.3717 __switch_to                 0.3699 aio_complete
> 0.3567 rb_get_reader_page          0.3648 rb_get_reader_page
> 0.3396 aio_complete                0.3597 ring_buffer_consume
> 0.3012 __end_that_request_first    0.3292 memset_c
> 0.2926 memset_c                    0.3076 __list_add
> 0.2926 ring_buffer_consume         0.2771 clear_page_c
> 0.2884 page_remove_rmap            0.2745 task_rq_lock
> 0.2691 disk_map_sector_rcu         0.2733 generic_make_request
> 0.2670 copy_page_c                 0.2555 tcp_sendmsg
> 0.2670 lock_timer_base             0.2529 qla2x00_process_completed_re
> 0.2606 qla2x00_process_completed_re0.2440 e1000_xmit_frame
> 0.2521 task_rq_lock                0.2390 lock_timer_base
> 0.2328 __list_add                  0.2364 qla24xx_queuecommand
> 0.2286 generic_make_request        0.2301 kmem_cache_free
> 0.2286 pick_next_highest_task_rt   0.2262 blk_queue_end_tag
> 0.2136 push_rt_task                0.2262 kref_get
> 0.2115 blk_queue_end_tag           0.2250 push_rt_task
> 0.2115 kmem_cache_free             0.2135 scsi_dispatch_cmd
> 0.2051 e1000_xmit_frame            0.2084 sd_prep_fn
> 0.2051 scsi_device_unbusy          0.2059 kfree

Alright, so that 0.6%. IIRC, 0.1% (or there abouts) is significant with
this benchmark, correct? To get a feel for the rest of the accounting
overhead, could you try with this patch that just disables the whole
thing?

diff --git a/block/blk-core.c b/block/blk-core.c
index a824e49..eec9126 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -64,6 +64,7 @@ static struct workqueue_struct *kblockd_workqueue;
 
 static void drive_stat_acct(struct request *rq, int new_io)
 {
+#if 0
 	struct hd_struct *part;
 	int rw = rq_data_dir(rq);
 	int cpu;
@@ -82,6 +83,7 @@ static void drive_stat_acct(struct request *rq, int new_io)
 	}
 
 	part_stat_unlock();
+#endif
 }
 
 void blk_queue_congestion_threshold(struct request_queue *q)
@@ -1014,6 +1017,7 @@ static inline void add_request(struct request_queue *q, struct request *req)
 	__elv_add_request(q, req, ELEVATOR_INSERT_SORT, 0);
 }
 
+#if 0
 static void part_round_stats_single(int cpu, struct hd_struct *part,
 				    unsigned long now)
 {
@@ -1027,6 +1031,7 @@ static void part_round_stats_single(int cpu, struct hd_struct *part,
 	}
 	part->stamp = now;
 }
+#endif
 
 /**
  * part_round_stats() - Round off the performance stats on a struct disk_stats.
@@ -1046,11 +1051,13 @@ static void part_round_stats_single(int cpu, struct hd_struct *part,
  */
 void part_round_stats(int cpu, struct hd_struct *part)
 {
+#if 0
 	unsigned long now = jiffies;
 
 	if (part->partno)
 		part_round_stats_single(cpu, &part_to_disk(part)->part0, now);
 	part_round_stats_single(cpu, part, now);
+#endif
 }
 EXPORT_SYMBOL_GPL(part_round_stats);
 
@@ -1690,6 +1697,7 @@ static int __end_that_request_first(struct request *req, int error,
 				(unsigned long long)req->sector);
 	}
 
+#if 0
 	if (blk_fs_request(req) && req->rq_disk) {
 		const int rw = rq_data_dir(req);
 		struct hd_struct *part;
@@ -1700,6 +1708,7 @@ static int __end_that_request_first(struct request *req, int error,
 		part_stat_add(cpu, part, sectors[rw], nr_bytes >> 9);
 		part_stat_unlock();
 	}
+#endif
 
 	total_bytes = bio_nbytes = 0;
 	while ((bio = req->bio) != NULL) {
@@ -1779,7 +1788,9 @@ static int __end_that_request_first(struct request *req, int error,
  */
 static void end_that_request_last(struct request *req, int error)
 {
+#if 0
 	struct gendisk *disk = req->rq_disk;
+#endif
 
 	if (blk_rq_tagged(req))
 		blk_queue_end_tag(req->q, req);
@@ -1797,6 +1808,7 @@ static void end_that_request_last(struct request *req, int error)
 	 * IO on queueing nor completion.  Accounting the containing
 	 * request is enough.
 	 */
+#if 0
 	if (disk && blk_fs_request(req) && req != &req->q->bar_rq) {
 		unsigned long duration = jiffies - req->start_time;
 		const int rw = rq_data_dir(req);
@@ -1813,6 +1825,7 @@ static void end_that_request_last(struct request *req, int error)
 
 		part_stat_unlock();
 	}
+#endif
 
 	if (req->end_io)
 		req->end_io(req, error);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-22  9:47                                   ` Pekka Enberg
@ 2009-01-23  3:02                                     ` Zhang, Yanmin
  2009-01-23  6:52                                       ` Pekka Enberg
  2009-01-23  8:33                                       ` Nick Piggin
  0 siblings, 2 replies; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-23  3:02 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Thu, 2009-01-22 at 11:47 +0200, Pekka Enberg wrote:
> On Thu, 2009-01-22 at 17:28 +0800, Zhang, Yanmin wrote:
> > On Thu, 2009-01-22 at 11:15 +0200, Pekka Enberg wrote:
> > > On Thu, 2009-01-22 at 16:36 +0800, Zhang, Yanmin wrote:
> > > > On Wed, 2009-01-21 at 18:58 -0500, Christoph Lameter wrote:
> > > > > On Tue, 20 Jan 2009, Zhang, Yanmin wrote:
> > > > > 
> > > > > > kmem_cache skbuff_head_cache's object size is just 256, so it shares the kmem_cache
> > > > > > with :0000256. Their order is 1 which means every slab consists of 2 physical pages.
> > > > > 
> > > > > That order can be changed. Try specifying slub_max_order=0 on the kernel
> > > > > command line to force an order 0 alloc.
> > > > I tried slub_max_order=0 and there is no improvement on this UDP-U-4k issue.
> > > > Both get_page_from_freelist and __free_pages_ok's cpu time are still very high.
> > > > 
> > > > I checked my instrumentation in kernel and found it's caused by large object allocation/free
> > > > whose size is more than PAGE_SIZE. Here its order is 1.
> > > > 
> > > > The right free callchain is __kfree_skb => skb_release_all => skb_release_data.
> > > > 
> > > > So this case isn't the issue that batch of allocation/free might erase partial page
> > > > functionality.
> > > 
> > > So is this the kfree(skb->head) in skb_release_data() or the put_page()
> > > calls in the same function in a loop?
> > It's kfree(skb->head).
> > 
> > > 
> > > If it's the former, with big enough size passed to __alloc_skb(), the
> > > networking code might be taking a hit from the SLUB page allocator
> > > pass-through.
> 
> Do we know what kind of size is being passed to __alloc_skb() in this
> case?
In function __alloc_skb, original parameter size=4155,
SKB_DATA_ALIGN(size)=4224, sizeof(struct skb_shared_info)=472, so
__kmalloc_track_caller's parameter size=4696.

>  Maybe we want to do something like this.
> 
> 		Pekka
> 
> SLUB: revert page allocator pass-through
This patch amost fixes the netperf UDP-U-4k issue.

#slabinfo -AD
Name                   Objects    Alloc     Free   %Fast
:0000256                  1658 70350463 70348946  99  99 
kmalloc-8192                31 70322309 70322293  99  99 
:0000168                  2592   143154   140684  93  28 
:0004096                  1456    91072    89644  99  96 
:0000192                  3402    63838    60491  89  11 
:0000064                  6177    49635    43743  98  77 

So kmalloc-8192 appears. Without the patch, kmalloc-8192 hides.
kmalloc-8192's default order on my 8-core stoakley is 2.

1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better than SLQB's;
2) If I start 1 clinet and 1 server, and bind them to different physical cpu, SLQB's result
is about 10% better than SLUB's.

I don't know why there is still 10% difference with item 2). Maybe cachemiss causes it?

> 
> This is a revert of commit aadb4bc4a1f9108c1d0fbd121827c936c2ed4217 ("SLUB:
> direct pass through of page size or higher kmalloc requests").
> ---
> 
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 2f5c16b..3bd3662 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  3:02                                     ` Zhang, Yanmin
@ 2009-01-23  6:52                                       ` Pekka Enberg
  2009-01-23  8:06                                         ` Pekka Enberg
  2009-01-23  8:33                                       ` Nick Piggin
  1 sibling, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-23  6:52 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty, mingo

Zhang, Yanmin wrote:
>>>> If it's the former, with big enough size passed to __alloc_skb(), the
>>>> networking code might be taking a hit from the SLUB page allocator
>>>> pass-through.
>> Do we know what kind of size is being passed to __alloc_skb() in this
>> case?
> In function __alloc_skb, original parameter size=4155,
> SKB_DATA_ALIGN(size)=4224, sizeof(struct skb_shared_info)=472, so
> __kmalloc_track_caller's parameter size=4696.

OK, so all allocations go straight to the page allocator.

> 
>>  Maybe we want to do something like this.
>>
>> SLUB: revert page allocator pass-through
> This patch amost fixes the netperf UDP-U-4k issue.
> 
> #slabinfo -AD
> Name                   Objects    Alloc     Free   %Fast
> :0000256                  1658 70350463 70348946  99  99 
> kmalloc-8192                31 70322309 70322293  99  99 
> :0000168                  2592   143154   140684  93  28 
> :0004096                  1456    91072    89644  99  96 
> :0000192                  3402    63838    60491  89  11 
> :0000064                  6177    49635    43743  98  77 
> 
> So kmalloc-8192 appears. Without the patch, kmalloc-8192 hides.
> kmalloc-8192's default order on my 8-core stoakley is 2.

Christoph, should we merge my patch as-is or do you have an alternative 
fix in mind? We could, of course, increase kmalloc() caches one level up 
to 8192 or higher.

> 
> 1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better than SLQB's;
> 2) If I start 1 clinet and 1 server, and bind them to different physical cpu, SLQB's result
> is about 10% better than SLUB's.
> 
> I don't know why there is still 10% difference with item 2). Maybe cachemiss causes it?

Maybe we can use the perfstat and/or kerneltop utilities of the new perf 
counters patch to diagnose this:

http://lkml.org/lkml/2009/1/21/273

And do oprofile, of course. Thanks!

		Pekka

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  6:52                                       ` Pekka Enberg
@ 2009-01-23  8:06                                         ` Pekka Enberg
  2009-01-23  8:30                                           ` Zhang, Yanmin
  0 siblings, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-23  8:06 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty, mingo

On Fri, 2009-01-23 at 08:52 +0200, Pekka Enberg wrote:
> > 1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better than SLQB's;
> > 2) If I start 1 clinet and 1 server, and bind them to different physical cpu, SLQB's result
> > is about 10% better than SLUB's.
> > 
> > I don't know why there is still 10% difference with item 2). Maybe cachemiss causes it?
> 
> Maybe we can use the perfstat and/or kerneltop utilities of the new perf 
> counters patch to diagnose this:
> 
> http://lkml.org/lkml/2009/1/21/273
> 
> And do oprofile, of course. Thanks!

I assume binding the client and the server to different physical CPUs
also  means that the SKB is always allocated on CPU 1 and freed on CPU
2? If so, we will be taking the __slab_free() slow path all the time on
kfree() which will cause cache effects, no doubt.

But there's another potential performance hit we're taking because the
object size of the cache is so big. As allocations from CPU 1 keep
coming in, we need to allocate new pages and unfreeze the per-cpu page.
That in turn causes __slab_free() to be more eager to discard the slab
(see the PageSlubFrozen check there).

So before going for cache profiling, I'd really like to see an oprofile
report. I suspect we're still going to see much more page allocator
activity there than with SLAB or SLQB which is why we're still behaving
so badly here.

		Pekka


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  8:06                                         ` Pekka Enberg
@ 2009-01-23  8:30                                           ` Zhang, Yanmin
  2009-01-23  8:40                                             ` Pekka Enberg
  2009-01-23  9:46                                             ` Pekka Enberg
  0 siblings, 2 replies; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-23  8:30 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty, mingo

On Fri, 2009-01-23 at 10:06 +0200, Pekka Enberg wrote:
> On Fri, 2009-01-23 at 08:52 +0200, Pekka Enberg wrote:
> > > 1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better than SLQB's;
> > > 2) If I start 1 clinet and 1 server, and bind them to different physical cpu, SLQB's result
> > > is about 10% better than SLUB's.
> > > 
> > > I don't know why there is still 10% difference with item 2). Maybe cachemiss causes it?
> > 
> > Maybe we can use the perfstat and/or kerneltop utilities of the new perf 
> > counters patch to diagnose this:
> > 
> > http://lkml.org/lkml/2009/1/21/273
> > 
> > And do oprofile, of course. Thanks!
> 
> I assume binding the client and the server to different physical CPUs
> also  means that the SKB is always allocated on CPU 1 and freed on CPU
> 2? If so, we will be taking the __slab_free() slow path all the time on
> kfree() which will cause cache effects, no doubt.
> 
> But there's another potential performance hit we're taking because the
> object size of the cache is so big. As allocations from CPU 1 keep
> coming in, we need to allocate new pages and unfreeze the per-cpu page.
> That in turn causes __slab_free() to be more eager to discard the slab
> (see the PageSlubFrozen check there).
> 
> So before going for cache profiling, I'd really like to see an oprofile
> report. I suspect we're still going to see much more page allocator
> activity
Theoretically, it should, but oprofile doesn't show that.

>  there than with SLAB or SLQB which is why we're still behaving
> so badly here.

oprofile output with 2.6.29-rc2-slubrevertlarge:
CPU: Core 2, speed 2666.71 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        app name                 symbol name
132779   32.9951  vmlinux                  copy_user_generic_string
25334     6.2954  vmlinux                  schedule
21032     5.2264  vmlinux                  tg_shares_up
17175     4.2679  vmlinux                  __skb_recv_datagram
9091      2.2591  vmlinux                  sock_def_readable
8934      2.2201  vmlinux                  mwait_idle
8796      2.1858  vmlinux                  try_to_wake_up
6940      1.7246  vmlinux                  __slab_free

#slaninfo -AD
Name                   Objects    Alloc     Free   %Fast
:0000256                  1643  5215544  5214027  94   0 
kmalloc-8192                28  5189576  5189560   0   0 
:0000168                  2631   141466   138976  92  28 
:0004096                  1452    88697    87269  99  96 
:0000192                  3402    63050    59732  89  11 
:0000064                  6265    46611    40721  98  82 
:0000128                  1895    30429    28654  93  32 


oprofile output with kernel 2.6.29-rc2-slqb0121:
CPU: Core 2, speed 2666.76 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               app name                 symbol name
114793   28.7163  vmlinux                  vmlinux                  copy_user_generic_string
27880     6.9744  vmlinux                  vmlinux                  tg_shares_up
22218     5.5580  vmlinux                  vmlinux                  schedule
12238     3.0614  vmlinux                  vmlinux                  mwait_idle
7395      1.8499  vmlinux                  vmlinux                  task_rq_lock
7348      1.8382  vmlinux                  vmlinux                  sock_def_readable
7202      1.8016  vmlinux                  vmlinux                  sched_clock_cpu
6981      1.7464  vmlinux                  vmlinux                  __skb_recv_datagram
6566      1.6425  vmlinux                  vmlinux                  udp_queue_rcv_skb



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  3:02                                     ` Zhang, Yanmin
  2009-01-23  6:52                                       ` Pekka Enberg
@ 2009-01-23  8:33                                       ` Nick Piggin
  2009-01-23  9:02                                         ` Zhang, Yanmin
  1 sibling, 1 reply; 122+ messages in thread
From: Nick Piggin @ 2009-01-23  8:33 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Pekka Enberg, Christoph Lameter, Andi Kleen, Matthew Wilcox,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

On Friday 23 January 2009 14:02:53 Zhang, Yanmin wrote:

> 1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better
> than SLQB's;

I'll have to look into this too. Could be evidence of the possible
TLB improvement from using bigger pages and/or page-specific freelist,
I suppose.

Do you have a scripted used to start netperf in that configuration?


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  8:30                                           ` Zhang, Yanmin
@ 2009-01-23  8:40                                             ` Pekka Enberg
  2009-01-23  9:46                                             ` Pekka Enberg
  1 sibling, 0 replies; 122+ messages in thread
From: Pekka Enberg @ 2009-01-23  8:40 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty, mingo

On Fri, 2009-01-23 at 16:30 +0800, Zhang, Yanmin wrote:
> > I assume binding the client and the server to different physical CPUs
> > also  means that the SKB is always allocated on CPU 1 and freed on CPU
> > 2? If so, we will be taking the __slab_free() slow path all the time on
> > kfree() which will cause cache effects, no doubt.
> > 
> > But there's another potential performance hit we're taking because the
> > object size of the cache is so big. As allocations from CPU 1 keep
> > coming in, we need to allocate new pages and unfreeze the per-cpu page.
> > That in turn causes __slab_free() to be more eager to discard the slab
> > (see the PageSlubFrozen check there).
> > 
> > So before going for cache profiling, I'd really like to see an oprofile
> > report. I suspect we're still going to see much more page allocator
> > activity
> Theoretically, it should, but oprofile doesn't show that.
> 
> > there than with SLAB or SLQB which is why we're still behaving
> > so badly here.
> 
> oprofile output with 2.6.29-rc2-slubrevertlarge:
> CPU: Core 2, speed 2666.71 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        app name                 symbol name
> 132779   32.9951  vmlinux                  copy_user_generic_string
> 25334     6.2954  vmlinux                  schedule
> 21032     5.2264  vmlinux                  tg_shares_up
> 17175     4.2679  vmlinux                  __skb_recv_datagram
> 9091      2.2591  vmlinux                  sock_def_readable
> 8934      2.2201  vmlinux                  mwait_idle
> 8796      2.1858  vmlinux                  try_to_wake_up
> 6940      1.7246  vmlinux                  __slab_free
> 
> #slaninfo -AD
> Name                   Objects    Alloc     Free   %Fast
> :0000256                  1643  5215544  5214027  94   0 
> kmalloc-8192                28  5189576  5189560   0   0 
                                                    ^^^^^^

This looks bit funny. Hmm.

> :0000168                  2631   141466   138976  92  28 
> :0004096                  1452    88697    87269  99  96 
> :0000192                  3402    63050    59732  89  11 
> :0000064                  6265    46611    40721  98  82 
> :0000128                  1895    30429    28654  93  32 



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  8:33                                       ` Nick Piggin
@ 2009-01-23  9:02                                         ` Zhang, Yanmin
  2009-01-23 18:40                                           ` care and feeding of netperf (Re: Mainline kernel OLTP performance update) Rick Jones
  0 siblings, 1 reply; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-23  9:02 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Pekka Enberg, Christoph Lameter, Andi Kleen, Matthew Wilcox,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty

[-- Attachment #1: Type: text/plain, Size: 622 bytes --]

On Fri, 2009-01-23 at 19:33 +1100, Nick Piggin wrote:
> On Friday 23 January 2009 14:02:53 Zhang, Yanmin wrote:
> 
> > 1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better
> > than SLQB's;
> 
> I'll have to look into this too. Could be evidence of the possible
> TLB improvement from using bigger pages and/or page-specific freelist,
> I suppose.
> 
> Do you have a scripted used to start netperf in that configuration?
See the attachment.

Steps to run testing:
1) compile netperf;
2) Change PROG_DIR to path/to/netperf/src;
3) ./start_netperf_udp_v4.sh 8 #Assume your machine has 8 logical cpus.


[-- Attachment #2: start_netperf_udp_v4.sh --]
[-- Type: application/x-shellscript, Size: 1361 bytes --]

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  8:30                                           ` Zhang, Yanmin
  2009-01-23  8:40                                             ` Pekka Enberg
@ 2009-01-23  9:46                                             ` Pekka Enberg
  2009-01-23 15:22                                               ` Christoph Lameter
  1 sibling, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-23  9:46 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, sfr, matthew.r.wilcox, chinang.ma,
	linux-kernel, sharad.c.tripathi, arjan, suresh.b.siddha,
	harita.chilukuri, douglas.w.styner, peter.xihong.wang,
	hubert.nueckel, chris.mason, srostedt, linux-scsi,
	andrew.vasquez, anirban.chakraborty, mingo

On Fri, 2009-01-23 at 16:30 +0800, Zhang, Yanmin wrote:
> On Fri, 2009-01-23 at 10:06 +0200, Pekka Enberg wrote:
> > On Fri, 2009-01-23 at 08:52 +0200, Pekka Enberg wrote:
> > > > 1) If I start CPU_NUM clients and servers, SLUB's result is about 2% better than SLQB's;
> > > > 2) If I start 1 clinet and 1 server, and bind them to different physical cpu, SLQB's result
> > > > is about 10% better than SLUB's.
> > > > 
> > > > I don't know why there is still 10% difference with item 2). Maybe cachemiss causes it?
> > > 
> > > Maybe we can use the perfstat and/or kerneltop utilities of the new perf 
> > > counters patch to diagnose this:
> > > 
> > > http://lkml.org/lkml/2009/1/21/273
> > > 
> > > And do oprofile, of course. Thanks!
> > 
> > I assume binding the client and the server to different physical CPUs
> > also  means that the SKB is always allocated on CPU 1 and freed on CPU
> > 2? If so, we will be taking the __slab_free() slow path all the time on
> > kfree() which will cause cache effects, no doubt.
> > 
> > But there's another potential performance hit we're taking because the
> > object size of the cache is so big. As allocations from CPU 1 keep
> > coming in, we need to allocate new pages and unfreeze the per-cpu page.
> > That in turn causes __slab_free() to be more eager to discard the slab
> > (see the PageSlubFrozen check there).
> > 
> > So before going for cache profiling, I'd really like to see an oprofile
> > report. I suspect we're still going to see much more page allocator
> > activity
> Theoretically, it should, but oprofile doesn't show that.

That's bit surprising, actually. FWIW, I've included a patch for empty
slab lists. But it's probably not going to help here.

> >  there than with SLAB or SLQB which is why we're still behaving
> > so badly here.
> 
> oprofile output with 2.6.29-rc2-slubrevertlarge:
> CPU: Core 2, speed 2666.71 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
> samples  %        app name                 symbol name
> 132779   32.9951  vmlinux                  copy_user_generic_string
> 25334     6.2954  vmlinux                  schedule
> 21032     5.2264  vmlinux                  tg_shares_up
> 17175     4.2679  vmlinux                  __skb_recv_datagram
> 9091      2.2591  vmlinux                  sock_def_readable
> 8934      2.2201  vmlinux                  mwait_idle
> 8796      2.1858  vmlinux                  try_to_wake_up
> 6940      1.7246  vmlinux                  __slab_free
> 
> #slaninfo -AD
> Name                   Objects    Alloc     Free   %Fast
> :0000256                  1643  5215544  5214027  94   0 
> kmalloc-8192                28  5189576  5189560   0   0 
> :0000168                  2631   141466   138976  92  28 
> :0004096                  1452    88697    87269  99  96 
> :0000192                  3402    63050    59732  89  11 
> :0000064                  6265    46611    40721  98  82 
> :0000128                  1895    30429    28654  93  32 

Looking at __slab_free(), unless page->inuse is constantly zero and we
discard the slab, it really is just cache effects (10% sounds like a
lot, though!). AFAICT, the only way to optimize that is with Christoph's
unfinished pointer freelists patches or with a remote free list like in
SLQB.

		Pekka

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 3bd3662..41a4c1a 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -48,6 +48,9 @@ struct kmem_cache_node {
 	unsigned long nr_partial;
 	unsigned long min_partial;
 	struct list_head partial;
+	unsigned long nr_empty;
+	unsigned long max_empty;
+	struct list_head empty;
 #ifdef CONFIG_SLUB_DEBUG
 	atomic_long_t nr_slabs;
 	atomic_long_t total_objects;
diff --git a/mm/slub.c b/mm/slub.c
index 8fad23f..5a12597 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -134,6 +134,11 @@
  */
 #define MAX_PARTIAL 10
 
+/*
+ * Maximum number of empty slabs.
+ */
+#define MAX_EMPTY 1
+
 #define DEBUG_DEFAULT_FLAGS (SLAB_DEBUG_FREE | SLAB_RED_ZONE | \
 				SLAB_POISON | SLAB_STORE_USER)
 
@@ -1205,6 +1210,24 @@ static void discard_slab(struct kmem_cache *s, struct page *page)
 	free_slab(s, page);
 }
 
+static void discard_or_cache_slab(struct kmem_cache *s, struct page *page)
+{
+	struct kmem_cache_node *n;
+	int node;
+
+	node = page_to_nid(page);
+	n = get_node(s, node);
+
+	dec_slabs_node(s, node, page->objects);
+
+	if (likely(n->nr_empty >= n->max_empty)) {
+		free_slab(s, page);
+	} else {
+		n->nr_empty++;
+		list_add(&page->lru, &n->partial);
+	}
+}
+
 /*
  * Per slab locking using the pagelock
  */
@@ -1252,7 +1275,7 @@ static void remove_partial(struct kmem_cache *s, struct page *page)
 }
 
 /*
- * Lock slab and remove from the partial list.
+ * Lock slab and remove from the partial or empty list.
  *
  * Must hold list_lock.
  */
@@ -1261,7 +1284,6 @@ static inline int lock_and_freeze_slab(struct kmem_cache_node *n,
 {
 	if (slab_trylock(page)) {
 		list_del(&page->lru);
-		n->nr_partial--;
 		__SetPageSlubFrozen(page);
 		return 1;
 	}
@@ -1271,7 +1293,7 @@ static inline int lock_and_freeze_slab(struct kmem_cache_node *n,
 /*
  * Try to allocate a partial slab from a specific node.
  */
-static struct page *get_partial_node(struct kmem_cache_node *n)
+static struct page *get_partial_or_empty_node(struct kmem_cache_node *n)
 {
 	struct page *page;
 
@@ -1281,13 +1303,22 @@ static struct page *get_partial_node(struct kmem_cache_node *n)
 	 * partial slab and there is none available then get_partials()
 	 * will return NULL.
 	 */
-	if (!n || !n->nr_partial)
+	if (!n || (!n->nr_partial && !n->nr_empty))
 		return NULL;
 
 	spin_lock(&n->list_lock);
+
 	list_for_each_entry(page, &n->partial, lru)
-		if (lock_and_freeze_slab(n, page))
+		if (lock_and_freeze_slab(n, page)) {
+			n->nr_partial--;
+			goto out;
+		}
+
+	list_for_each_entry(page, &n->empty, lru)
+		if (lock_and_freeze_slab(n, page)) {
+			n->nr_empty--;
 			goto out;
+		}
 	page = NULL;
 out:
 	spin_unlock(&n->list_lock);
@@ -1297,7 +1328,7 @@ out:
 /*
  * Get a page from somewhere. Search in increasing NUMA distances.
  */
-static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
+static struct page *get_any_partial_or_empty(struct kmem_cache *s, gfp_t flags)
 {
 #ifdef CONFIG_NUMA
 	struct zonelist *zonelist;
@@ -1336,7 +1367,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
 
 		if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
 				n->nr_partial > n->min_partial) {
-			page = get_partial_node(n);
+			page = get_partial_or_empty_node(n);
 			if (page)
 				return page;
 		}
@@ -1346,18 +1377,19 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
 }
 
 /*
- * Get a partial page, lock it and return it.
+ * Get a partial or empty page, lock it and return it.
  */
-static struct page *get_partial(struct kmem_cache *s, gfp_t flags, int node)
+static struct page *
+get_partial_or_empty(struct kmem_cache *s, gfp_t flags, int node)
 {
 	struct page *page;
 	int searchnode = (node == -1) ? numa_node_id() : node;
 
-	page = get_partial_node(get_node(s, searchnode));
+	page = get_partial_or_empty_node(get_node(s, searchnode));
 	if (page || (flags & __GFP_THISNODE))
 		return page;
 
-	return get_any_partial(s, flags);
+	return get_any_partial_or_empty(s, flags);
 }
 
 /*
@@ -1403,7 +1435,7 @@ static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
 		} else {
 			slab_unlock(page);
 			stat(get_cpu_slab(s, raw_smp_processor_id()), FREE_SLAB);
-			discard_slab(s, page);
+			discard_or_cache_slab(s, page);
 		}
 	}
 }
@@ -1542,7 +1574,7 @@ another_slab:
 	deactivate_slab(s, c);
 
 new_slab:
-	new = get_partial(s, gfpflags, node);
+	new = get_partial_or_empty(s, gfpflags, node);
 	if (new) {
 		c->page = new;
 		stat(c, ALLOC_FROM_PARTIAL);
@@ -1693,7 +1725,7 @@ slab_empty:
 	}
 	slab_unlock(page);
 	stat(c, FREE_SLAB);
-	discard_slab(s, page);
+	discard_or_cache_slab(s, page);
 	return;
 
 debug:
@@ -1927,6 +1959,8 @@ static void init_kmem_cache_cpu(struct kmem_cache *s,
 static void
 init_kmem_cache_node(struct kmem_cache_node *n, struct kmem_cache *s)
 {
+	spin_lock_init(&n->list_lock);
+
 	n->nr_partial = 0;
 
 	/*
@@ -1939,8 +1973,18 @@ init_kmem_cache_node(struct kmem_cache_node *n, struct kmem_cache *s)
 	else if (n->min_partial > MAX_PARTIAL)
 		n->min_partial = MAX_PARTIAL;
 
-	spin_lock_init(&n->list_lock);
 	INIT_LIST_HEAD(&n->partial);
+
+	n->nr_empty = 0;
+	/*
+	 * XXX: This needs to take object size into account. We don't need
+	 * empty slabs for caches which will have plenty of partial slabs
+	 * available. Only caches that have either full or empty slabs need
+	 * this kind of optimization.
+	 */
+	n->max_empty = MAX_EMPTY;
+	INIT_LIST_HEAD(&n->empty);
+
 #ifdef CONFIG_SLUB_DEBUG
 	atomic_long_set(&n->nr_slabs, 0);
 	atomic_long_set(&n->total_objects, 0);
@@ -2427,6 +2471,32 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n)
 	spin_unlock_irqrestore(&n->list_lock, flags);
 }
 
+static void free_empty_slabs(struct kmem_cache *s)
+{
+	int node;
+
+	for_each_node_state(node, N_NORMAL_MEMORY) {
+		struct kmem_cache_node *n;
+		struct page *page, *t;
+		unsigned long flags;
+
+		n = get_node(s, node);
+
+		if (!n->nr_empty)
+			continue;
+
+		spin_lock_irqsave(&n->list_lock, flags);
+
+		list_for_each_entry_safe(page, t, &n->empty, lru) {
+			list_del(&page->lru);
+			n->nr_empty--;
+
+			free_slab(s, page);
+		}
+		spin_unlock_irqrestore(&n->list_lock, flags);
+	}
+}
+
 /*
  * Release all resources used by a slab cache.
  */
@@ -2436,6 +2506,8 @@ static inline int kmem_cache_close(struct kmem_cache *s)
 
 	flush_all(s);
 
+	free_empty_slabs(s);
+
 	/* Attempt to free all objects */
 	free_kmem_cache_cpus(s);
 	for_each_node_state(node, N_NORMAL_MEMORY) {
@@ -2765,6 +2837,7 @@ int kmem_cache_shrink(struct kmem_cache *s)
 		return -ENOMEM;
 
 	flush_all(s);
+	free_empty_slabs(s);
 	for_each_node_state(node, N_NORMAL_MEMORY) {
 		n = get_node(s, node);
 



^ permalink raw reply related	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23  9:46                                             ` Pekka Enberg
@ 2009-01-23 15:22                                               ` Christoph Lameter
  2009-01-23 15:31                                                 ` Pekka Enberg
  2009-01-24  2:55                                                 ` Zhang, Yanmin
  0 siblings, 2 replies; 122+ messages in thread
From: Christoph Lameter @ 2009-01-23 15:22 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Zhang, Yanmin, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Fri, 23 Jan 2009, Pekka Enberg wrote:

> Looking at __slab_free(), unless page->inuse is constantly zero and we
> discard the slab, it really is just cache effects (10% sounds like a
> lot, though!). AFAICT, the only way to optimize that is with Christoph's
> unfinished pointer freelists patches or with a remote free list like in
> SLQB.

No there is another way. Increase the allocator order to 3 for the
kmalloc-8192 slab then multiple 8k blocks can be allocated from one of the
larger chunks of data gotten from the page allocator. That will allow slub
to do fast allocs.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23 15:22                                               ` Christoph Lameter
@ 2009-01-23 15:31                                                 ` Pekka Enberg
  2009-01-23 15:55                                                   ` Christoph Lameter
  2009-01-24  2:55                                                 ` Zhang, Yanmin
  1 sibling, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-23 15:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Zhang, Yanmin, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Fri, 2009-01-23 at 10:22 -0500, Christoph Lameter wrote:
> On Fri, 23 Jan 2009, Pekka Enberg wrote:
> 
> > Looking at __slab_free(), unless page->inuse is constantly zero and we
> > discard the slab, it really is just cache effects (10% sounds like a
> > lot, though!). AFAICT, the only way to optimize that is with Christoph's
> > unfinished pointer freelists patches or with a remote free list like in
> > SLQB.
> 
> No there is another way. Increase the allocator order to 3 for the
> kmalloc-8192 slab then multiple 8k blocks can be allocated from one of the
> larger chunks of data gotten from the page allocator. That will allow slub
> to do fast allocs.

I wonder why that doesn't happen already, actually. The slub_max_order
know is capped to PAGE_ALLOC_COSTLY_ORDER ("3") by default and obviously
order 3 should be as good fit as order 2 so 'fraction' can't be too high
either. Hmm.

		Pekka


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23 15:31                                                 ` Pekka Enberg
@ 2009-01-23 15:55                                                   ` Christoph Lameter
  2009-01-23 16:01                                                     ` Pekka Enberg
  0 siblings, 1 reply; 122+ messages in thread
From: Christoph Lameter @ 2009-01-23 15:55 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Zhang, Yanmin, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Fri, 23 Jan 2009, Pekka Enberg wrote:

> I wonder why that doesn't happen already, actually. The slub_max_order
> know is capped to PAGE_ALLOC_COSTLY_ORDER ("3") by default and obviously
> order 3 should be as good fit as order 2 so 'fraction' can't be too high
> either. Hmm.

The kmalloc-8192 is new. Look at slabinfo output to see what allocation
orders are chosen.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23 15:55                                                   ` Christoph Lameter
@ 2009-01-23 16:01                                                     ` Pekka Enberg
  0 siblings, 0 replies; 122+ messages in thread
From: Pekka Enberg @ 2009-01-23 16:01 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Zhang, Yanmin, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Fri, 23 Jan 2009, Pekka Enberg wrote:
> > I wonder why that doesn't happen already, actually. The slub_max_order
> > know is capped to PAGE_ALLOC_COSTLY_ORDER ("3") by default and obviously
> > order 3 should be as good fit as order 2 so 'fraction' can't be too high
> > either. Hmm.

On Fri, 2009-01-23 at 10:55 -0500, Christoph Lameter wrote:
> The kmalloc-8192 is new. Look at slabinfo output to see what allocation
> orders are chosen.

Yes, yes, I know the new cache a result of my patch. I'm just saying
that AFAICT, the existing logic should set the order to 3 but IIRC
Yanmin said it's 2.

			Pekka


^ permalink raw reply	[flat|nested] 122+ messages in thread

* care and feeding of netperf (Re: Mainline kernel OLTP performance update)
  2009-01-23  9:02                                         ` Zhang, Yanmin
@ 2009-01-23 18:40                                           ` Rick Jones
  2009-01-23 18:51                                             ` Grant Grundler
  2009-01-24  3:03                                             ` Zhang, Yanmin
  0 siblings, 2 replies; 122+ messages in thread
From: Rick Jones @ 2009-01-23 18:40 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, Andi Kleen,
	Matthew Wilcox, Andrew Morton, netdev, sfr, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

> 3) ./start_netperf_udp_v4.sh 8 #Assume your machine has 8 logical cpus.

Some comments on the script:

> #!/bin/sh
> 
> PROG_DIR=/home/ymzhang/test/netperf/src
> date=`date +%H%M%N`
> #PROG_DIR=/root/netperf/netperf/src
> client_num=$1
> pin_cpu=$2
> 
> start_port_server=12384
> start_port_client=15888
> 
> killall netserver
> ${PROG_DIR}/netserver
> sleep 2

Any particular reason for killing-off the netserver daemon?

> if [ ! -d result ]; then
>         mkdir result
> fi
> 
> all_result_files=""
> for i in `seq 1 ${client_num}`; do
>         if [ "${pin_cpu}" == "pin" ]; then
>                 pin_param="-T ${i} ${i}"

The -T option takes arguments of the form:

N   - bind both netperf and netserver to core N
N,  - bind only netperf to core N, float netserver
  ,M - float netperf, bind only netserver to core M
N,M - bind netperf to core N and netserver to core M

Without a comma between N and M knuth only knows what the command line parser 
will do :)

>         fi
>         result_file=result/netperf_${start_port_client}.${date}
>         #./netperf -t UDP_STREAM -l 60 -H 127.0.0.1 -- -P 15895 12391 -s 32768 -S 32768 -m 4096
>         #./netperf -t UDP_STREAM -l 60 -H 127.0.0.1 -i 50 3 -I 99 5 -- -P 12384 12888 -s 32768 -S 32768 -m 4096
>         #${PROG_DIR}/netperf -p ${port_num} -t TCP_RR -l 60 -H 127.0.0.1 ${pin_param} -- -r 1,1 >${result_file} &
>         ${PROG_DIR}/netperf -t UDP_STREAM -l 60 -H 127.0.0.1 ${pin_param} -- -P ${start_port_client} ${start_port_server} -s 32768 -S 32768 -m 4096 >${result_file}  &

Same thing here for the -P option - there needs to be a comma between the two 
port numbers otherwise, the best case is that the second port number is ignored. 
  Worst case is that netperf starts doing knuth only knows what.


To get quick profiles, that form of aggregate netperf is OK - just the one 
iteration with background processes using a moderatly long run time.  However, 
for result reporting, it is best to (ab)use the confidence intervals 
functionality to try to avoid skew errors.  I tend to add-in a global -i 30 
option to get each netperf to repeat its measurments 30 times.  That way one is 
reasonably confident that skew issues are minimized.

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance

And I would probably add the -c and -C options to have netperf report service 
demands.


>         sub_pid="${sub_pid} `echo $!`"
>         port_num=$((${port_num}+1))
>         all_result_files="${all_result_files} ${result_file}"
>         start_port_server=$((${start_port_server}+1))
>         start_port_client=$((${start_port_client}+1))
> done;
> 
> wait ${sub_pid}
> killall netserver
> 
> result="0"
> for i in `echo ${all_result_files}`; do
>         sub_result=`awk '/Throughput/ {getline; getline; getline; print " "$6}' ${i}`
>         result=`echo "${result}+${sub_result}"|bc`
> done;

The documented-only-in-source :( "omni" tests in top-of-trunk netperf:

http://www.netperf.org/svn/netperf2/trunk

./configure --enable-omni

allow one to specify which result values one wants, in which order, either as 
more or less traditional netperf output (test-specific -O), CSV (test-specific 
-o) or keyval (test-specific -k).  All three take an optional filename as an 
argument with the file containing a list of desired output values.  You can give 
a "filename" of '?' to get the list of output values known to that version of 
netperf.

Might help simplify parsing and whatnot.

happy benchmarking,

rick jones

> 
> echo $result

> 


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: care and feeding of netperf (Re: Mainline kernel OLTP performance  update)
  2009-01-23 18:40                                           ` care and feeding of netperf (Re: Mainline kernel OLTP performance update) Rick Jones
@ 2009-01-23 18:51                                             ` Grant Grundler
  2009-01-24  3:03                                             ` Zhang, Yanmin
  1 sibling, 0 replies; 122+ messages in thread
From: Grant Grundler @ 2009-01-23 18:51 UTC (permalink / raw)
  To: Rick Jones
  Cc: Zhang, Yanmin, Nick Piggin, Pekka Enberg, Christoph Lameter,
	Andi Kleen, Matthew Wilcox, Andrew Morton, netdev, sfr,
	matthew.r.wilcox, chinang.ma, linux-kernel, sharad.c.tripathi,
	arjan, suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

On Fri, Jan 23, 2009 at 10:40 AM, Rick Jones <rick.jones2@hp.com> wrote:
...
> And I would probably add the -c and -C options to have netperf report
> service demands.

For performance analysis, the service demand is often more interesting
than the absolute performance (which typically only varies a few Mb/s
for gigE NICs). I strongly encourage adding -c and -C.

grant

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-23 15:22                                               ` Christoph Lameter
  2009-01-23 15:31                                                 ` Pekka Enberg
@ 2009-01-24  2:55                                                 ` Zhang, Yanmin
  2009-01-24  7:36                                                   ` Pekka Enberg
  2009-01-26 17:36                                                   ` Christoph Lameter
  1 sibling, 2 replies; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-24  2:55 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Fri, 2009-01-23 at 10:22 -0500, Christoph Lameter wrote:
> On Fri, 23 Jan 2009, Pekka Enberg wrote:
> 
> > Looking at __slab_free(), unless page->inuse is constantly zero and we
> > discard the slab, it really is just cache effects (10% sounds like a
> > lot, though!). AFAICT, the only way to optimize that is with Christoph's
> > unfinished pointer freelists patches or with a remote free list like in
> > SLQB.
> 
> No there is another way. Increase the allocator order to 3 for the
> kmalloc-8192 slab then multiple 8k blocks can be allocated from one of the
> larger chunks of data gotten from the page allocator. That will allow slub
> to do fast allocs.
After I change kmalloc-8192/order to 3, the result(pinned netperf UDP-U-4k)
difference between SLUB and SLQB becomes 1% which can be considered as fluctuation.

But when trying to increased it to 4, I got:
[root@lkp-st02-x8664 slab]# echo "3">kmalloc-8192/order
[root@lkp-st02-x8664 slab]# echo "4">kmalloc-8192/order
-bash: echo: write error: Invalid argument

Comparing with SLQB, it seems SLUB needs too many investigation/manual finer-tuning
against specific benchmarks. One hard is to tune page order number. Although SLQB also
has many tuning options, I almost doesn't tune it manually, just run benchmark and
collect results to compare. Does that mean the scalability of SLQB is better?



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: care and feeding of netperf (Re: Mainline kernel OLTP performance update)
  2009-01-23 18:40                                           ` care and feeding of netperf (Re: Mainline kernel OLTP performance update) Rick Jones
  2009-01-23 18:51                                             ` Grant Grundler
@ 2009-01-24  3:03                                             ` Zhang, Yanmin
  2009-01-26 18:26                                               ` Rick Jones
  1 sibling, 1 reply; 122+ messages in thread
From: Zhang, Yanmin @ 2009-01-24  3:03 UTC (permalink / raw)
  To: Rick Jones
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, Andi Kleen,
	Matthew Wilcox, Andrew Morton, netdev, sfr, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

On Fri, 2009-01-23 at 10:40 -0800, Rick Jones wrote:
> > 3) ./start_netperf_udp_v4.sh 8 #Assume your machine has 8 logical cpus.
> 
> Some comments on the script:
Thanks. I wanted to run the testing to get result quickly as long as
the result has no big fluctuation.

> 
> > #!/bin/sh
> > 
> > PROG_DIR=/home/ymzhang/test/netperf/src
> > date=`date +%H%M%N`
> > #PROG_DIR=/root/netperf/netperf/src
> > client_num=$1
> > pin_cpu=$2
> > 
> > start_port_server=12384
> > start_port_client=15888
> > 
> > killall netserver
> > ${PROG_DIR}/netserver
> > sleep 2
> 
> Any particular reason for killing-off the netserver daemon?
I'm not sure if prior running might leave any impact on later running, so
just kill netserver.

> 
> > if [ ! -d result ]; then
> >         mkdir result
> > fi
> > 
> > all_result_files=""
> > for i in `seq 1 ${client_num}`; do
> >         if [ "${pin_cpu}" == "pin" ]; then
> >                 pin_param="-T ${i} ${i}"
> 
> The -T option takes arguments of the form:
> 
> N   - bind both netperf and netserver to core N
> N,  - bind only netperf to core N, float netserver
>   ,M - float netperf, bind only netserver to core M
> N,M - bind netperf to core N and netserver to core M
> 
> Without a comma between N and M knuth only knows what the command line parser 
> will do :)
> 
> >         fi
> >         result_file=result/netperf_${start_port_client}.${date}
> >         #./netperf -t UDP_STREAM -l 60 -H 127.0.0.1 -- -P 15895 12391 -s 32768 -S 32768 -m 4096
> >         #./netperf -t UDP_STREAM -l 60 -H 127.0.0.1 -i 50 3 -I 99 5 -- -P 12384 12888 -s 32768 -S 32768 -m 4096
> >         #${PROG_DIR}/netperf -p ${port_num} -t TCP_RR -l 60 -H 127.0.0.1 ${pin_param} -- -r 1,1 >${result_file} &
> >         ${PROG_DIR}/netperf -t UDP_STREAM -l 60 -H 127.0.0.1 ${pin_param} -- -P ${start_port_client} ${start_port_server} -s 32768 -S 32768 -m 4096 >${result_file}  &
> 
> Same thing here for the -P option - there needs to be a comma between the two 
> port numbers otherwise, the best case is that the second port number is ignored. 
>   Worst case is that netperf starts doing knuth only knows what.
Thanks.

> 
> 
> To get quick profiles, that form of aggregate netperf is OK - just the one 
> iteration with background processes using a moderatly long run time.  However, 
> for result reporting, it is best to (ab)use the confidence intervals 
> functionality to try to avoid skew errors.
Yes. My formal testing uses -i 50. I just wanted a quick testing. If I need
finer-tuning or investigation, I would turn on more options.

>   I tend to add-in a global -i 30 
> option to get each netperf to repeat its measurments 30 times.  That way one is 
> reasonably confident that skew issues are minimized.
> 
> http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance
> 
> And I would probably add the -c and -C options to have netperf report service 
> demands.
Yes. That's good. I'm used to start vmstat or mpstat to monitor cpu utilization
in real time.

> 
> 
> >         sub_pid="${sub_pid} `echo $!`"
> >         port_num=$((${port_num}+1))
> >         all_result_files="${all_result_files} ${result_file}"
> >         start_port_server=$((${start_port_server}+1))
> >         start_port_client=$((${start_port_client}+1))
> > done;
> > 
> > wait ${sub_pid}
> > killall netserver
> > 
> > result="0"
> > for i in `echo ${all_result_files}`; do
> >         sub_result=`awk '/Throughput/ {getline; getline; getline; print " "$6}' ${i}`
> >         result=`echo "${result}+${sub_result}"|bc`
> > done;
> 
> The documented-only-in-source :( "omni" tests in top-of-trunk netperf:
> 
> http://www.netperf.org/svn/netperf2/trunk
> 
> ./configure --enable-omni
> 
> allow one to specify which result values one wants, in which order, either as 
> more or less traditional netperf output (test-specific -O), CSV (test-specific 
> -o) or keyval (test-specific -k).  All three take an optional filename as an 
> argument with the file containing a list of desired output values.  You can give 
> a "filename" of '?' to get the list of output values known to that version of 
> netperf.
> 
> Might help simplify parsing and whatnot.
Yes, it does.

> 
> happy benchmarking,
> 
> rick jones
Thanks again. I learned a lot.

> 
> > 
> > echo $result
> 
> > 


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-24  2:55                                                 ` Zhang, Yanmin
@ 2009-01-24  7:36                                                   ` Pekka Enberg
  2009-02-12  5:22                                                     ` Zhang, Yanmin
  2009-01-26 17:36                                                   ` Christoph Lameter
  1 sibling, 1 reply; 122+ messages in thread
From: Pekka Enberg @ 2009-01-24  7:36 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Fri, 2009-01-23 at 10:22 -0500, Christoph Lameter wrote:
>> No there is another way. Increase the allocator order to 3 for the
>> kmalloc-8192 slab then multiple 8k blocks can be allocated from one of the
>> larger chunks of data gotten from the page allocator. That will allow slub
>> to do fast allocs.

On Sat, Jan 24, 2009 at 4:55 AM, Zhang, Yanmin
<yanmin_zhang@linux.intel.com> wrote:
> After I change kmalloc-8192/order to 3, the result(pinned netperf UDP-U-4k)
> difference between SLUB and SLQB becomes 1% which can be considered as fluctuation.

Great. We should fix calculate_order() to be order 3 for kmalloc-8192.
Are you interested in doing that?

On Sat, Jan 24, 2009 at 4:55 AM, Zhang, Yanmin
<yanmin_zhang@linux.intel.com> wrote:
> But when trying to increased it to 4, I got:
> [root@lkp-st02-x8664 slab]# echo "3">kmalloc-8192/order
> [root@lkp-st02-x8664 slab]# echo "4">kmalloc-8192/order
> -bash: echo: write error: Invalid argument

That's probably because max order is capped to 3. You can change that
by passing slub_max_order=<n> as kernel parameter.

On Sat, Jan 24, 2009 at 4:55 AM, Zhang, Yanmin
<yanmin_zhang@linux.intel.com> wrote:
> Comparing with SLQB, it seems SLUB needs too many investigation/manual finer-tuning
> against specific benchmarks. One hard is to tune page order number. Although SLQB also
> has many tuning options, I almost doesn't tune it manually, just run benchmark and
> collect results to compare. Does that mean the scalability of SLQB is better?

One thing is sure, SLUB seems to be hard to tune. Probably because
it's dependent on the page order so much.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-24  2:55                                                 ` Zhang, Yanmin
  2009-01-24  7:36                                                   ` Pekka Enberg
@ 2009-01-26 17:36                                                   ` Christoph Lameter
  2009-02-01  2:52                                                     ` Zhang, Yanmin
  1 sibling, 1 reply; 122+ messages in thread
From: Christoph Lameter @ 2009-01-26 17:36 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Pekka Enberg, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Sat, 24 Jan 2009, Zhang, Yanmin wrote:

> But when trying to increased it to 4, I got:
> [root@lkp-st02-x8664 slab]# echo "3">kmalloc-8192/order
> [root@lkp-st02-x8664 slab]# echo "4">kmalloc-8192/order
> -bash: echo: write error: Invalid argument

This is because 4 is more than the maximum allowed order. You can
reconfigure that by setting

slub_max_order=5

or so on boot.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: care and feeding of netperf (Re: Mainline kernel OLTP performance update)
  2009-01-24  3:03                                             ` Zhang, Yanmin
@ 2009-01-26 18:26                                               ` Rick Jones
  0 siblings, 0 replies; 122+ messages in thread
From: Rick Jones @ 2009-01-26 18:26 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, Andi Kleen,
	Matthew Wilcox, Andrew Morton, netdev, sfr, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty

>>To get quick profiles, that form of aggregate netperf is OK - just the one 
>>iteration with background processes using a moderatly long run time.  However, 
>>for result reporting, it is best to (ab)use the confidence intervals 
>>functionality to try to avoid skew errors.
> 
> Yes. My formal testing uses -i 50. I just wanted a quick testing. If I need
> finer-tuning or investigation, I would turn on more options.

Netperf will silently clip that to 30 as that is all the built-in tables know.

> Thanks again. I learned a lot.

Feel free to wander over to netperf-talk over at netperf.org if you want to talk 
some more about the care and feeding of netperf.

happy benchmarking,

rick jones

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
       [not found]                   ` <588992150B702C48B3312184F1B810AD03A4F59632@azsmsx501.amr.corp.intel.com>
@ 2009-01-27  8:28                     ` Jens Axboe
  0 siblings, 0 replies; 122+ messages in thread
From: Jens Axboe @ 2009-01-27  8:28 UTC (permalink / raw)
  To: Chilukuri, Harita
  Cc: Matthew Wilcox, Andi Kleen, Andrew Morton, Wilcox, Matthew R, Ma,
	Chinang, linux-kernel, Tripathi, Sharad C, arjan, Siddha,
	Suresh B, Styner, Douglas W, Wang, Peter Xihong, Nueckel, Hubert,
	chris.mason, srostedt, linux-scsi, Andrew Vasquez,
	Anirban Chakraborty

On Mon, Jan 26 2009, Chilukuri, Harita wrote:
> Jens, we did test the patch that disables the whole stats. We get 0.5% gain with this patch on 2.6.29-rc2 comparing to 2.6.29-rc2-disbale_part_stats
> 
> Below is the description of the result:
> 
> Linux OLTP Performance summary
> Kernel#                               Speedup(x) Intr/s  CtxSw/s  us%    sys%    idle% iowait%
> 2.6.29-rc2-disbale_partition_stats     1.000    30413   42582    74      25      0       0
> 2.6.29-rc2-disable_all                 1.005    30401   42656    74      25      0       0
> 
> Server configurations:
> Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)

OK, so about the same, which means the lookup is likely the expensive
bit. I have merged this patch:

http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=e5b74b703da41fab060adc335a0b98fa5a5ea61d

which exposes an 'iostats' toggle that allows users to disable disk
statistics completely.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-26 17:36                                                   ` Christoph Lameter
@ 2009-02-01  2:52                                                     ` Zhang, Yanmin
  0 siblings, 0 replies; 122+ messages in thread
From: Zhang, Yanmin @ 2009-02-01  2:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Mon, 2009-01-26 at 12:36 -0500, Christoph Lameter wrote:
> On Sat, 24 Jan 2009, Zhang, Yanmin wrote:
> 
> > But when trying to increased it to 4, I got:
> > [root@lkp-st02-x8664 slab]# echo "3">kmalloc-8192/order
> > [root@lkp-st02-x8664 slab]# echo "4">kmalloc-8192/order
> > -bash: echo: write error: Invalid argument
> 
> This is because 4 is more than the maximum allowed order. You can
> reconfigure that by setting
> 
> slub_max_order=5
> 
> or so on boot.
With slub_max_order=5, the default order of kmalloc-8192 becomes
5. I tested it with netperf UDP-U-4k and the result difference from
SLAB/SLQB is less than 1% which is really fluctuation.



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-01-24  7:36                                                   ` Pekka Enberg
@ 2009-02-12  5:22                                                     ` Zhang, Yanmin
  2009-02-12  5:47                                                       ` Zhang, Yanmin
  0 siblings, 1 reply; 122+ messages in thread
From: Zhang, Yanmin @ 2009-02-12  5:22 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Sat, 2009-01-24 at 09:36 +0200, Pekka Enberg wrote:
> On Fri, 2009-01-23 at 10:22 -0500, Christoph Lameter wrote:
> >> No there is another way. Increase the allocator order to 3 for the
> >> kmalloc-8192 slab then multiple 8k blocks can be allocated from one of the
> >> larger chunks of data gotten from the page allocator. That will allow slub
> >> to do fast allocs.
> 
> On Sat, Jan 24, 2009 at 4:55 AM, Zhang, Yanmin
> <yanmin_zhang@linux.intel.com> wrote:
> > After I change kmalloc-8192/order to 3, the result(pinned netperf UDP-U-4k)
> > difference between SLUB and SLQB becomes 1% which can be considered as fluctuation.
> 
> Great. We should fix calculate_order() to be order 3 for kmalloc-8192.
> Are you interested in doing that?
Pekka,

Sorry for the late update.
The default order of kmalloc-8192 on 2*4 stoakley is really an issue of calculate_order.


slab_size	order		name
-------------------------------------------------
4096            3               sgpool-128
8192            2               kmalloc-8192
16384           3               kmalloc-16384

kmalloc-8192's default order is smaller than sgpool-128's.

On 4*4 tigerton machine, a similiar issue appears on another kmem_cache.

Function calculate_order uses 'min_objects /= 2;' to shrink. Plus size calculation/checking
in slab_order, sometimes above issue appear.

Below patch against 2.6.29-rc2 fixes it.

I checked the default orders of all kmem_cache and they don't become smaller than before. So
the patch wouldn't hurt performance.

Signed-off-by Zhang Yanmin <yanmin.zhang@linux.intel.com>

---

diff -Nraup linux-2.6.29-rc2/mm/slub.c linux-2.6.29-rc2_slubcalc_order/mm/slub.c
--- linux-2.6.29-rc2/mm/slub.c	2009-02-11 00:49:48.000000000 -0500
+++ linux-2.6.29-rc2_slubcalc_order/mm/slub.c	2009-02-12 00:08:24.000000000 -0500
@@ -1856,6 +1856,7 @@ static inline int calculate_order(int si
 	min_objects = slub_min_objects;
 	if (!min_objects)
 		min_objects = 4 * (fls(nr_cpu_ids) + 1);
+	min_objects = min(min_objects, (PAGE_SIZE << slub_max_order)/size);
 	while (min_objects > 1) {
 		fraction = 16;
 		while (fraction >= 4) {
@@ -1865,7 +1866,7 @@ static inline int calculate_order(int si
 				return order;
 			fraction /= 2;
 		}
-		min_objects /= 2;
+		min_objects --;
 	}
 
 	/*



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-02-12  5:22                                                     ` Zhang, Yanmin
@ 2009-02-12  5:47                                                       ` Zhang, Yanmin
  2009-02-12 15:25                                                         ` Christoph Lameter
  2009-02-12 16:03                                                         ` Pekka Enberg
  0 siblings, 2 replies; 122+ messages in thread
From: Zhang, Yanmin @ 2009-02-12  5:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Thu, 2009-02-12 at 13:22 +0800, Zhang, Yanmin wrote:
> On Sat, 2009-01-24 at 09:36 +0200, Pekka Enberg wrote:
> > On Fri, 2009-01-23 at 10:22 -0500, Christoph Lameter wrote:
> > >> No there is another way. Increase the allocator order to 3 for the
> > >> kmalloc-8192 slab then multiple 8k blocks can be allocated from one of the
> > >> larger chunks of data gotten from the page allocator. That will allow slub
> > >> to do fast allocs.
> > 
> > On Sat, Jan 24, 2009 at 4:55 AM, Zhang, Yanmin
> > <yanmin_zhang@linux.intel.com> wrote:
> > > After I change kmalloc-8192/order to 3, the result(pinned netperf UDP-U-4k)
> > > difference between SLUB and SLQB becomes 1% which can be considered as fluctuation.
> > 
> > Great. We should fix calculate_order() to be order 3 for kmalloc-8192.
> > Are you interested in doing that?
> Pekka,
> 
> Sorry for the late update.
> The default order of kmalloc-8192 on 2*4 stoakley is really an issue of calculate_order.
Oh, previous patch has a compiling warning. Pls. use below patch.

From: Zhang Yanmin <yanmin.zhang@linux.intel.com>

The default order of kmalloc-8192 on 2*4 stoakley is an issue of calculate_order.


slab_size       order           name
-------------------------------------------------
4096            3               sgpool-128
8192            2               kmalloc-8192
16384           3               kmalloc-16384

kmalloc-8192's default order is smaller than sgpool-128's.

On 4*4 tigerton machine, a similiar issue appears on another kmem_cache.

Function calculate_order uses 'min_objects /= 2;' to shrink. Plus size calculation/checking
in slab_order, sometimes above issue appear.

Below patch against 2.6.29-rc2 fixes it.

I checked the default orders of all kmem_cache and they don't become smaller than before. So
the patch wouldn't hurt performance.

Signed-off-by Zhang Yanmin <yanmin.zhang@linux.intel.com>

---

--- linux-2.6.29-rc2/mm/slub.c	2009-02-11 00:49:48.000000000 -0500
+++ linux-2.6.29-rc2_slubcalc_order/mm/slub.c	2009-02-12 00:47:52.000000000 -0500
@@ -1844,6 +1844,7 @@ static inline int calculate_order(int si
 	int order;
 	int min_objects;
 	int fraction;
+	int max_objects;
 
 	/*
 	 * Attempt to find best configuration for a slab. This
@@ -1856,6 +1857,9 @@ static inline int calculate_order(int si
 	min_objects = slub_min_objects;
 	if (!min_objects)
 		min_objects = 4 * (fls(nr_cpu_ids) + 1);
+	max_objects = (PAGE_SIZE << slub_max_order)/size;
+	min_objects = min(min_objects, max_objects);
+
 	while (min_objects > 1) {
 		fraction = 16;
 		while (fraction >= 4) {
@@ -1865,7 +1869,7 @@ static inline int calculate_order(int si
 				return order;
 			fraction /= 2;
 		}
-		min_objects /= 2;
+		min_objects --;
 	}
 
 	/*



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-02-12  5:47                                                       ` Zhang, Yanmin
@ 2009-02-12 15:25                                                         ` Christoph Lameter
  2009-02-12 16:07                                                           ` Pekka Enberg
  2009-02-12 16:03                                                         ` Pekka Enberg
  1 sibling, 1 reply; 122+ messages in thread
From: Christoph Lameter @ 2009-02-12 15:25 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Pekka Enberg, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

[-- Attachment #1: Type: TEXT/PLAIN, Size: 679 bytes --]

On Thu, 12 Feb 2009, Zhang, Yanmin wrote:

> The default order of kmalloc-8192 on 2*4 stoakley is an issue of calculate_order.
>
>
> slab_size       order           name
> -------------------------------------------------
> 4096            3               sgpool-128
> 8192            2               kmalloc-8192
> 16384           3               kmalloc-16384
>
> kmalloc-8192's default order is smaller than sgpool-128's.

You reverted the page allocator passthrough patch before this right?
Otherwise kmalloc-8192 should not exist and allocation calls for 8192
bytes would be converted inline to request of an order 1 page from the
page allocator.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-02-12  5:47                                                       ` Zhang, Yanmin
  2009-02-12 15:25                                                         ` Christoph Lameter
@ 2009-02-12 16:03                                                         ` Pekka Enberg
  1 sibling, 0 replies; 122+ messages in thread
From: Pekka Enberg @ 2009-02-12 16:03 UTC (permalink / raw)
  To: Zhang, Yanmin
  Cc: Christoph Lameter, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

On Sat, 2009-01-24 at 09:36 +0200, Pekka Enberg wrote:
> > > On Fri, 2009-01-23 at 10:22 -0500, Christoph Lameter wrote:
> > > >> No there is another way. Increase the allocator order to 3 for the
> > > >> kmalloc-8192 slab then multiple 8k blocks can be allocated from one of the
> > > >> larger chunks of data gotten from the page allocator. That will allow slub
> > > >> to do fast allocs.
> > > 
> > > On Sat, Jan 24, 2009 at 4:55 AM, Zhang, Yanmin
> > > <yanmin_zhang@linux.intel.com> wrote:
> > > > After I change kmalloc-8192/order to 3, the result(pinned netperf UDP-U-4k)
> > > > difference between SLUB and SLQB becomes 1% which can be considered as fluctuation.
> > > 
> > > Great. We should fix calculate_order() to be order 3 for kmalloc-8192.
> > > Are you interested in doing that?

On Thu, 2009-02-12 at 13:22 +0800, Zhang, Yanmin wrote:
> > Pekka,
> > 
> > Sorry for the late update.
> > The default order of kmalloc-8192 on 2*4 stoakley is really an issue of calculate_order.

On Thu, 2009-02-12 at 13:47 +0800, Zhang, Yanmin wrote:
> Oh, previous patch has a compiling warning. Pls. use below patch.
> 
> From: Zhang Yanmin <yanmin.zhang@linux.intel.com>
> 
> The default order of kmalloc-8192 on 2*4 stoakley is an issue of calculate_order.

Applied to the 'topic/slub/perf' branch. Thanks!

			Pekka


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-02-12 15:25                                                         ` Christoph Lameter
@ 2009-02-12 16:07                                                           ` Pekka Enberg
  0 siblings, 0 replies; 122+ messages in thread
From: Pekka Enberg @ 2009-02-12 16:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Zhang, Yanmin, Andi Kleen, Matthew Wilcox, Nick Piggin,
	Andrew Morton, netdev, Stephen Rothwell, matthew.r.wilcox,
	chinang.ma, linux-kernel, sharad.c.tripathi, arjan,
	suresh.b.siddha, harita.chilukuri, douglas.w.styner,
	peter.xihong.wang, hubert.nueckel, chris.mason, srostedt,
	linux-scsi, andrew.vasquez, anirban.chakraborty, Ingo Molnar

Hi Christoph,

On Thu, 12 Feb 2009, Zhang, Yanmin wrote:
>> The default order of kmalloc-8192 on 2*4 stoakley is an issue of calculate_order.
>>
>>
>> slab_size       order           name
>> -------------------------------------------------
>> 4096            3               sgpool-128
>> 8192            2               kmalloc-8192
>> 16384           3               kmalloc-16384
>>
>> kmalloc-8192's default order is smaller than sgpool-128's.

On Thu, Feb 12, 2009 at 5:25 PM, Christoph Lameter
<cl@linux-foundation.org> wrote:
> You reverted the page allocator passthrough patch before this right?
> Otherwise kmalloc-8192 should not exist and allocation calls for 8192
> bytes would be converted inline to request of an order 1 page from the
> page allocator.

Yup, I assume that's the case here.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Mainline kernel OLTP performance update
@ 2010-01-25 18:26 Ma, Chinang
  0 siblings, 0 replies; 122+ messages in thread
From: Ma, Chinang @ 2010-01-25 18:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: arjan, Wilcox, Matthew R, Chris Mason, Kleen, Andi, Garg, Anil K,
	Prickett, Terry O

Here is an OLTP performance summary comparing 2.6.33-rc4 to Red Hat EL5.4 release. Both kernels were compiled using the same EL5.4 .config to minimize configuration differences.

Comparing to RHEL 5.4 baseline, 2.6.33-rc4 kernel has around 0.8% OLTP performance regression.

Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%  idle%  iowait%
2.6.18-164.el5(RHEL5.4) 1.000   144080  181307  68      28      1       4
2.6.32.1                0.983   248305  174940  67      32      0       1
2.6.33-rc4              0.992   221354  180750  68      30      0       2

Hardware configuration
NHM-EP 2.93GHz 2 sockets/8 cores/16 threads
72GB memory  
4x LSI 3801SAS + 2x QLA2300, 192 SSDs + 28 spindles log

======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.18-164.el5             Cycles% 2.6.33-rc4
70.5642 <dbms>        		     69.4350 <dbms>
1.4696 mpt_interrupt               0.9540 mpt_interrupt
0.9001 kmem_cache_free             0.7649 scsi_request_fn
0.7807 schedule                    0.6990 __blockdev_direct_IO
0.7769 __blockdev_direct_IO        0.6729 schedule
0.6053 scsi_request_fn             0.6721 kmem_cache_alloc
0.5003 kmem_cache_alloc            0.6245 kmem_cache_free
0.4355 kmem_cache_zalloc           0.4556 pick_next_highest_task_rt
0.4090 list_del                    0.3538 __switch_to
0.3570 gup_huge_pmd                0.3397 memmove
0.3452 __switch_to                 0.3339 rb_get_reader_page
0.3399 kfree                       0.3318 try_to_wake_up
0.3371 task_rq_lock                0.3309 sd_prep_fn
0.3173 __sigsetjmp                 0.3219 list_del
0.3153 memmove                     0.3217 ring_buffer_consume
0.2869 lock_timer_base             0.3100 kfree
0.2851 generic_make_request        0.3085 __sigsetjmp
0.2613 scsi_get_command            0.3073 mptscsih_qcmd
0.2599 __generic_file_aio_read     0.2810 scsi_device_unbusy
0.2567 fget_light                  0.2727 generic_make_request
0.2434 mptscsih_io_done            0.2510 touch_atime
0.2413 touch_atime                 0.2480 generic_file_aio_read
0.2380 get_request                 0.2453 memset_c
0.2282 try_to_wake_up              0.2448 fget_light
0.2248 mptscsih_qcmd               0.2218 dequeue_rt_stack
0.2196 sd_init_command             0.2140 sys_io_submit
0.2035 device_not_available        0.2063 scsi_dispatch_cmd
0.2032 elv_queue_empty             0.2027 _setjmp
0.2007 __errno_location            0.1996 mptscsih_io_done
0.2006 math_state_restore          0.1973 gup_huge_pmd
0.2003 _setjmp                     0.1908 __list_add
0.1995 kref_get                    0.1880 submit_page_section
0.1979 mempool_alloc               0.1879 task_rq_lock
0.1965 scsi_prep_fn                0.1856 __errno_location








^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-05-06 18:24         ` Anirban Chakraborty
@ 2009-05-06 19:25           ` Wilcox, Matthew R
  0 siblings, 0 replies; 122+ messages in thread
From: Wilcox, Matthew R @ 2009-05-06 19:25 UTC (permalink / raw)
  To: Anirban Chakraborty, Styner, Douglas W, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B, Ma,
	Chinang, Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F,
	Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 3818 bytes --]

I suppose another way to see how close you are to replicating our setup in terms of time spent in the interrupt handler is to see how many interrupts you're getting per second.  With only one controller, you're probably getting more interrupt coalescing than we're seeing.

I suppose you could run Orion multiple times, or if you have an array which can do RAID-0 for you, you can have multiple spindles per LUN (which is what we do -- 30 LUNs, each with 15 spindles).

> -----Original Message-----
> From: Anirban Chakraborty [mailto:anirban.chakraborty@qlogic.com]
> Sent: Wednesday, May 06, 2009 11:24 AM
> To: Wilcox, Matthew R; Styner, Douglas W; linux-kernel@vger.kernel.org
> Cc: Tripathi, Sharad C; arjan@linux.intel.com; Kleen, Andi; Siddha, Suresh
> B; Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F;
> Nelson, Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan,
> Rajalakshmi; Garg, Anil K; Chilukuri, Harita; chris.mason@oracle.com
> Subject: Re: Mainline kernel OLTP performance update
> 
> 
> 
> 
> On 5/6/09 11:12 AM, "Wilcox, Matthew R" <matthew.r.wilcox@intel.com>
> wrote:
> 
> > That's a more accurate simulation of our workload, but Anirban's setup
> doesn't
> > have nearly as many spindles as ours, so he won't do as many IOPS and
> may not
> > see the problem.
> >
> I was getting an IOPS in the order of 46000, which was not too far from
> what
> Doug was getting. Orion settings indeed have a cache cold setting
> (specifying cache size as 0). The IO was done as 1k block size in
> sequential
> mode to the raw devices.
> I can have that many luns but the issue is that Orion does not support
> that
> may devices and I do not have the source code for it.
> Let me see if I can find some other tool.
> 
> -Anirban
> 
> > All I'm trying to do is get something that will show the problem on his
> setup,
> > and I think sequential IO is going to be the right answer here.  I could
> > easily be wrong.
> >
> > Neither FIO nor dd is going to have the cache behaviour of the database
> (maybe
> > Orion does?)  As far as I can tell, we come to the kernel cache-cold for
> every
> > IO simply because the database uses as many cache entries as it can.  We
> could
> > write a little program to just thrash through cachelines, or just run
> gcc at
> > the same time as this -- apparently gcc will happily chew through all
> the
> > cache it can too.
> >
> >> -----Original Message-----
> >> From: Styner, Douglas W
> >> Sent: Wednesday, May 06, 2009 11:05 AM
> >> To: Wilcox, Matthew R; Anirban Chakraborty; linux-
> kernel@vger.kernel.org
> >> Cc: Tripathi, Sharad C; arjan@linux.intel.com; Kleen, Andi; Siddha,
> Suresh
> >> B; Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F;
> >> Nelson, Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan,
> >> Rajalakshmi; Garg, Anil K; Chilukuri, Harita; chris.mason@oracle.com
> >> Subject: RE: Mainline kernel OLTP performance update
> >>
> >> Wilcox, Matthew R writes:
> >>> I'm not sure that Orion is going to give useful results in your
> hardware
> >>> setup.  I suspect you don't have enough spindles to get the IO rates
> that
> >>> are required to see the problem.  How about doing lots of contiguous
> I/O
> >>> instead?  Something as simple as:
> >>>
> >>> for i in sda sdb sdc (repeat ad nauseam); do \
> >>>     dd if=/dev/$i of=/dev/null bs=4k iflag=direct & \
> >>> done
> >>>
> >>
> >> A better workload emulator would be to use FIO to generate ~60%/40%
> >> reads/writes with ~90-95% random i/o using 2k blksize.  There is some
> >> sequential writing in our workload but only to a log file and there is
> not
> >> much activity there.

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-05-06 18:12       ` Wilcox, Matthew R
@ 2009-05-06 18:24         ` Anirban Chakraborty
  2009-05-06 19:25           ` Wilcox, Matthew R
  0 siblings, 1 reply; 122+ messages in thread
From: Anirban Chakraborty @ 2009-05-06 18:24 UTC (permalink / raw)
  To: Wilcox, Matthew R, Styner, Douglas W, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B, Ma,
	Chinang, Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F,
	Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason




On 5/6/09 11:12 AM, "Wilcox, Matthew R" <matthew.r.wilcox@intel.com> wrote:

> That's a more accurate simulation of our workload, but Anirban's setup doesn't
> have nearly as many spindles as ours, so he won't do as many IOPS and may not
> see the problem.
> 
I was getting an IOPS in the order of 46000, which was not too far from what
Doug was getting. Orion settings indeed have a cache cold setting
(specifying cache size as 0). The IO was done as 1k block size in sequential
mode to the raw devices.
I can have that many luns but the issue is that Orion does not support that
may devices and I do not have the source code for it.
Let me see if I can find some other tool.

-Anirban

> All I'm trying to do is get something that will show the problem on his setup,
> and I think sequential IO is going to be the right answer here.  I could
> easily be wrong.
> 
> Neither FIO nor dd is going to have the cache behaviour of the database (maybe
> Orion does?)  As far as I can tell, we come to the kernel cache-cold for every
> IO simply because the database uses as many cache entries as it can.  We could
> write a little program to just thrash through cachelines, or just run gcc at
> the same time as this -- apparently gcc will happily chew through all the
> cache it can too.
> 
>> -----Original Message-----
>> From: Styner, Douglas W
>> Sent: Wednesday, May 06, 2009 11:05 AM
>> To: Wilcox, Matthew R; Anirban Chakraborty; linux-kernel@vger.kernel.org
>> Cc: Tripathi, Sharad C; arjan@linux.intel.com; Kleen, Andi; Siddha, Suresh
>> B; Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F;
>> Nelson, Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan,
>> Rajalakshmi; Garg, Anil K; Chilukuri, Harita; chris.mason@oracle.com
>> Subject: RE: Mainline kernel OLTP performance update
>> 
>> Wilcox, Matthew R writes:
>>> I'm not sure that Orion is going to give useful results in your hardware
>>> setup.  I suspect you don't have enough spindles to get the IO rates that
>>> are required to see the problem.  How about doing lots of contiguous I/O
>>> instead?  Something as simple as:
>>> 
>>> for i in sda sdb sdc (repeat ad nauseam); do \
>>>     dd if=/dev/$i of=/dev/null bs=4k iflag=direct & \
>>> done
>>> 
>> 
>> A better workload emulator would be to use FIO to generate ~60%/40%
>> reads/writes with ~90-95% random i/o using 2k blksize.  There is some
>> sequential writing in our workload but only to a log file and there is not
>> much activity there.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-05-06  6:29 ` Anirban Chakraborty
  2009-05-06 15:53   ` Wilcox, Matthew R
@ 2009-05-06 18:19   ` Styner, Douglas W
  1 sibling, 0 replies; 122+ messages in thread
From: Styner, Douglas W @ 2009-05-06 18:19 UTC (permalink / raw)
  To: Anirban Chakraborty, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong, Nueckel,
	Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun, Prickett,
	Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K, Chilukuri,
	Harita, chris.mason

[-- Attachment #1: Type: text/plain, Size: 698 bytes --]

Anirban Chakraborty <anirban.chakraborty@qlogic.com> writes:
>However, I do notice that profiling report generated is not consistent all
>the time. Not sure, if I am missing something in my setup. Sometimes, I do
>see following type of error messages popping up while running opreport.
>warning: [vdso] (tgid:30873 range:0x7fff6a9fe000-0x7fff6a9ff000) could not
>be found.

I am not seeing this error in our setup.

>I was wondering if your kernel config is quite different from mine. I have
>attached my kernel config file.

I noticed a number of differences; I am attaching our config.  We move our config file from one kernel to the next, updating it with changes from each kernel.

[-- Attachment #2: 2.6.30-rc4.config --]
[-- Type: application/octet-stream, Size: 81684 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30-rc4
# Thu Apr 30 10:59:19 2009
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
# CONFIG_TASK_XACCT is not set
CONFIG_AUDIT=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_TREE=y

#
# RCU Subsystem
#
CONFIG_CLASSIC_RCU=y
# CONFIG_TREE_RCU is not set
# CONFIG_PREEMPT_RCU is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
# CONFIG_GROUP_SCHED is not set
# CONFIG_CGROUPS is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
CONFIG_KALLSYMS_EXTRA_PASS=y
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
CONFIG_OPROFILE=m
# CONFIG_OPROFILE_IBS is not set
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_API_DEBUG=y
# CONFIG_SLOW_WORK is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_FREEZER=y

#
# Processor type and features
#
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
# CONFIG_X86_UV is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
# CONFIG_X86_DS is not set
CONFIG_HPET_TIMER=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
CONFIG_CALGARY_IOMMU=y
# CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT is not set
# CONFIG_AMD_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_IOMMU_API is not set
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=255
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_I8K is not set
CONFIG_MICROCODE=m
CONFIG_MICROCODE_INTEL=y
# CONFIG_MICROCODE_AMD is not set
CONFIG_MICROCODE_OLD_INTERFACE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
# CONFIG_X86_CPU_DEBUG is not set
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
# CONFIG_NUMA_EMU is not set
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
# CONFIG_MEMORY_HOTREMOVE is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
# CONFIG_X86_PAT is not set
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
# CONFIG_SCHED_HRTICK is not set
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
# CONFIG_ACPI_PROCFS is not set
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=m
CONFIG_ACPI_BATTERY=m
CONFIG_ACPI_BUTTON=m
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=m
CONFIG_ACPI_SBS=m

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=m
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=m
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=m
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m

#
# CPUFreq processor drivers
#
CONFIG_X86_ACPI_CPUFREQ=m
CONFIG_X86_POWERNOW_K8=m
CONFIG_X86_SPEEDSTEP_CENTRINO=m
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_DMAR is not set
# CONFIG_INTR_REMAP is not set
CONFIG_PCIEPORTBUS=y
CONFIG_HOTPLUG_PCI_PCIE=m
CONFIG_PCIEAER=y
# CONFIG_PCIEASPM is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
CONFIG_PCCARD=y
# CONFIG_PCMCIA_DEBUG is not set
CONFIG_PCMCIA=y
CONFIG_PCMCIA_LOAD_CIS=y
CONFIG_PCMCIA_IOCTL=y
CONFIG_CARDBUS=y

#
# PC-card bridges
#
CONFIG_YENTA=y
CONFIG_YENTA_O2=y
CONFIG_YENTA_RICOH=y
CONFIG_YENTA_TI=y
CONFIG_YENTA_ENE_TUNE=y
CONFIG_YENTA_TOSHIBA=y
CONFIG_PD6729=m
# CONFIG_I82092 is not set
CONFIG_PCCARD_NONSTATIC=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_FAKE=m
CONFIG_HOTPLUG_PCI_ACPI=m
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
# CONFIG_HOTPLUG_PCI_CPCI is not set
CONFIG_HOTPLUG_PCI_SHPC=m

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
CONFIG_BINFMT_MISC=y
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_MULTIPATH=y
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=m
CONFIG_NET_IPGRE=m
CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=m
CONFIG_INET_XFRM_MODE_TRANSPORT=m
CONFIG_INET_XFRM_MODE_TUNNEL=m
CONFIG_INET_XFRM_MODE_BEET=y
CONFIG_INET_LRO=y
CONFIG_INET_DIAG=m
CONFIG_INET_TCP_DIAG=m
CONFIG_TCP_CONG_ADVANCED=y
CONFIG_TCP_CONG_BIC=y
CONFIG_TCP_CONG_CUBIC=m
CONFIG_TCP_CONG_WESTWOOD=m
CONFIG_TCP_CONG_HTCP=m
CONFIG_TCP_CONG_HSTCP=m
CONFIG_TCP_CONG_HYBLA=m
CONFIG_TCP_CONG_VEGAS=m
CONFIG_TCP_CONG_SCALABLE=m
CONFIG_TCP_CONG_LP=m
CONFIG_TCP_CONG_VENO=m
# CONFIG_TCP_CONG_YEAH is not set
# CONFIG_TCP_CONG_ILLINOIS is not set
CONFIG_DEFAULT_BIC=y
# CONFIG_DEFAULT_CUBIC is not set
# CONFIG_DEFAULT_HTCP is not set
# CONFIG_DEFAULT_VEGAS is not set
# CONFIG_DEFAULT_WESTWOOD is not set
# CONFIG_DEFAULT_RENO is not set
CONFIG_DEFAULT_TCP_CONG="bic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=m
CONFIG_IPV6_PRIVACY=y
CONFIG_IPV6_ROUTER_PREF=y
CONFIG_IPV6_ROUTE_INFO=y
CONFIG_IPV6_OPTIMISTIC_DAD=y
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
CONFIG_INET6_IPCOMP=m
# CONFIG_IPV6_MIP6 is not set
CONFIG_INET6_XFRM_TUNNEL=m
CONFIG_INET6_TUNNEL=m
CONFIG_INET6_XFRM_MODE_TRANSPORT=m
CONFIG_INET6_XFRM_MODE_TUNNEL=m
CONFIG_INET6_XFRM_MODE_BEET=m
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=m
CONFIG_IPV6_NDISC_NODETYPE=y
CONFIG_IPV6_TUNNEL=m
CONFIG_IPV6_MULTIPLE_TABLES=y
# CONFIG_IPV6_SUBTREES is not set
# CONFIG_IPV6_MROUTE is not set
CONFIG_NETLABEL=y
CONFIG_NETWORK_SECMARK=y
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
CONFIG_NETFILTER_ADVANCED=y
CONFIG_BRIDGE_NETFILTER=y

#
# Core Netfilter Configuration
#
CONFIG_NETFILTER_NETLINK=m
CONFIG_NETFILTER_NETLINK_QUEUE=m
CONFIG_NETFILTER_NETLINK_LOG=m
# CONFIG_NF_CONNTRACK is not set
# CONFIG_NETFILTER_TPROXY is not set
CONFIG_NETFILTER_XTABLES=m
CONFIG_NETFILTER_XT_TARGET_CLASSIFY=m
# CONFIG_NETFILTER_XT_TARGET_DSCP is not set
CONFIG_NETFILTER_XT_TARGET_HL=m
# CONFIG_NETFILTER_XT_TARGET_LED is not set
CONFIG_NETFILTER_XT_TARGET_MARK=m
# CONFIG_NETFILTER_XT_TARGET_NFLOG is not set
CONFIG_NETFILTER_XT_TARGET_NFQUEUE=m
# CONFIG_NETFILTER_XT_TARGET_RATEEST is not set
# CONFIG_NETFILTER_XT_TARGET_TRACE is not set
CONFIG_NETFILTER_XT_TARGET_SECMARK=m
# CONFIG_NETFILTER_XT_TARGET_TCPMSS is not set
# CONFIG_NETFILTER_XT_TARGET_TCPOPTSTRIP is not set
CONFIG_NETFILTER_XT_MATCH_COMMENT=m
CONFIG_NETFILTER_XT_MATCH_DCCP=m
# CONFIG_NETFILTER_XT_MATCH_DSCP is not set
CONFIG_NETFILTER_XT_MATCH_ESP=m
# CONFIG_NETFILTER_XT_MATCH_HASHLIMIT is not set
CONFIG_NETFILTER_XT_MATCH_HL=m
# CONFIG_NETFILTER_XT_MATCH_IPRANGE is not set
CONFIG_NETFILTER_XT_MATCH_LENGTH=m
CONFIG_NETFILTER_XT_MATCH_LIMIT=m
CONFIG_NETFILTER_XT_MATCH_MAC=m
CONFIG_NETFILTER_XT_MATCH_MARK=m
CONFIG_NETFILTER_XT_MATCH_MULTIPORT=m
# CONFIG_NETFILTER_XT_MATCH_OWNER is not set
CONFIG_NETFILTER_XT_MATCH_POLICY=m
CONFIG_NETFILTER_XT_MATCH_PHYSDEV=m
CONFIG_NETFILTER_XT_MATCH_PKTTYPE=m
CONFIG_NETFILTER_XT_MATCH_QUOTA=m
# CONFIG_NETFILTER_XT_MATCH_RATEEST is not set
CONFIG_NETFILTER_XT_MATCH_REALM=m
# CONFIG_NETFILTER_XT_MATCH_RECENT is not set
CONFIG_NETFILTER_XT_MATCH_SCTP=m
CONFIG_NETFILTER_XT_MATCH_STATISTIC=m
CONFIG_NETFILTER_XT_MATCH_STRING=m
CONFIG_NETFILTER_XT_MATCH_TCPMSS=m
# CONFIG_NETFILTER_XT_MATCH_TIME is not set
# CONFIG_NETFILTER_XT_MATCH_U32 is not set
CONFIG_IP_VS=m
# CONFIG_IP_VS_IPV6 is not set
# CONFIG_IP_VS_DEBUG is not set
CONFIG_IP_VS_TAB_BITS=12

#
# IPVS transport protocol load balancing support
#
CONFIG_IP_VS_PROTO_TCP=y
CONFIG_IP_VS_PROTO_UDP=y
CONFIG_IP_VS_PROTO_AH_ESP=y
CONFIG_IP_VS_PROTO_ESP=y
CONFIG_IP_VS_PROTO_AH=y

#
# IPVS scheduler
#
CONFIG_IP_VS_RR=m
CONFIG_IP_VS_WRR=m
CONFIG_IP_VS_LC=m
CONFIG_IP_VS_WLC=m
CONFIG_IP_VS_LBLC=m
CONFIG_IP_VS_LBLCR=m
CONFIG_IP_VS_DH=m
CONFIG_IP_VS_SH=m
CONFIG_IP_VS_SED=m
CONFIG_IP_VS_NQ=m

#
# IPVS application helper
#
CONFIG_IP_VS_FTP=m

#
# IP: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV4 is not set
CONFIG_IP_NF_QUEUE=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_MATCH_ADDRTYPE=m
CONFIG_IP_NF_MATCH_AH=m
CONFIG_IP_NF_MATCH_ECN=m
CONFIG_IP_NF_MATCH_TTL=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
CONFIG_IP_NF_TARGET_LOG=m
CONFIG_IP_NF_TARGET_ULOG=m
CONFIG_IP_NF_MANGLE=m
CONFIG_IP_NF_TARGET_ECN=m
CONFIG_IP_NF_TARGET_TTL=m
CONFIG_IP_NF_RAW=m
# CONFIG_IP_NF_SECURITY is not set
CONFIG_IP_NF_ARPTABLES=m
CONFIG_IP_NF_ARPFILTER=m
CONFIG_IP_NF_ARP_MANGLE=m

#
# IPv6: Netfilter Configuration
#
CONFIG_IP6_NF_QUEUE=m
CONFIG_IP6_NF_IPTABLES=m
CONFIG_IP6_NF_MATCH_AH=m
CONFIG_IP6_NF_MATCH_EUI64=m
CONFIG_IP6_NF_MATCH_FRAG=m
CONFIG_IP6_NF_MATCH_OPTS=m
CONFIG_IP6_NF_MATCH_HL=m
CONFIG_IP6_NF_MATCH_IPV6HEADER=m
# CONFIG_IP6_NF_MATCH_MH is not set
CONFIG_IP6_NF_MATCH_RT=m
CONFIG_IP6_NF_TARGET_HL=m
CONFIG_IP6_NF_TARGET_LOG=m
CONFIG_IP6_NF_FILTER=m
CONFIG_IP6_NF_TARGET_REJECT=m
CONFIG_IP6_NF_MANGLE=m
CONFIG_IP6_NF_RAW=m
# CONFIG_IP6_NF_SECURITY is not set
CONFIG_BRIDGE_NF_EBTABLES=m
CONFIG_BRIDGE_EBT_BROUTE=m
CONFIG_BRIDGE_EBT_T_FILTER=m
CONFIG_BRIDGE_EBT_T_NAT=m
CONFIG_BRIDGE_EBT_802_3=m
CONFIG_BRIDGE_EBT_AMONG=m
CONFIG_BRIDGE_EBT_ARP=m
CONFIG_BRIDGE_EBT_IP=m
# CONFIG_BRIDGE_EBT_IP6 is not set
CONFIG_BRIDGE_EBT_LIMIT=m
CONFIG_BRIDGE_EBT_MARK=m
CONFIG_BRIDGE_EBT_PKTTYPE=m
CONFIG_BRIDGE_EBT_STP=m
CONFIG_BRIDGE_EBT_VLAN=m
CONFIG_BRIDGE_EBT_ARPREPLY=m
CONFIG_BRIDGE_EBT_DNAT=m
CONFIG_BRIDGE_EBT_MARK_T=m
CONFIG_BRIDGE_EBT_REDIRECT=m
CONFIG_BRIDGE_EBT_SNAT=m
CONFIG_BRIDGE_EBT_LOG=m
CONFIG_BRIDGE_EBT_ULOG=m
# CONFIG_BRIDGE_EBT_NFLOG is not set
CONFIG_IP_DCCP=m
CONFIG_INET_DCCP_DIAG=m

#
# DCCP CCIDs Configuration (EXPERIMENTAL)
#
# CONFIG_IP_DCCP_CCID2_DEBUG is not set
CONFIG_IP_DCCP_CCID3=y
# CONFIG_IP_DCCP_CCID3_DEBUG is not set
CONFIG_IP_DCCP_CCID3_RTO=100
CONFIG_IP_DCCP_TFRC_LIB=y

#
# DCCP Kernel Hacking
#
# CONFIG_IP_DCCP_DEBUG is not set
# CONFIG_NET_DCCPPROBE is not set
CONFIG_IP_SCTP=m
# CONFIG_SCTP_DBG_MSG is not set
# CONFIG_SCTP_DBG_OBJCNT is not set
# CONFIG_SCTP_HMAC_NONE is not set
# CONFIG_SCTP_HMAC_SHA1 is not set
CONFIG_SCTP_HMAC_MD5=y
# CONFIG_RDS is not set
CONFIG_TIPC=m
# CONFIG_TIPC_ADVANCED is not set
# CONFIG_TIPC_DEBUG is not set
CONFIG_ATM=m
CONFIG_ATM_CLIP=m
# CONFIG_ATM_CLIP_NO_ICMP is not set
CONFIG_ATM_LANE=m
# CONFIG_ATM_MPOA is not set
CONFIG_ATM_BR2684=m
# CONFIG_ATM_BR2684_IPFILTER is not set
CONFIG_STP=m
CONFIG_BRIDGE=m
# CONFIG_NET_DSA is not set
CONFIG_VLAN_8021Q=m
# CONFIG_VLAN_8021Q_GVRP is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
CONFIG_NET_SCH_HTB=m
CONFIG_NET_SCH_HFSC=m
CONFIG_NET_SCH_ATM=m
CONFIG_NET_SCH_PRIO=m
# CONFIG_NET_SCH_MULTIQ is not set
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
CONFIG_NET_SCH_TEQL=m
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
CONFIG_NET_SCH_NETEM=m
# CONFIG_NET_SCH_DRR is not set
CONFIG_NET_SCH_INGRESS=m

#
# Classification
#
CONFIG_NET_CLS=y
CONFIG_NET_CLS_BASIC=m
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_CLS_U32_PERF=y
CONFIG_CLS_U32_MARK=y
CONFIG_NET_CLS_RSVP=m
CONFIG_NET_CLS_RSVP6=m
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_EMATCH=y
CONFIG_NET_EMATCH_STACK=32
CONFIG_NET_EMATCH_CMP=m
CONFIG_NET_EMATCH_NBYTE=m
CONFIG_NET_EMATCH_U32=m
CONFIG_NET_EMATCH_META=m
CONFIG_NET_EMATCH_TEXT=m
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=m
CONFIG_NET_ACT_GACT=m
CONFIG_GACT_PROB=y
CONFIG_NET_ACT_MIRRED=m
CONFIG_NET_ACT_IPT=m
# CONFIG_NET_ACT_NAT is not set
CONFIG_NET_ACT_PEDIT=m
CONFIG_NET_ACT_SIMP=m
# CONFIG_NET_ACT_SKBEDIT is not set
CONFIG_NET_CLS_IND=y
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set

#
# Network testing
#
CONFIG_NET_PKTGEN=m
# CONFIG_NET_TCPPROBE is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
CONFIG_BT=m
CONFIG_BT_L2CAP=m
CONFIG_BT_SCO=m
CONFIG_BT_RFCOMM=m
CONFIG_BT_RFCOMM_TTY=y
CONFIG_BT_BNEP=m
CONFIG_BT_BNEP_MC_FILTER=y
CONFIG_BT_BNEP_PROTO_FILTER=y
CONFIG_BT_HIDP=m

#
# Bluetooth device drivers
#
# CONFIG_BT_HCIBTUSB is not set
# CONFIG_BT_HCIBTSDIO is not set
CONFIG_BT_HCIUART=m
CONFIG_BT_HCIUART_H4=y
CONFIG_BT_HCIUART_BCSP=y
# CONFIG_BT_HCIUART_LL is not set
CONFIG_BT_HCIBCM203X=m
CONFIG_BT_HCIBPA10X=m
CONFIG_BT_HCIBFUSB=m
CONFIG_BT_HCIDTL1=m
CONFIG_BT_HCIBT3C=m
CONFIG_BT_HCIBLUECARD=m
CONFIG_BT_HCIBTUART=m
CONFIG_BT_HCIVHCI=m
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
CONFIG_WIRELESS=y
CONFIG_CFG80211=m
# CONFIG_CFG80211_REG_DEBUG is not set
# CONFIG_WIRELESS_OLD_REGULATORY is not set
CONFIG_WIRELESS_EXT=y
CONFIG_WIRELESS_EXT_SYSFS=y
# CONFIG_LIB80211 is not set
CONFIG_MAC80211=m

#
# Rate control algorithm selection
#
CONFIG_MAC80211_RC_MINSTREL=y
# CONFIG_MAC80211_RC_DEFAULT_PID is not set
CONFIG_MAC80211_RC_DEFAULT_MINSTREL=y
CONFIG_MAC80211_RC_DEFAULT="minstrel"
# CONFIG_MAC80211_MESH is not set
CONFIG_MAC80211_LEDS=y
# CONFIG_MAC80211_DEBUGFS is not set
# CONFIG_MAC80211_DEBUG_MENU is not set
# CONFIG_WIMAX is not set
CONFIG_RFKILL=m
# CONFIG_RFKILL_INPUT is not set
CONFIG_RFKILL_LEDS=y
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
CONFIG_MTD=m
# CONFIG_MTD_DEBUG is not set
CONFIG_MTD_CONCAT=m
CONFIG_MTD_PARTITIONS=y
# CONFIG_MTD_TESTS is not set
CONFIG_MTD_REDBOOT_PARTS=m
CONFIG_MTD_REDBOOT_DIRECTORY_BLOCK=-1
# CONFIG_MTD_REDBOOT_PARTS_UNALLOCATED is not set
# CONFIG_MTD_REDBOOT_PARTS_READONLY is not set
# CONFIG_MTD_AR7_PARTS is not set

#
# User Modules And Translation Layers
#
CONFIG_MTD_CHAR=m
CONFIG_MTD_BLKDEVS=m
CONFIG_MTD_BLOCK=m
CONFIG_MTD_BLOCK_RO=m
CONFIG_FTL=m
CONFIG_NFTL=m
CONFIG_NFTL_RW=y
# CONFIG_INFTL is not set
CONFIG_RFD_FTL=m
# CONFIG_SSFDC is not set
# CONFIG_MTD_OOPS is not set

#
# RAM/ROM/Flash chip drivers
#
CONFIG_MTD_CFI=m
CONFIG_MTD_JEDECPROBE=m
CONFIG_MTD_GEN_PROBE=m
# CONFIG_MTD_CFI_ADV_OPTIONS is not set
CONFIG_MTD_MAP_BANK_WIDTH_1=y
CONFIG_MTD_MAP_BANK_WIDTH_2=y
CONFIG_MTD_MAP_BANK_WIDTH_4=y
# CONFIG_MTD_MAP_BANK_WIDTH_8 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_16 is not set
# CONFIG_MTD_MAP_BANK_WIDTH_32 is not set
CONFIG_MTD_CFI_I1=y
CONFIG_MTD_CFI_I2=y
# CONFIG_MTD_CFI_I4 is not set
# CONFIG_MTD_CFI_I8 is not set
CONFIG_MTD_CFI_INTELEXT=m
CONFIG_MTD_CFI_AMDSTD=m
CONFIG_MTD_CFI_STAA=m
CONFIG_MTD_CFI_UTIL=m
CONFIG_MTD_RAM=m
CONFIG_MTD_ROM=m
CONFIG_MTD_ABSENT=m

#
# Mapping drivers for chip access
#
# CONFIG_MTD_COMPLEX_MAPPINGS is not set
# CONFIG_MTD_PHYSMAP is not set
CONFIG_MTD_SC520CDP=m
CONFIG_MTD_NETSC520=m
CONFIG_MTD_TS5500=m
# CONFIG_MTD_AMD76XROM is not set
# CONFIG_MTD_ICHXROM is not set
# CONFIG_MTD_ESB2ROM is not set
# CONFIG_MTD_CK804XROM is not set
CONFIG_MTD_SCB2_FLASH=m
# CONFIG_MTD_NETtel is not set
# CONFIG_MTD_DILNETPC is not set
# CONFIG_MTD_L440GX is not set
# CONFIG_MTD_INTEL_VR_NOR is not set
# CONFIG_MTD_PLATRAM is not set

#
# Self-contained MTD device drivers
#
# CONFIG_MTD_PMC551 is not set
# CONFIG_MTD_SLRAM is not set
# CONFIG_MTD_PHRAM is not set
CONFIG_MTD_MTDRAM=m
CONFIG_MTDRAM_TOTAL_SIZE=4096
CONFIG_MTDRAM_ERASE_SIZE=128
CONFIG_MTD_BLOCK2MTD=m

#
# Disk-On-Chip Device Drivers
#
# CONFIG_MTD_DOC2000 is not set
# CONFIG_MTD_DOC2001 is not set
# CONFIG_MTD_DOC2001PLUS is not set
CONFIG_MTD_NAND=m
# CONFIG_MTD_NAND_VERIFY_WRITE is not set
CONFIG_MTD_NAND_ECC_SMC=y
# CONFIG_MTD_NAND_MUSEUM_IDS is not set
CONFIG_MTD_NAND_IDS=m
CONFIG_MTD_NAND_DISKONCHIP=m
# CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADVANCED is not set
CONFIG_MTD_NAND_DISKONCHIP_PROBE_ADDRESS=0
# CONFIG_MTD_NAND_DISKONCHIP_BBTWRITE is not set
# CONFIG_MTD_NAND_CAFE is not set
CONFIG_MTD_NAND_NANDSIM=m
# CONFIG_MTD_NAND_PLATFORM is not set
# CONFIG_MTD_ALAUDA is not set
# CONFIG_MTD_ONENAND is not set

#
# LPDDR flash memory drivers
#
# CONFIG_MTD_LPDDR is not set

#
# UBI - Unsorted block images
#
# CONFIG_MTD_UBI is not set
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_SERIAL=m
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
CONFIG_PARPORT_PC_PCMCIA=m
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_PARPORT_NOT_PC=y
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=m
CONFIG_PARIDE=m

#
# Parallel IDE high-level drivers
#
CONFIG_PARIDE_PD=m
CONFIG_PARIDE_PCD=m
CONFIG_PARIDE_PF=m
CONFIG_PARIDE_PT=m
CONFIG_PARIDE_PG=m

#
# Parallel IDE protocol modules
#
CONFIG_PARIDE_ATEN=m
CONFIG_PARIDE_BPCK=m
CONFIG_PARIDE_COMM=m
CONFIG_PARIDE_DSTR=m
CONFIG_PARIDE_FIT2=m
CONFIG_PARIDE_FIT3=m
CONFIG_PARIDE_EPAT=m
CONFIG_PARIDE_EPATC8=y
CONFIG_PARIDE_EPIA=m
CONFIG_PARIDE_FRIQ=m
CONFIG_PARIDE_FRPW=m
CONFIG_PARIDE_KBIC=m
CONFIG_PARIDE_KTTI=m
CONFIG_PARIDE_ON20=m
CONFIG_PARIDE_ON26=m
CONFIG_BLK_CPQ_DA=m
CONFIG_BLK_CPQ_CISS_DA=m
CONFIG_CISS_SCSI_TAPE=y
CONFIG_BLK_DEV_DAC960=m
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
CONFIG_BLK_DEV_CRYPTOLOOP=m
CONFIG_BLK_DEV_NBD=m
CONFIG_BLK_DEV_SX8=m
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=16384
# CONFIG_BLK_DEV_XIP is not set
CONFIG_CDROM_PKTCDVD=m
CONFIG_CDROM_PKTCDVD_BUFFERS=8
# CONFIG_CDROM_PKTCDVD_WCACHE is not set
CONFIG_ATA_OVER_ETH=m
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_DELL_LAPTOP is not set
# CONFIG_ISL29003 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
CONFIG_EEPROM_93CX6=m
CONFIG_HAVE_IDE=y
CONFIG_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_XFER_MODE=y
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_IDE_GD=y
CONFIG_IDE_GD_ATA=y
# CONFIG_IDE_GD_ATAPI is not set
CONFIG_BLK_DEV_IDECS=m
# CONFIG_BLK_DEV_DELKIN is not set
CONFIG_BLK_DEV_IDECD=m
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEACPI is not set
CONFIG_IDE_TASK_IOCTL=y
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_PLATFORM is not set
# CONFIG_BLK_DEV_CMD640 is not set
CONFIG_BLK_DEV_IDEPNP=y
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_BLK_DEV_AEC62XX=y
CONFIG_BLK_DEV_ALI15X3=y
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_ATIIXP=y
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
CONFIG_BLK_DEV_HPT366=y
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8172 is not set
# CONFIG_BLK_DEV_IT8213 is not set
CONFIG_BLK_DEV_IT821X=y
# CONFIG_BLK_DEV_NS87415 is not set
CONFIG_BLK_DEV_PDC202XX_OLD=y
CONFIG_BLK_DEV_PDC202XX_NEW=y
CONFIG_BLK_DEV_SVWKS=y
CONFIG_BLK_DEV_SIIMAGE=y
CONFIG_BLK_DEV_SIS5513=y
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=y
# CONFIG_BLK_DEV_TC86C001 is not set
CONFIG_BLK_DEV_IDEDMA=y

#
# SCSI device support
#
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
CONFIG_CHR_DEV_OSST=m
CONFIG_BLK_DEV_SR=m
CONFIG_BLK_DEV_SR_VENDOR=y
CONFIG_CHR_DEV_SG=m
CONFIG_CHR_DEV_SCH=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=m
CONFIG_SCSI_FC_ATTRS=m
CONFIG_SCSI_ISCSI_ATTRS=m
CONFIG_SCSI_SAS_ATTRS=m
CONFIG_SCSI_SAS_LIBSAS=m
CONFIG_SCSI_SAS_ATA=y
CONFIG_SCSI_SAS_HOST_SMP=y
# CONFIG_SCSI_SAS_LIBSAS_DEBUG is not set
CONFIG_SCSI_SRP_ATTRS=m
CONFIG_SCSI_LOWLEVEL=y
CONFIG_ISCSI_TCP=m
# CONFIG_SCSI_CXGB3_ISCSI is not set
CONFIG_BLK_DEV_3W_XXXX_RAID=m
CONFIG_SCSI_3W_9XXX=m
CONFIG_SCSI_ACARD=m
CONFIG_SCSI_AACRAID=m
CONFIG_SCSI_AIC7XXX=m
CONFIG_AIC7XXX_CMDS_PER_DEVICE=4
CONFIG_AIC7XXX_RESET_DELAY_MS=15000
# CONFIG_AIC7XXX_DEBUG_ENABLE is not set
CONFIG_AIC7XXX_DEBUG_MASK=0
# CONFIG_AIC7XXX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC7XXX_OLD=m
CONFIG_SCSI_AIC79XX=m
CONFIG_AIC79XX_CMDS_PER_DEVICE=4
CONFIG_AIC79XX_RESET_DELAY_MS=15000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
CONFIG_SCSI_AIC94XX=m
# CONFIG_AIC94XX_DEBUG is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
CONFIG_SCSI_ARCMSR=m
# CONFIG_SCSI_ARCMSR_AER is not set
CONFIG_MEGARAID_NEWGEN=y
CONFIG_MEGARAID_MM=m
CONFIG_MEGARAID_MAILBOX=m
CONFIG_MEGARAID_LEGACY=m
CONFIG_MEGARAID_SAS=m
# CONFIG_SCSI_MPT2SAS is not set
CONFIG_SCSI_HPTIOP=m
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GDTH=m
CONFIG_SCSI_IPS=m
CONFIG_SCSI_INITIO=m
# CONFIG_SCSI_INIA100 is not set
CONFIG_SCSI_PPA=m
CONFIG_SCSI_IMM=m
# CONFIG_SCSI_IZIP_EPP16 is not set
# CONFIG_SCSI_IZIP_SLOW_CTR is not set
# CONFIG_SCSI_MVSAS is not set
CONFIG_SCSI_STEX=m
CONFIG_SCSI_SYM53C8XX_2=m
CONFIG_SCSI_SYM53C8XX_DMA_ADDRESSING_MODE=1
CONFIG_SCSI_SYM53C8XX_DEFAULT_TAGS=16
CONFIG_SCSI_SYM53C8XX_MAX_TAGS=64
CONFIG_SCSI_SYM53C8XX_MMIO=y
# CONFIG_SCSI_IPR is not set
CONFIG_SCSI_QLOGIC_1280=m
CONFIG_SCSI_QLA_FC=m
CONFIG_SCSI_QLA_ISCSI=m
CONFIG_SCSI_LPFC=m
# CONFIG_SCSI_LPFC_DEBUG_FS is not set
CONFIG_SCSI_DC395x=m
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_LOWLEVEL_PCMCIA is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=m
CONFIG_SATA_SIL24=m
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=m
CONFIG_ATA_PIIX=m
CONFIG_SATA_MV=m
CONFIG_SATA_NV=m
CONFIG_PDC_ADMA=m
CONFIG_SATA_QSTOR=m
CONFIG_SATA_PROMISE=m
CONFIG_SATA_SX4=m
CONFIG_SATA_SIL=m
CONFIG_SATA_SIS=m
CONFIG_SATA_ULI=m
CONFIG_SATA_VIA=m
CONFIG_SATA_VITESSE=m
CONFIG_SATA_INIC162X=m
# CONFIG_PATA_ACPI is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_TRIFLEX is not set
CONFIG_PATA_MARVELL=m
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PCMCIA is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
CONFIG_PATA_PDC2027X=m
CONFIG_PATA_SIL680=m
CONFIG_PATA_SIS=m
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_AUTODETECT=y
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
CONFIG_MD_RAID10=m
CONFIG_MD_RAID456=m
CONFIG_MD_RAID6_PQ=m
CONFIG_MD_MULTIPATH=m
CONFIG_MD_FAULTY=m
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_DEBUG is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
# CONFIG_DM_DELAY is not set
CONFIG_DM_UEVENT=y
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m
CONFIG_FUSION_LAN=m
CONFIG_FUSION_LOGGING=y

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
CONFIG_FIREWIRE=m
CONFIG_FIREWIRE_OHCI=m
CONFIG_FIREWIRE_OHCI_DEBUG=y
CONFIG_FIREWIRE_SBP2=m
# CONFIG_IEEE1394 is not set
CONFIG_I2O=m
# CONFIG_I2O_LCT_NOTIFY_ON_CHANGES is not set
CONFIG_I2O_EXT_ADAPTEC=y
CONFIG_I2O_EXT_ADAPTEC_DMA64=y
CONFIG_I2O_CONFIG=m
CONFIG_I2O_CONFIG_OLD_IOCTL=y
CONFIG_I2O_BUS=m
CONFIG_I2O_BLOCK=m
CONFIG_I2O_SCSI=m
CONFIG_I2O_PROC=m
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_COMPAT_NET_DEV_OPS=y
CONFIG_IFB=m
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=m

#
# MII PHY device drivers
#
CONFIG_MARVELL_PHY=m
CONFIG_DAVICOM_PHY=m
CONFIG_QSEMI_PHY=m
CONFIG_LXT_PHY=m
CONFIG_CICADA_PHY=m
CONFIG_VITESSE_PHY=m
CONFIG_SMSC_PHY=m
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=m
CONFIG_HAPPYMEAL=m
CONFIG_SUNGEM=m
CONFIG_CASSINI=m
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=m
CONFIG_TYPHOON=m
# CONFIG_ETHOC is not set
# CONFIG_DNET is not set
CONFIG_NET_TULIP=y
CONFIG_DE2104X=m
CONFIG_TULIP=m
# CONFIG_TULIP_MWI is not set
CONFIG_TULIP_MMIO=y
# CONFIG_TULIP_NAPI is not set
CONFIG_DE4X5=m
CONFIG_WINBOND_840=m
CONFIG_DM9102=m
CONFIG_ULI526X=m
CONFIG_PCMCIA_XIRCOM=m
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
CONFIG_PCNET32=m
CONFIG_AMD8111_ETH=m
CONFIG_ADAPTEC_STARFIRE=m
CONFIG_B44=m
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=m
# CONFIG_FORCEDETH_NAPI is not set
CONFIG_E100=m
CONFIG_FEALNX=m
CONFIG_NATSEMI=m
CONFIG_NE2K_PCI=m
CONFIG_8139CP=m
CONFIG_8139TOO=m
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
CONFIG_8139TOO_8129=y
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
CONFIG_SIS900=m
CONFIG_EPIC100=m
# CONFIG_SMSC9420 is not set
CONFIG_SUNDANCE=m
# CONFIG_SUNDANCE_MMIO is not set
# CONFIG_TLAN is not set
CONFIG_VIA_RHINE=m
CONFIG_VIA_RHINE_MMIO=y
# CONFIG_SC92031 is not set
CONFIG_NET_POCKET=y
# CONFIG_ATP is not set
# CONFIG_DE600 is not set
# CONFIG_DE620 is not set
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
CONFIG_ACENIC=m
# CONFIG_ACENIC_OMIT_TIGON_I is not set
CONFIG_DL2K=m
CONFIG_E1000=m
CONFIG_E1000E=m
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
CONFIG_NS83820=m
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=m
CONFIG_R8169_VLAN=y
CONFIG_SIS190=m
CONFIG_SKGE=m
# CONFIG_SKGE_DEBUG is not set
CONFIG_SKY2=m
# CONFIG_SKY2_DEBUG is not set
CONFIG_VIA_VELOCITY=m
CONFIG_TIGON3=m
CONFIG_BNX2=m
CONFIG_QLA3XXX=m
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
CONFIG_NETDEV_10000=y
CONFIG_CHELSIO_T1=m
# CONFIG_CHELSIO_T1_1G is not set
CONFIG_CHELSIO_T3_DEPENDS=y
CONFIG_CHELSIO_T3=m
# CONFIG_ENIC is not set
CONFIG_IXGBE=m
CONFIG_IXGB=m
CONFIG_S2IO=m
# CONFIG_VXGE is not set
CONFIG_MYRI10GE=m
CONFIG_NETXEN_NIC=m
CONFIG_NIU=m
# CONFIG_MLX4_EN is not set
CONFIG_MLX4_CORE=m
CONFIG_MLX4_DEBUG=y
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
# CONFIG_QLGE is not set
# CONFIG_SFC is not set
# CONFIG_BE2NET is not set
CONFIG_TR=y
CONFIG_IBMOL=m
CONFIG_3C359=m
# CONFIG_TMS380TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
CONFIG_USB_CATC=m
CONFIG_USB_KAWETH=m
CONFIG_USB_PEGASUS=m
CONFIG_USB_RTL8150=m
CONFIG_USB_USBNET=m
CONFIG_USB_NET_AX8817X=m
CONFIG_USB_NET_CDCETHER=m
CONFIG_USB_NET_DM9601=m
# CONFIG_USB_NET_SMSC95XX is not set
CONFIG_USB_NET_GL620A=m
CONFIG_USB_NET_NET1080=m
CONFIG_USB_NET_PLUSB=m
# CONFIG_USB_NET_MCS7830 is not set
CONFIG_USB_NET_RNDIS_HOST=m
CONFIG_USB_NET_CDC_SUBSET=m
CONFIG_USB_ALI_M5632=y
CONFIG_USB_AN2720=y
CONFIG_USB_BELKIN=y
CONFIG_USB_ARMLINUX=y
CONFIG_USB_EPSON2888=y
# CONFIG_USB_KC2190 is not set
CONFIG_USB_NET_ZAURUS=m
# CONFIG_USB_HSO is not set
CONFIG_NET_PCMCIA=y
CONFIG_PCMCIA_3C589=m
CONFIG_PCMCIA_3C574=m
CONFIG_PCMCIA_FMVJ18X=m
CONFIG_PCMCIA_PCNET=m
CONFIG_PCMCIA_NMCLAN=m
CONFIG_PCMCIA_SMC91C92=m
CONFIG_PCMCIA_XIRC2PS=m
CONFIG_PCMCIA_AXNET=m
# CONFIG_PCMCIA_IBMTR is not set
# CONFIG_WAN is not set
CONFIG_ATM_DRIVERS=y
# CONFIG_ATM_DUMMY is not set
CONFIG_ATM_TCP=m
CONFIG_ATM_LANAI=m
CONFIG_ATM_ENI=m
# CONFIG_ATM_ENI_DEBUG is not set
# CONFIG_ATM_ENI_TUNE_BURST is not set
CONFIG_ATM_FIRESTREAM=m
# CONFIG_ATM_ZATM is not set
CONFIG_ATM_IDT77252=m
# CONFIG_ATM_IDT77252_DEBUG is not set
# CONFIG_ATM_IDT77252_RCV_ALL is not set
CONFIG_ATM_IDT77252_USE_SUNI=y
CONFIG_ATM_AMBASSADOR=m
# CONFIG_ATM_AMBASSADOR_DEBUG is not set
CONFIG_ATM_HORIZON=m
# CONFIG_ATM_HORIZON_DEBUG is not set
# CONFIG_ATM_IA is not set
# CONFIG_ATM_FORE200E is not set
CONFIG_ATM_HE=m
# CONFIG_ATM_HE_USE_SUNI is not set
# CONFIG_ATM_SOLOS is not set
CONFIG_FDDI=y
# CONFIG_DEFXX is not set
# CONFIG_SKFP is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
CONFIG_PPP=m
CONFIG_PPP_MULTILINK=y
CONFIG_PPP_FILTER=y
CONFIG_PPP_ASYNC=m
CONFIG_PPP_SYNC_TTY=m
CONFIG_PPP_DEFLATE=m
# CONFIG_PPP_BSDCOMP is not set
CONFIG_PPP_MPPE=m
CONFIG_PPPOE=m
CONFIG_PPPOATM=m
# CONFIG_PPPOL2TP is not set
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
CONFIG_SLHC=m
CONFIG_SLIP_SMART=y
# CONFIG_SLIP_MODE_SLIP6 is not set
CONFIG_NET_FC=y
CONFIG_NETCONSOLE=m
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
CONFIG_NETPOLL_TRAP=y
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=y
CONFIG_INPUT_POLLDEV=m

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
CONFIG_INPUT_JOYDEV=m
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
CONFIG_MOUSE_VSXXXAA=m
CONFIG_INPUT_JOYSTICK=y
# CONFIG_JOYSTICK_ANALOG is not set
# CONFIG_JOYSTICK_A3D is not set
# CONFIG_JOYSTICK_ADI is not set
# CONFIG_JOYSTICK_COBRA is not set
# CONFIG_JOYSTICK_GF2K is not set
# CONFIG_JOYSTICK_GRIP is not set
# CONFIG_JOYSTICK_GRIP_MP is not set
# CONFIG_JOYSTICK_GUILLEMOT is not set
# CONFIG_JOYSTICK_INTERACT is not set
# CONFIG_JOYSTICK_SIDEWINDER is not set
# CONFIG_JOYSTICK_TMDC is not set
# CONFIG_JOYSTICK_IFORCE is not set
# CONFIG_JOYSTICK_WARRIOR is not set
# CONFIG_JOYSTICK_MAGELLAN is not set
# CONFIG_JOYSTICK_SPACEORB is not set
# CONFIG_JOYSTICK_SPACEBALL is not set
# CONFIG_JOYSTICK_STINGER is not set
CONFIG_JOYSTICK_TWIDJOY=m
# CONFIG_JOYSTICK_ZHENHUA is not set
# CONFIG_JOYSTICK_DB9 is not set
# CONFIG_JOYSTICK_GAMECON is not set
# CONFIG_JOYSTICK_TURBOGRAFX is not set
CONFIG_JOYSTICK_JOYDUMP=m
# CONFIG_JOYSTICK_XPAD is not set
# CONFIG_INPUT_TABLET is not set
CONFIG_INPUT_TOUCHSCREEN=y
# CONFIG_TOUCHSCREEN_AD7879_I2C is not set
# CONFIG_TOUCHSCREEN_AD7879 is not set
# CONFIG_TOUCHSCREEN_FUJITSU is not set
CONFIG_TOUCHSCREEN_GUNZE=m
CONFIG_TOUCHSCREEN_ELO=m
# CONFIG_TOUCHSCREEN_WACOM_W8001 is not set
CONFIG_TOUCHSCREEN_MTOUCH=m
# CONFIG_TOUCHSCREEN_INEXIO is not set
CONFIG_TOUCHSCREEN_MK712=m
# CONFIG_TOUCHSCREEN_PENMOUNT is not set
# CONFIG_TOUCHSCREEN_TOUCHRIGHT is not set
# CONFIG_TOUCHSCREEN_TOUCHWIN is not set
# CONFIG_TOUCHSCREEN_WM97XX is not set
# CONFIG_TOUCHSCREEN_USB_COMPOSITE is not set
# CONFIG_TOUCHSCREEN_TOUCHIT213 is not set
# CONFIG_TOUCHSCREEN_TSC2007 is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=m
# CONFIG_INPUT_APANEL is not set
# CONFIG_INPUT_ATLAS_BTNS is not set
# CONFIG_INPUT_ATI_REMOTE is not set
# CONFIG_INPUT_ATI_REMOTE2 is not set
# CONFIG_INPUT_KEYSPAN_REMOTE is not set
# CONFIG_INPUT_POWERMATE is not set
# CONFIG_INPUT_YEALINK is not set
# CONFIG_INPUT_CM109 is not set
CONFIG_INPUT_UINPUT=m

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
CONFIG_SERIO_SERPORT=y
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
CONFIG_SERIO_RAW=m
CONFIG_GAMEPORT=m
CONFIG_GAMEPORT_NS558=m
CONFIG_GAMEPORT_L4=m
CONFIG_GAMEPORT_EMU10K1=m
CONFIG_GAMEPORT_FM801=m

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
CONFIG_VT_HW_CONSOLE_BINDING=y
CONFIG_DEVKMEM=y
CONFIG_SERIAL_NONSTANDARD=y
# CONFIG_COMPUTONE is not set
# CONFIG_ROCKETPORT is not set
CONFIG_CYCLADES=m
# CONFIG_CYZ_INTR is not set
# CONFIG_DIGIEPCA is not set
# CONFIG_MOXA_INTELLIO is not set
# CONFIG_MOXA_SMARTIO is not set
# CONFIG_ISI is not set
CONFIG_SYNCLINK=m
CONFIG_SYNCLINKMP=m
CONFIG_SYNCLINK_GT=m
CONFIG_N_HDLC=m
# CONFIG_RISCOM8 is not set
# CONFIG_SPECIALIX is not set
# CONFIG_SX is not set
# CONFIG_RIO is not set
# CONFIG_STALDRV is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_CS=m
CONFIG_SERIAL_8250_NR_UARTS=32
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
CONFIG_SERIAL_8250_EXTENDED=y
CONFIG_SERIAL_8250_MANY_PORTS=y
CONFIG_SERIAL_8250_SHARE_IRQ=y
CONFIG_SERIAL_8250_DETECT_IRQ=y
CONFIG_SERIAL_8250_RSA=y

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
CONFIG_SERIAL_JSM=m
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
# CONFIG_LEGACY_PTYS is not set
CONFIG_PRINTER=m
CONFIG_LP_CONSOLE=y
CONFIG_PPDEV=m
CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_PANIC_EVENT=y
CONFIG_IPMI_PANIC_STRING=y
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=m
CONFIG_HW_RANDOM_AMD=m
CONFIG_NVRAM=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set

#
# PCMCIA character devices
#
# CONFIG_SYNCLINK_CS is not set
CONFIG_CARDMAN_4000=m
CONFIG_CARDMAN_4040=m
# CONFIG_IPWIRELESS is not set
# CONFIG_MWAVE is not set
CONFIG_PC8736x_GPIO=m
CONFIG_NSC_GPIO=m
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=8192
CONFIG_HPET=y
# CONFIG_HPET_MMAP is not set
CONFIG_HANGCHECK_TIMER=m
CONFIG_TCG_TPM=m
CONFIG_TCG_TIS=m
CONFIG_TCG_NSC=m
CONFIG_TCG_ATMEL=m
CONFIG_TCG_INFINEON=m
CONFIG_TELCLOCK=m
CONFIG_DEVPORT=y
CONFIG_I2C=m
CONFIG_I2C_BOARDINFO=y
CONFIG_I2C_CHARDEV=m
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=m

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
CONFIG_I2C_AMD756=m
CONFIG_I2C_AMD756_S4882=m
CONFIG_I2C_AMD8111=m
CONFIG_I2C_I801=m
# CONFIG_I2C_ISCH is not set
CONFIG_I2C_PIIX4=m
CONFIG_I2C_NFORCE2=m
# CONFIG_I2C_NFORCE2_S4985 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
CONFIG_I2C_SIS96X=m
CONFIG_I2C_VIA=m
CONFIG_I2C_VIAPRO=m

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set

#
# External I2C/SMBus adapter drivers
#
CONFIG_I2C_PARPORT=m
CONFIG_I2C_PARPORT_LIGHT=m
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
CONFIG_I2C_VOODOO3=m

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_PCA_PLATFORM is not set
CONFIG_I2C_STUB=m

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
CONFIG_SENSORS_PCF8574=m
# CONFIG_PCF8575 is not set
CONFIG_SENSORS_PCA9539=m
CONFIG_SENSORS_MAX6875=m
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_BATTERY_BQ27x00 is not set
CONFIG_HWMON=m
CONFIG_HWMON_VID=m
CONFIG_SENSORS_ABITUGURU=m
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
CONFIG_SENSORS_ADM1021=m
CONFIG_SENSORS_ADM1025=m
CONFIG_SENSORS_ADM1026=m
# CONFIG_SENSORS_ADM1029 is not set
CONFIG_SENSORS_ADM1031=m
CONFIG_SENSORS_ADM9240=m
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
# CONFIG_SENSORS_ADT7475 is not set
CONFIG_SENSORS_K8TEMP=m
CONFIG_SENSORS_ASB100=m
# CONFIG_SENSORS_ATK0110 is not set
CONFIG_SENSORS_ATXP1=m
CONFIG_SENSORS_DS1621=m
# CONFIG_SENSORS_I5K_AMB is not set
CONFIG_SENSORS_F71805F=m
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
CONFIG_SENSORS_FSCHER=m
CONFIG_SENSORS_FSCPOS=m
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
CONFIG_SENSORS_GL518SM=m
CONFIG_SENSORS_GL520SM=m
# CONFIG_SENSORS_CORETEMP is not set
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
CONFIG_SENSORS_IT87=m
CONFIG_SENSORS_LM63=m
CONFIG_SENSORS_LM75=m
CONFIG_SENSORS_LM77=m
CONFIG_SENSORS_LM78=m
CONFIG_SENSORS_LM80=m
CONFIG_SENSORS_LM83=m
CONFIG_SENSORS_LM85=m
CONFIG_SENSORS_LM87=m
CONFIG_SENSORS_LM90=m
CONFIG_SENSORS_LM92=m
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LM95241 is not set
CONFIG_SENSORS_MAX1619=m
# CONFIG_SENSORS_MAX6650 is not set
CONFIG_SENSORS_PC87360=m
# CONFIG_SENSORS_PC87427 is not set
CONFIG_SENSORS_PCF8591=m
CONFIG_SENSORS_SIS5595=m
# CONFIG_SENSORS_DME1737 is not set
CONFIG_SENSORS_SMSC47M1=m
CONFIG_SENSORS_SMSC47M192=m
CONFIG_SENSORS_SMSC47B397=m
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
CONFIG_SENSORS_VIA686A=m
# CONFIG_SENSORS_VT1211 is not set
CONFIG_SENSORS_VT8231=m
CONFIG_SENSORS_W83781D=m
CONFIG_SENSORS_W83791D=m
CONFIG_SENSORS_W83792D=m
# CONFIG_SENSORS_W83793 is not set
CONFIG_SENSORS_W83L785TS=m
# CONFIG_SENSORS_W83L786NG is not set
CONFIG_SENSORS_W83627HF=m
CONFIG_SENSORS_W83627EHF=m
CONFIG_SENSORS_HDAPS=m
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
CONFIG_WATCHDOG=y
# CONFIG_WATCHDOG_NOWAYOUT is not set

#
# Watchdog Device Drivers
#
CONFIG_SOFT_WATCHDOG=m
# CONFIG_ACQUIRE_WDT is not set
# CONFIG_ADVANTECH_WDT is not set
CONFIG_ALIM1535_WDT=m
CONFIG_ALIM7101_WDT=m
# CONFIG_SC520_WDT is not set
# CONFIG_EUROTECH_WDT is not set
# CONFIG_IB700_WDT is not set
CONFIG_IBMASR=m
# CONFIG_WAFER_WDT is not set
CONFIG_I6300ESB_WDT=m
# CONFIG_ITCO_WDT is not set
# CONFIG_IT8712F_WDT is not set
# CONFIG_IT87_WDT is not set
# CONFIG_HP_WATCHDOG is not set
# CONFIG_SC1200_WDT is not set
# CONFIG_PC87413_WDT is not set
# CONFIG_60XX_WDT is not set
# CONFIG_SBC8360_WDT is not set
# CONFIG_CPU5_WDT is not set
# CONFIG_SMSC_SCH311X_WDT is not set
# CONFIG_SMSC37B787_WDT is not set
CONFIG_W83627HF_WDT=m
# CONFIG_W83697HF_WDT is not set
# CONFIG_W83697UG_WDT is not set
CONFIG_W83877F_WDT=m
CONFIG_W83977F_WDT=m
CONFIG_MACHZ_WDT=m
# CONFIG_SBC_EPX_C3_WATCHDOG is not set

#
# PCI-based Watchdog Cards
#
CONFIG_PCIPCWATCHDOG=m
CONFIG_WDTPCI=m
CONFIG_WDT_501_PCI=y

#
# USB-based Watchdog Cards
#
CONFIG_USBPCWATCHDOG=m
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=m
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
CONFIG_SSB_PCMCIAHOST_POSSIBLE=y
# CONFIG_SSB_PCMCIAHOST is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
CONFIG_VIDEO_DEV=m
CONFIG_VIDEO_V4L2_COMMON=m
CONFIG_VIDEO_ALLOW_V4L1=y
CONFIG_VIDEO_V4L1_COMPAT=y
# CONFIG_DVB_CORE is not set
CONFIG_VIDEO_MEDIA=m

#
# Multimedia drivers
#
# CONFIG_MEDIA_ATTACH is not set
CONFIG_MEDIA_TUNER=m
# CONFIG_MEDIA_TUNER_CUSTOMISE is not set
CONFIG_MEDIA_TUNER_SIMPLE=m
CONFIG_MEDIA_TUNER_TDA8290=m
CONFIG_MEDIA_TUNER_TDA9887=m
CONFIG_MEDIA_TUNER_TEA5761=m
CONFIG_MEDIA_TUNER_TEA5767=m
CONFIG_MEDIA_TUNER_MT20XX=m
CONFIG_MEDIA_TUNER_XC2028=m
CONFIG_MEDIA_TUNER_XC5000=m
CONFIG_MEDIA_TUNER_MC44S803=m
CONFIG_VIDEO_V4L2=m
CONFIG_VIDEO_V4L1=m
CONFIG_VIDEOBUF_GEN=m
CONFIG_VIDEOBUF_DMA_SG=m
CONFIG_VIDEOBUF_VMALLOC=m
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR=m
CONFIG_VIDEO_TVEEPROM=m
CONFIG_VIDEO_TUNER=m
CONFIG_VIDEO_CAPTURE_DRIVERS=y
# CONFIG_VIDEO_ADV_DEBUG is not set
# CONFIG_VIDEO_FIXED_MINOR_RANGES is not set
CONFIG_VIDEO_HELPER_CHIPS_AUTO=y
CONFIG_VIDEO_IR_I2C=m
CONFIG_VIDEO_TVAUDIO=m
CONFIG_VIDEO_TDA7432=m
CONFIG_VIDEO_MSP3400=m
CONFIG_VIDEO_CS53L32A=m
CONFIG_VIDEO_WM8775=m
CONFIG_VIDEO_SAA6588=m
CONFIG_VIDEO_SAA711X=m
CONFIG_VIDEO_TVP5150=m
CONFIG_VIDEO_CX25840=m
CONFIG_VIDEO_CX2341X=m
# CONFIG_VIDEO_VIVI is not set
CONFIG_VIDEO_BT848=m
# CONFIG_VIDEO_BWQCAM is not set
# CONFIG_VIDEO_CQCAM is not set
# CONFIG_VIDEO_W9966 is not set
# CONFIG_VIDEO_CPIA is not set
CONFIG_VIDEO_CPIA2=m
# CONFIG_VIDEO_SAA5246A is not set
# CONFIG_VIDEO_SAA5249 is not set
# CONFIG_VIDEO_STRADIS is not set
# CONFIG_VIDEO_ZORAN is not set
# CONFIG_VIDEO_SAA7134 is not set
# CONFIG_VIDEO_MXB is not set
# CONFIG_VIDEO_HEXIUM_ORION is not set
# CONFIG_VIDEO_HEXIUM_GEMINI is not set
# CONFIG_VIDEO_CX88 is not set
# CONFIG_VIDEO_IVTV is not set
# CONFIG_VIDEO_CAFE_CCIC is not set
# CONFIG_SOC_CAMERA is not set
CONFIG_V4L_USB_DRIVERS=y
# CONFIG_USB_VIDEO_CLASS is not set
CONFIG_USB_VIDEO_CLASS_INPUT_EVDEV=y
CONFIG_USB_GSPCA=m
# CONFIG_USB_M5602 is not set
# CONFIG_USB_STV06XX is not set
# CONFIG_USB_GSPCA_CONEX is not set
# CONFIG_USB_GSPCA_ETOMS is not set
# CONFIG_USB_GSPCA_FINEPIX is not set
# CONFIG_USB_GSPCA_MARS is not set
# CONFIG_USB_GSPCA_MR97310A is not set
# CONFIG_USB_GSPCA_OV519 is not set
# CONFIG_USB_GSPCA_OV534 is not set
# CONFIG_USB_GSPCA_PAC207 is not set
# CONFIG_USB_GSPCA_PAC7311 is not set
# CONFIG_USB_GSPCA_SONIXB is not set
# CONFIG_USB_GSPCA_SONIXJ is not set
# CONFIG_USB_GSPCA_SPCA500 is not set
# CONFIG_USB_GSPCA_SPCA501 is not set
# CONFIG_USB_GSPCA_SPCA505 is not set
# CONFIG_USB_GSPCA_SPCA506 is not set
# CONFIG_USB_GSPCA_SPCA508 is not set
# CONFIG_USB_GSPCA_SPCA561 is not set
# CONFIG_USB_GSPCA_SQ905 is not set
# CONFIG_USB_GSPCA_SQ905C is not set
# CONFIG_USB_GSPCA_STK014 is not set
# CONFIG_USB_GSPCA_SUNPLUS is not set
# CONFIG_USB_GSPCA_T613 is not set
# CONFIG_USB_GSPCA_TV8532 is not set
# CONFIG_USB_GSPCA_VC032X is not set
# CONFIG_USB_GSPCA_ZC3XX is not set
CONFIG_VIDEO_PVRUSB2=m
CONFIG_VIDEO_PVRUSB2_SYSFS=y
# CONFIG_VIDEO_PVRUSB2_DEBUGIFC is not set
# CONFIG_VIDEO_HDPVR is not set
CONFIG_VIDEO_EM28XX=m
# CONFIG_VIDEO_EM28XX_ALSA is not set
# CONFIG_VIDEO_CX231XX is not set
# CONFIG_VIDEO_USBVISION is not set
CONFIG_VIDEO_USBVIDEO=m
CONFIG_USB_VICAM=m
CONFIG_USB_IBMCAM=m
CONFIG_USB_KONICAWC=m
CONFIG_USB_QUICKCAM_MESSENGER=m
CONFIG_USB_ET61X251=m
CONFIG_VIDEO_OVCAMCHIP=m
CONFIG_USB_W9968CF=m
CONFIG_USB_OV511=m
CONFIG_USB_SE401=m
CONFIG_USB_SN9C102=m
CONFIG_USB_STV680=m
CONFIG_USB_ZC0301=m
CONFIG_USB_PWC=m
# CONFIG_USB_PWC_DEBUG is not set
CONFIG_USB_PWC_INPUT_EVDEV=y
# CONFIG_USB_ZR364XX is not set
# CONFIG_USB_STKWEBCAM is not set
# CONFIG_USB_S2255 is not set
CONFIG_RADIO_ADAPTERS=y
# CONFIG_RADIO_GEMTEK_PCI is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_MAESTRO is not set
CONFIG_USB_DSBR=m
# CONFIG_USB_SI470X is not set
# CONFIG_USB_MR800 is not set
# CONFIG_RADIO_TEA5764 is not set
CONFIG_DAB=y
CONFIG_USB_DABUSB=m

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
CONFIG_AGP_SIS=y
CONFIG_AGP_VIA=y
CONFIG_DRM=m
# CONFIG_DRM_TDFX is not set
CONFIG_DRM_R128=m
CONFIG_DRM_RADEON=m
CONFIG_DRM_I810=m
CONFIG_DRM_I830=m
CONFIG_DRM_I915=m
# CONFIG_DRM_I915_KMS is not set
CONFIG_DRM_MGA=m
# CONFIG_DRM_SIS is not set
CONFIG_DRM_VIA=m
CONFIG_DRM_SAVAGE=m
CONFIG_VGASTATE=m
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=m
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
CONFIG_FB_BACKLIGHT=y
CONFIG_FB_MODE_HELPERS=y
CONFIG_FB_TILEBLITTING=y

#
# Frame buffer hardware drivers
#
CONFIG_FB_CIRRUS=m
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
CONFIG_FB_VGA16=m
# CONFIG_FB_UVESA is not set
CONFIG_FB_VESA=y
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
CONFIG_FB_NVIDIA=m
CONFIG_FB_NVIDIA_I2C=y
# CONFIG_FB_NVIDIA_DEBUG is not set
CONFIG_FB_NVIDIA_BACKLIGHT=y
CONFIG_FB_RIVA=m
# CONFIG_FB_RIVA_I2C is not set
# CONFIG_FB_RIVA_DEBUG is not set
CONFIG_FB_RIVA_BACKLIGHT=y
# CONFIG_FB_LE80578 is not set
CONFIG_FB_INTEL=m
# CONFIG_FB_INTEL_DEBUG is not set
CONFIG_FB_INTEL_I2C=y
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
CONFIG_FB_SAVAGE=m
CONFIG_FB_SAVAGE_I2C=y
CONFIG_FB_SAVAGE_ACCEL=y
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
CONFIG_FB_KYRO=m
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
CONFIG_BACKLIGHT_LCD_SUPPORT=y
CONFIG_LCD_CLASS_DEVICE=m
# CONFIG_LCD_ILI9320 is not set
# CONFIG_LCD_PLATFORM is not set
CONFIG_BACKLIGHT_CLASS_DEVICE=y
CONFIG_BACKLIGHT_GENERIC=y
# CONFIG_BACKLIGHT_PROGEAR is not set
# CONFIG_BACKLIGHT_MBP_NVIDIA is not set
# CONFIG_BACKLIGHT_SAHARA is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=64
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY is not set
CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y
# CONFIG_FONTS is not set
CONFIG_FONT_8x8=y
CONFIG_FONT_8x16=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=m
CONFIG_SOUND_OSS_CORE=y
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_HWDEP=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_JACK=y
CONFIG_SND_SEQUENCER=m
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_DYNAMIC_MINORS=y
# CONFIG_SND_SUPPORT_OLD_API is not set
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
CONFIG_SND_VMASTER=y
CONFIG_SND_MPU401_UART=m
CONFIG_SND_OPL3_LIB=m
CONFIG_SND_VX_LIB=m
CONFIG_SND_AC97_CODEC=m
CONFIG_SND_DRIVERS=y
CONFIG_SND_DUMMY=m
CONFIG_SND_VIRMIDI=m
CONFIG_SND_MTPAV=m
# CONFIG_SND_MTS64 is not set
# CONFIG_SND_SERIAL_U16550 is not set
CONFIG_SND_MPU401=m
# CONFIG_SND_PORTMAN2X4 is not set
# CONFIG_SND_AC97_POWER_SAVE is not set
CONFIG_SND_SB_COMMON=m
CONFIG_SND_PCI=y
CONFIG_SND_AD1889=m
CONFIG_SND_ALS300=m
CONFIG_SND_ALS4000=m
CONFIG_SND_ALI5451=m
CONFIG_SND_ATIIXP=m
CONFIG_SND_ATIIXP_MODEM=m
CONFIG_SND_AU8810=m
CONFIG_SND_AU8820=m
CONFIG_SND_AU8830=m
# CONFIG_SND_AW2 is not set
CONFIG_SND_AZT3328=m
CONFIG_SND_BT87X=m
# CONFIG_SND_BT87X_OVERCLOCK is not set
CONFIG_SND_CA0106=m
CONFIG_SND_CMIPCI=m
# CONFIG_SND_OXYGEN is not set
CONFIG_SND_CS4281=m
CONFIG_SND_CS46XX=m
CONFIG_SND_CS46XX_NEW_DSP=y
# CONFIG_SND_CS5530 is not set
CONFIG_SND_DARLA20=m
CONFIG_SND_GINA20=m
CONFIG_SND_LAYLA20=m
CONFIG_SND_DARLA24=m
CONFIG_SND_GINA24=m
CONFIG_SND_LAYLA24=m
CONFIG_SND_MONA=m
CONFIG_SND_MIA=m
CONFIG_SND_ECHO3G=m
CONFIG_SND_INDIGO=m
CONFIG_SND_INDIGOIO=m
CONFIG_SND_INDIGODJ=m
# CONFIG_SND_INDIGOIOX is not set
# CONFIG_SND_INDIGODJX is not set
CONFIG_SND_EMU10K1=m
CONFIG_SND_EMU10K1X=m
CONFIG_SND_ENS1370=m
CONFIG_SND_ENS1371=m
CONFIG_SND_ES1938=m
CONFIG_SND_ES1968=m
CONFIG_SND_FM801=m
CONFIG_SND_FM801_TEA575X_BOOL=y
CONFIG_SND_FM801_TEA575X=m
CONFIG_SND_HDA_INTEL=m
# CONFIG_SND_HDA_HWDEP is not set
# CONFIG_SND_HDA_INPUT_BEEP is not set
CONFIG_SND_HDA_CODEC_REALTEK=y
CONFIG_SND_HDA_CODEC_ANALOG=y
CONFIG_SND_HDA_CODEC_SIGMATEL=y
CONFIG_SND_HDA_CODEC_VIA=y
CONFIG_SND_HDA_CODEC_ATIHDMI=y
CONFIG_SND_HDA_CODEC_NVHDMI=y
CONFIG_SND_HDA_CODEC_INTELHDMI=y
CONFIG_SND_HDA_ELD=y
CONFIG_SND_HDA_CODEC_CONEXANT=y
CONFIG_SND_HDA_CODEC_CMEDIA=y
CONFIG_SND_HDA_CODEC_SI3054=y
CONFIG_SND_HDA_GENERIC=y
CONFIG_SND_HDA_POWER_SAVE=y
CONFIG_SND_HDA_POWER_SAVE_DEFAULT=0
CONFIG_SND_HDSP=m
CONFIG_SND_HDSPM=m
# CONFIG_SND_HIFIER is not set
CONFIG_SND_ICE1712=m
CONFIG_SND_ICE1724=m
CONFIG_SND_INTEL8X0=m
CONFIG_SND_INTEL8X0M=m
CONFIG_SND_KORG1212=m
CONFIG_SND_MAESTRO3=m
CONFIG_SND_MIXART=m
CONFIG_SND_NM256=m
CONFIG_SND_PCXHR=m
CONFIG_SND_RIPTIDE=m
CONFIG_SND_RME32=m
CONFIG_SND_RME96=m
CONFIG_SND_RME9652=m
CONFIG_SND_SONICVIBES=m
CONFIG_SND_TRIDENT=m
CONFIG_SND_VIA82XX=m
CONFIG_SND_VIA82XX_MODEM=m
# CONFIG_SND_VIRTUOSO is not set
CONFIG_SND_VX222=m
CONFIG_SND_YMFPCI=m
CONFIG_SND_USB=y
CONFIG_SND_USB_AUDIO=m
CONFIG_SND_USB_USX2Y=m
# CONFIG_SND_USB_CAIAQ is not set
# CONFIG_SND_USB_US122L is not set
CONFIG_SND_PCMCIA=y
# CONFIG_SND_VXPOCKET is not set
# CONFIG_SND_PDAUDIOCF is not set
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
CONFIG_AC97_BUS=m
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_HID_DEBUG=y
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
CONFIG_HID_PID=y
CONFIG_USB_HIDDEV=y

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
CONFIG_HID_APPLE=y
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
# CONFIG_DRAGONRISE_FF is not set
CONFIG_HID_EZKEY=y
CONFIG_HID_KYE=y
CONFIG_HID_GYRATION=y
CONFIG_HID_KENSINGTON=y
CONFIG_HID_LOGITECH=y
CONFIG_LOGITECH_FF=y
# CONFIG_LOGIRUMBLEPAD2_FF is not set
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
CONFIG_HID_NTRIG=y
CONFIG_HID_PANTHERLORD=y
# CONFIG_PANTHERLORD_FF is not set
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
CONFIG_HID_SUNPLUS=y
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_TOPSEED=y
CONFIG_THRUSTMASTER_FF=y
# CONFIG_ZEROPLUS_FF is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
CONFIG_USB_DEVICE_CLASS=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
CONFIG_USB_MON=y
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=m
CONFIG_USB_EHCI_ROOT_HUB_TT=y
CONFIG_USB_EHCI_TT_NEWSCHED=y
# CONFIG_USB_OXU210HP_HCD is not set
CONFIG_USB_ISP116X_HCD=m
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
CONFIG_USB_SL811_HCD=m
CONFIG_USB_SL811_CS=m
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
CONFIG_USB_ACM=m
CONFIG_USB_PRINTER=m
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
CONFIG_USB_STORAGE_DATAFAB=m
CONFIG_USB_STORAGE_FREECOM=m
CONFIG_USB_STORAGE_ISD200=m
CONFIG_USB_STORAGE_USBAT=m
CONFIG_USB_STORAGE_SDDR09=m
CONFIG_USB_STORAGE_SDDR55=m
CONFIG_USB_STORAGE_JUMPSHOT=m
CONFIG_USB_STORAGE_ALAUDA=m
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
CONFIG_USB_MDC800=m
CONFIG_USB_MICROTEK=m

#
# USB port drivers
#
CONFIG_USB_USS720=m
CONFIG_USB_SERIAL=m
CONFIG_USB_EZUSB=y
CONFIG_USB_SERIAL_GENERIC=y
# CONFIG_USB_SERIAL_AIRCABLE is not set
CONFIG_USB_SERIAL_ARK3116=m
CONFIG_USB_SERIAL_BELKIN=m
# CONFIG_USB_SERIAL_CH341 is not set
CONFIG_USB_SERIAL_WHITEHEAT=m
CONFIG_USB_SERIAL_DIGI_ACCELEPORT=m
# CONFIG_USB_SERIAL_CP210X is not set
CONFIG_USB_SERIAL_CYPRESS_M8=m
CONFIG_USB_SERIAL_EMPEG=m
CONFIG_USB_SERIAL_FTDI_SIO=m
CONFIG_USB_SERIAL_FUNSOFT=m
CONFIG_USB_SERIAL_VISOR=m
CONFIG_USB_SERIAL_IPAQ=m
CONFIG_USB_SERIAL_IR=m
CONFIG_USB_SERIAL_EDGEPORT=m
CONFIG_USB_SERIAL_EDGEPORT_TI=m
CONFIG_USB_SERIAL_GARMIN=m
CONFIG_USB_SERIAL_IPW=m
# CONFIG_USB_SERIAL_IUU is not set
CONFIG_USB_SERIAL_KEYSPAN_PDA=m
CONFIG_USB_SERIAL_KEYSPAN=m
CONFIG_USB_SERIAL_KEYSPAN_MPR=y
CONFIG_USB_SERIAL_KEYSPAN_USA28=y
CONFIG_USB_SERIAL_KEYSPAN_USA28X=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XA=y
CONFIG_USB_SERIAL_KEYSPAN_USA28XB=y
CONFIG_USB_SERIAL_KEYSPAN_USA19=y
CONFIG_USB_SERIAL_KEYSPAN_USA18X=y
CONFIG_USB_SERIAL_KEYSPAN_USA19W=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QW=y
CONFIG_USB_SERIAL_KEYSPAN_USA19QI=y
CONFIG_USB_SERIAL_KEYSPAN_USA49W=y
CONFIG_USB_SERIAL_KEYSPAN_USA49WLC=y
CONFIG_USB_SERIAL_KLSI=m
CONFIG_USB_SERIAL_KOBIL_SCT=m
CONFIG_USB_SERIAL_MCT_U232=m
# CONFIG_USB_SERIAL_MOS7720 is not set
# CONFIG_USB_SERIAL_MOS7840 is not set
# CONFIG_USB_SERIAL_MOTOROLA is not set
CONFIG_USB_SERIAL_NAVMAN=m
CONFIG_USB_SERIAL_PL2303=m
# CONFIG_USB_SERIAL_OTI6858 is not set
# CONFIG_USB_SERIAL_QUALCOMM is not set
# CONFIG_USB_SERIAL_SPCP8X5 is not set
CONFIG_USB_SERIAL_HP4X=m
CONFIG_USB_SERIAL_SAFE=m
CONFIG_USB_SERIAL_SAFE_PADDED=y
# CONFIG_USB_SERIAL_SIEMENS_MPI is not set
CONFIG_USB_SERIAL_SIERRAWIRELESS=m
# CONFIG_USB_SERIAL_SYMBOL is not set
CONFIG_USB_SERIAL_TI=m
CONFIG_USB_SERIAL_CYBERJACK=m
CONFIG_USB_SERIAL_XIRCOM=m
CONFIG_USB_SERIAL_OPTION=m
CONFIG_USB_SERIAL_OMNINET=m
# CONFIG_USB_SERIAL_OPTICON is not set
# CONFIG_USB_SERIAL_DEBUG is not set

#
# USB Miscellaneous drivers
#
CONFIG_USB_EMI62=m
CONFIG_USB_EMI26=m
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
CONFIG_USB_RIO500=m
CONFIG_USB_LEGOTOWER=m
CONFIG_USB_LCD=m
# CONFIG_USB_BERRY_CHARGE is not set
CONFIG_USB_LED=m
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
CONFIG_USB_IDMOUSE=m
# CONFIG_USB_FTDI_ELAN is not set
CONFIG_USB_APPLEDISPLAY=m
CONFIG_USB_SISUSBVGA=m
CONFIG_USB_SISUSBVGA_CON=y
CONFIG_USB_LD=m
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
CONFIG_USB_TEST=m
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set
CONFIG_USB_ATM=m
CONFIG_USB_SPEEDTOUCH=m
CONFIG_USB_CXACRU=m
CONFIG_USB_UEAGLEATM=m
CONFIG_USB_XUSBATM=m
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
CONFIG_MMC=m
# CONFIG_MMC_DEBUG is not set
# CONFIG_MMC_UNSAFE_RESUME is not set

#
# MMC/SD/SDIO Card Drivers
#
CONFIG_MMC_BLOCK=m
CONFIG_MMC_BLOCK_BOUNCE=y
# CONFIG_SDIO_UART is not set
# CONFIG_MMC_TEST is not set

#
# MMC/SD/SDIO Host Controller Drivers
#
CONFIG_MMC_SDHCI=m
# CONFIG_MMC_SDHCI_PCI is not set
CONFIG_MMC_WBSD=m
# CONFIG_MMC_TIFM_SD is not set
# CONFIG_MMC_SDRICOH_CS is not set
# CONFIG_MEMSTICK is not set
CONFIG_NEW_LEDS=y
CONFIG_LEDS_CLASS=y

#
# LED drivers
#
# CONFIG_LEDS_ALIX2 is not set
# CONFIG_LEDS_PCA9532 is not set
# CONFIG_LEDS_LP5521 is not set
# CONFIG_LEDS_CLEVO_MAIL is not set
# CONFIG_LEDS_PCA955X is not set
# CONFIG_LEDS_BD2802 is not set

#
# LED Triggers
#
CONFIG_LEDS_TRIGGERS=y
CONFIG_LEDS_TRIGGER_TIMER=m
CONFIG_LEDS_TRIGGER_IDE_DISK=y
CONFIG_LEDS_TRIGGER_HEARTBEAT=m
# CONFIG_LEDS_TRIGGER_BACKLIGHT is not set
# CONFIG_LEDS_TRIGGER_DEFAULT_ON is not set

#
# iptables trigger is under Netfilter config (LED target)
#
# CONFIG_ACCESSIBILITY is not set
CONFIG_INFINIBAND=m
CONFIG_INFINIBAND_USER_MAD=m
CONFIG_INFINIBAND_USER_ACCESS=m
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=m
CONFIG_INFINIBAND_MTHCA_DEBUG=y
CONFIG_INFINIBAND_IPATH=m
CONFIG_INFINIBAND_AMSO1100=m
# CONFIG_INFINIBAND_AMSO1100_DEBUG is not set
CONFIG_INFINIBAND_CXGB3=m
# CONFIG_INFINIBAND_CXGB3_DEBUG is not set
CONFIG_MLX4_INFINIBAND=m
# CONFIG_INFINIBAND_NES is not set
CONFIG_INFINIBAND_IPOIB=m
CONFIG_INFINIBAND_IPOIB_CM=y
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
CONFIG_INFINIBAND_SRP=m
CONFIG_INFINIBAND_ISER=m
CONFIG_EDAC=y

#
# Reporting subsystems
#
# CONFIG_EDAC_DEBUG is not set
CONFIG_EDAC_MM_EDAC=m
CONFIG_EDAC_E752X=m
# CONFIG_EDAC_I82975X is not set
# CONFIG_EDAC_I3000 is not set
# CONFIG_EDAC_X38 is not set
# CONFIG_EDAC_I5400 is not set
CONFIG_EDAC_I5000=m
# CONFIG_EDAC_I5100 is not set
# CONFIG_EDAC_AMD8131 is not set
# CONFIG_EDAC_AMD8111 is not set
CONFIG_RTC_LIB=m
CONFIG_RTC_CLASS=m

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
CONFIG_RTC_DRV_DS1307=m
# CONFIG_RTC_DRV_DS1374 is not set
CONFIG_RTC_DRV_DS1672=m
# CONFIG_RTC_DRV_MAX6900 is not set
CONFIG_RTC_DRV_RS5C372=m
CONFIG_RTC_DRV_ISL1208=m
CONFIG_RTC_DRV_X1205=m
CONFIG_RTC_DRV_PCF8563=m
CONFIG_RTC_DRV_PCF8583=m
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
# CONFIG_RTC_DRV_CMOS is not set
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
CONFIG_RTC_DRV_DS1553=m
CONFIG_RTC_DRV_DS1742=m
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
CONFIG_RTC_DRV_V3020=m

#
# on-CPU RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ACER_WMI is not set
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_FUJITSU_LAPTOP is not set
# CONFIG_MSI_LAPTOP is not set
# CONFIG_PANASONIC_LAPTOP is not set
# CONFIG_COMPAL_LAPTOP is not set
# CONFIG_SONY_LAPTOP is not set
CONFIG_THINKPAD_ACPI=m
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
# CONFIG_THINKPAD_ACPI_UNSAFE_LEDS is not set
CONFIG_THINKPAD_ACPI_BAY=y
CONFIG_THINKPAD_ACPI_VIDEO=y
CONFIG_THINKPAD_ACPI_HOTKEY_POLL=y
# CONFIG_INTEL_MENLOW is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
CONFIG_ACPI_ASUS=m
CONFIG_ACPI_TOSHIBA=m

#
# Firmware Drivers
#
CONFIG_EDD=m
# CONFIG_EDD_OFF is not set
CONFIG_FIRMWARE_MEMMAP=y
CONFIG_DELL_RBU=m
CONFIG_DCDBAS=m
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=m
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4_FS is not set
CONFIG_FS_XIP=y
CONFIG_JBD=m
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
CONFIG_GFS2_FS=m
# CONFIG_GFS2_FS_LOCKING_DLM is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_BTRFS_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
CONFIG_QUOTA=y
# CONFIG_QUOTA_NETLINK_INTERFACE is not set
CONFIG_PRINT_QUOTA_WARNING=y
CONFIG_QUOTA_TREE=y
# CONFIG_QFMT_V1 is not set
CONFIG_QFMT_V2=y
CONFIG_QUOTACTL=y
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=m
# CONFIG_FUSE_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_UDF_FS=m
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="ascii"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_CONFIGFS_FS=m
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
CONFIG_ECRYPT_FS=m
CONFIG_HFS_FS=m
CONFIG_HFSPLUS_FS=m
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
CONFIG_JFFS2_FS=m
CONFIG_JFFS2_FS_DEBUG=0
CONFIG_JFFS2_FS_WRITEBUFFER=y
# CONFIG_JFFS2_FS_WBUF_VERIFY is not set
CONFIG_JFFS2_SUMMARY=y
# CONFIG_JFFS2_FS_XATTR is not set
# CONFIG_JFFS2_COMPRESSION_OPTIONS is not set
CONFIG_JFFS2_ZLIB=y
# CONFIG_JFFS2_LZO is not set
CONFIG_JFFS2_RTIME=y
# CONFIG_JFFS2_RUBIN is not set
CONFIG_CRAMFS=m
# CONFIG_SQUASHFS is not set
CONFIG_VXFS_FS=m
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_NILFS2_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=m
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
CONFIG_NFS_V4=y
CONFIG_NFSD=m
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
CONFIG_NFSD_V4=y
CONFIG_LOCKD=m
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=m
CONFIG_NFS_ACL_SUPPORT=m
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=m
CONFIG_SUNRPC_GSS=m
CONFIG_SUNRPC_XPRT_RDMA=m
CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_RPCSEC_GSS_SPKM3=m
# CONFIG_SMB_FS is not set
CONFIG_CIFS=m
# CONFIG_CIFS_STATS is not set
CONFIG_CIFS_WEAK_PW_HASH=y
CONFIG_CIFS_UPCALL=y
CONFIG_CIFS_XATTR=y
CONFIG_CIFS_POSIX=y
# CONFIG_CIFS_DEBUG2 is not set
# CONFIG_CIFS_DFS_UPCALL is not set
CONFIG_CIFS_EXPERIMENTAL=y
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
CONFIG_OSF_PARTITION=y
CONFIG_AMIGA_PARTITION=y
# CONFIG_ATARI_PARTITION is not set
CONFIG_MAC_PARTITION=y
CONFIG_MSDOS_PARTITION=y
CONFIG_BSD_DISKLABEL=y
CONFIG_MINIX_SUBPARTITION=y
CONFIG_SOLARIS_X86_PARTITION=y
CONFIG_UNIXWARE_DISKLABEL=y
# CONFIG_LDM_PARTITION is not set
CONFIG_SGI_PARTITION=y
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
CONFIG_KARMA_PARTITION=y
CONFIG_EFI_PARTITION=y
# CONFIG_SYSV68_PARTITION is not set
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="utf8"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=m
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_UTF8=m
CONFIG_DLM=m
CONFIG_DLM_DEBUG=y

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
# CONFIG_FRAME_POINTER is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
# CONFIG_SYSCTL_SYSCALL_CHECK is not set
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_RING_BUFFER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y

#
# Tracers
#
# CONFIG_FUNCTION_TRACER is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SYSPROF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_CONTEXT_SWITCH_TRACER is not set
# CONFIG_EVENT_TRACER is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_BOOT_TRACER is not set
# CONFIG_TRACE_BRANCH_PROFILING is not set
# CONFIG_POWER_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_KMEMTRACE is not set
# CONFIG_WORKQUEUE_TRACER is not set
CONFIG_BLK_DEV_IO_TRACE=y
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_FIREWIRE_OHCI_REMOTE_DMA is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
CONFIG_SAMPLES=y
CONFIG_SAMPLE_MARKERS=m
# CONFIG_SAMPLE_TRACEPOINTS is not set
# CONFIG_SAMPLE_KOBJECT is not set
# CONFIG_SAMPLE_KPROBES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
CONFIG_DEBUG_RODATA=y
CONFIG_DEBUG_RODATA_TEST=y
# CONFIG_DEBUG_NX_TEST is not set
# CONFIG_IOMMU_DEBUG is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
CONFIG_KEYS=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
CONFIG_SECURITY=y
CONFIG_SECURITYFS=y
CONFIG_SECURITY_NETWORK=y
CONFIG_SECURITY_NETWORK_XFRM=y
# CONFIG_SECURITY_PATH is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_SECURITY_ROOTPLUG is not set
CONFIG_SECURITY_DEFAULT_MMAP_MIN_ADDR=0
CONFIG_SECURITY_SELINUX=y
CONFIG_SECURITY_SELINUX_BOOTPARAM=y
CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE=1
CONFIG_SECURITY_SELINUX_DISABLE=y
CONFIG_SECURITY_SELINUX_DEVELOP=y
CONFIG_SECURITY_SELINUX_AVC_STATS=y
CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
# CONFIG_SECURITY_SELINUX_POLICYDB_VERSION_MAX is not set
# CONFIG_SECURITY_SMACK is not set
# CONFIG_SECURITY_TOMOYO is not set
# CONFIG_IMA is not set
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_GF128MUL is not set
CONFIG_CRYPTO_NULL=m
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
CONFIG_CRYPTO_TEST=m

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
CONFIG_CRYPTO_ECB=m
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
CONFIG_CRYPTO_XCBC=m

#
# Digest
#
CONFIG_CRYPTO_CRC32C=y
# CONFIG_CRYPTO_CRC32C_INTEL is not set
CONFIG_CRYPTO_MD4=m
CONFIG_CRYPTO_MD5=m
CONFIG_CRYPTO_MICHAEL_MIC=m
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
CONFIG_CRYPTO_SHA256=m
CONFIG_CRYPTO_SHA512=m
CONFIG_CRYPTO_TGR192=m
CONFIG_CRYPTO_WP512=m

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
CONFIG_CRYPTO_AES_X86_64=m
# CONFIG_CRYPTO_AES_NI_INTEL is not set
CONFIG_CRYPTO_ANUBIS=m
CONFIG_CRYPTO_ARC4=m
CONFIG_CRYPTO_BLOWFISH=m
# CONFIG_CRYPTO_CAMELLIA is not set
CONFIG_CRYPTO_CAST5=m
CONFIG_CRYPTO_CAST6=m
CONFIG_CRYPTO_DES=m
# CONFIG_CRYPTO_FCRYPT is not set
CONFIG_CRYPTO_KHAZAD=m
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
CONFIG_CRYPTO_SERPENT=m
CONFIG_CRYPTO_TEA=m
CONFIG_CRYPTO_TWOFISH=m
CONFIG_CRYPTO_TWOFISH_COMMON=m
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=m
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_VIRTUALIZATION=y
# CONFIG_KVM is not set
# CONFIG_VIRTIO_PCI is not set
# CONFIG_VIRTIO_BALLOON is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
CONFIG_CRC_CCITT=m
CONFIG_CRC16=m
# CONFIG_CRC_T10DIF is not set
CONFIG_CRC_ITU_T=m
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=m
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_GENERIC_ALLOCATOR=y
CONFIG_REED_SOLOMON=m
CONFIG_REED_SOLOMON_DEC16=y
CONFIG_TEXTSEARCH=y
CONFIG_TEXTSEARCH_KMP=m
CONFIG_TEXTSEARCH_BM=m
CONFIG_TEXTSEARCH_FSM=m
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-05-06 18:05     ` Styner, Douglas W
@ 2009-05-06 18:12       ` Wilcox, Matthew R
  2009-05-06 18:24         ` Anirban Chakraborty
  0 siblings, 1 reply; 122+ messages in thread
From: Wilcox, Matthew R @ 2009-05-06 18:12 UTC (permalink / raw)
  To: Styner, Douglas W, Anirban Chakraborty, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B, Ma,
	Chinang, Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F,
	Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2086 bytes --]

That's a more accurate simulation of our workload, but Anirban's setup doesn't have nearly as many spindles as ours, so he won't do as many IOPS and may not see the problem.

All I'm trying to do is get something that will show the problem on his setup, and I think sequential IO is going to be the right answer here.  I could easily be wrong.

Neither FIO nor dd is going to have the cache behaviour of the database (maybe Orion does?)  As far as I can tell, we come to the kernel cache-cold for every IO simply because the database uses as many cache entries as it can.  We could write a little program to just thrash through cachelines, or just run gcc at the same time as this -- apparently gcc will happily chew through all the cache it can too.

> -----Original Message-----
> From: Styner, Douglas W
> Sent: Wednesday, May 06, 2009 11:05 AM
> To: Wilcox, Matthew R; Anirban Chakraborty; linux-kernel@vger.kernel.org
> Cc: Tripathi, Sharad C; arjan@linux.intel.com; Kleen, Andi; Siddha, Suresh
> B; Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F;
> Nelson, Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan,
> Rajalakshmi; Garg, Anil K; Chilukuri, Harita; chris.mason@oracle.com
> Subject: RE: Mainline kernel OLTP performance update
> 
> Wilcox, Matthew R writes:
> >I'm not sure that Orion is going to give useful results in your hardware
> >setup.  I suspect you don't have enough spindles to get the IO rates that
> >are required to see the problem.  How about doing lots of contiguous I/O
> >instead?  Something as simple as:
> >
> >for i in sda sdb sdc (repeat ad nauseam); do \
> >	dd if=/dev/$i of=/dev/null bs=4k iflag=direct & \
> >done
> >
> 
> A better workload emulator would be to use FIO to generate ~60%/40%
> reads/writes with ~90-95% random i/o using 2k blksize.  There is some
> sequential writing in our workload but only to a log file and there is not
> much activity there.
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-05-06 15:53   ` Wilcox, Matthew R
@ 2009-05-06 18:05     ` Styner, Douglas W
  2009-05-06 18:12       ` Wilcox, Matthew R
  0 siblings, 1 reply; 122+ messages in thread
From: Styner, Douglas W @ 2009-05-06 18:05 UTC (permalink / raw)
  To: Wilcox, Matthew R, Anirban Chakraborty, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B, Ma,
	Chinang, Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F,
	Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason

Wilcox, Matthew R writes:
>I'm not sure that Orion is going to give useful results in your hardware
>setup.  I suspect you don't have enough spindles to get the IO rates that
>are required to see the problem.  How about doing lots of contiguous I/O
>instead?  Something as simple as:
>
>for i in sda sdb sdc (repeat ad nauseam); do \
>	dd if=/dev/$i of=/dev/null bs=4k iflag=direct & \
>done
>

A better workload emulator would be to use FIO to generate ~60%/40% reads/writes with ~90-95% random i/o using 2k blksize.  There is some sequential writing in our workload but only to a log file and there is not much activity there.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-05-06  6:29 ` Anirban Chakraborty
@ 2009-05-06 15:53   ` Wilcox, Matthew R
  2009-05-06 18:05     ` Styner, Douglas W
  2009-05-06 18:19   ` Styner, Douglas W
  1 sibling, 1 reply; 122+ messages in thread
From: Wilcox, Matthew R @ 2009-05-06 15:53 UTC (permalink / raw)
  To: Anirban Chakraborty, Styner, Douglas W, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B, Ma,
	Chinang, Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F,
	Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 6991 bytes --]

I'm not sure that Orion is going to give useful results in your hardware setup.  I suspect you don't have enough spindles to get the IO rates that are required to see the problem.  How about doing lots of contiguous I/O instead?  Something as simple as:

for i in sda sdb sdc (repeat ad nauseam); do \
	dd if=/dev/$i of=/dev/null bs=4k iflag=direct & \
done

might be enough to get I/O rates high enough to see problems in the interrupt handler.

> -----Original Message-----
> From: Anirban Chakraborty [mailto:anirban.chakraborty@qlogic.com]
> Sent: Tuesday, May 05, 2009 11:30 PM
> To: Styner, Douglas W; linux-kernel@vger.kernel.org
> Cc: Tripathi, Sharad C; arjan@linux.intel.com; Wilcox, Matthew R; Kleen,
> Andi; Siddha, Suresh B; Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert;
> Recalde, Luis F; Nelson, Doug; Cheng, Wu-sun; Prickett, Terry O;
> Shunmuganathan, Rajalakshmi; Garg, Anil K; Chilukuri, Harita;
> chris.mason@oracle.com
> Subject: Re: Mainline kernel OLTP performance update
> 
> 
> 
> 
> On 5/4/09 8:54 AM, "Styner, Douglas W" <douglas.w.styner@intel.com> wrote:
> 
> > <this time with subject line>
> > Summary: Measured the mainline kernel from kernel.org (2.6.30-rc4).
> >
> > The regression for 2.6.30-rc4 against the baseline, 2.6.24.2 is 2.15%
> > (2.6.30-rc3 regression was 1.91%).  Oprofile reports 70.1204% user,
> 29.874%
> > system.
> >
> > Linux OLTP Performance summary
> > Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%
> > iowait%
> > 2.6.24.2                1.000   22106   43709   75      24      0
> 0
> > 2.6.30-rc4              0.978   30581   43034   75      25      0
> 0
> >
> > Server configurations:
> > Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> > 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
> >
> >
> > ======oprofile CPU_CLK_UNHALTED for top 30 functions
> > Cycles% 2.6.24.2                   Cycles% 2.6.30-rc4
> > 74.8578 <database>                 67.8732 <database>
> > 1.0500 qla24xx_start_scsi          1.1162 qla24xx_start_scsi
> > 0.8089 schedule                    0.9888 qla24xx_intr_handler
> > 0.5864 kmem_cache_alloc            0.8776 __schedule
> > 0.4989 __blockdev_direct_IO        0.7401 kmem_cache_alloc
> > 0.4357 __sigsetjmp                 0.4914 read_hpet
> > 0.4152 copy_user_generic_string    0.4792 __sigsetjmp
> > 0.3953 qla24xx_intr_handler        0.4368 __blockdev_direct_IO
> > 0.3850 memcpy                      0.3822 task_rq_lock
> > 0.3596 scsi_request_fn             0.3781 __switch_to
> > 0.3188 __switch_to                 0.3620 __list_add
> > 0.2889 lock_timer_base             0.3377 rb_get_reader_page
> > 0.2750 memmove                     0.3336 copy_user_generic_string
> > 0.2519 task_rq_lock                0.3195 try_to_wake_up
> > 0.2474 aio_complete                0.3114 scsi_request_fn
> > 0.2460 scsi_alloc_sgtable          0.3114 ring_buffer_consume
> > 0.2445 generic_make_request        0.2932 aio_complete
> > 0.2263 qla2x00_process_completed_re0.2730 lock_timer_base
> > 0.2118 blk_queue_end_tag           0.2588 memset_c
> > 0.2085 dio_bio_complete            0.2588 mod_timer
> > 0.2021 e1000_xmit_frame            0.2447 generic_make_request
> > 0.2006 __end_that_request_first    0.2426 qla2x00_process_completed_re
> > 0.1954 generic_file_aio_read       0.2265 tcp_sendmsg
> > 0.1949 kfree                       0.2184 memmove
> > 0.1915 tcp_sendmsg                 0.2184 kfree
> > 0.1901 try_to_wake_up              0.2103 scsi_device_unbusy
> > 0.1895 kref_get                    0.2083 mempool_free
> > 0.1864 __mod_timer                 0.1961 blk_queue_end_tag
> > 0.1863 thread_return               0.1941 kmem_cache_free
> > 0.1854 math_state_restore          0.1921 kref_get
> 
> I tried to replicate the scenario. I have used Orion (a database load
> generator from Oracle) with following settings. The results do not show
> significant difference in cycles.
> 
> Setup:
> Xeon Quad core (7350), 4 sockets with 16GB memory, 1 qle2462 directly
> connected to SanBlaze target with 255 luns.
> 
> ORION VERSION 11.1.0.7.0
> -run advanced -testname test -num_disks 255 -num_streamIO 16 -write 100
> -type seq -matrix point -size_large 1 -num_small 0 -num_large 16 -simulate
> raid0 -cache_size 0
> 
> CPU: Core 2, speed 2933.45 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
> mask of 0x00 (Unhalted core cycles) count 80000
> Counted L2_RQSTS events (number of L2 cache requests) with a unit mask of
> 0x41 (multiple flags) count 6000
> 
> 2.6.30-rc4                      2.6.24.7
> 12.4062 tg_shares_up         11.4415 tg_shares_up
> 6.6774 cache_free_debugcheck  6.3950 check_poison_obj
> 5.2861 kernel_text_address    6.1896 pick_next_task_fair
> 4.2201 kernel_map_pages       4.4998 mwait_idle
> 3.9626 __module_address       3.1111 dequeue_entity
> 3.7923 _raw_spin_lock         2.8842 mwait_idle
> 3.1965  kmem_cache_free       2.2679 find_busiest_group
> 3.1494 __module_text_address  1.7949 _raw_spin_lock
> 2.5449 find_busiest_group     1.7488 qla24xx_start_scsi
> 2.4670 mwait_idle             1.5948 find_next_bit
> 2.2321 qla24xx_start_scsi     1.5433 memset_c
> 2.1065 kernel_map_pages       1.5265 find_busiest_group
> 1.9261 is_module_text_address 1.4750 compat_blkdev_ioctl
> 1.5905 _raw_spin_lock         1.1865 _raw_spin_lock
> 1.5206 find_next_bit          1.0938 qla24xx_intr_handler
> 1.2963  cache_alloc_debugcheck_after 0.9805 cache_free_debugcheck
> 1.2785 memset.c               0.9306 kernel_map_pages
> 0.9918 __aio_put_req          0.9104 kmem_cache_free
> 0.9916 check_poison_obj       0.9085 __setscheduler
> 0.9413 qla24xx_intr_handler   0.8982 sched_rt_handler
> 0.9081 kmem_cache_alloc       0.8847 kernel_text_address
> 0.7647 cache_flusharray       0.8634 run_rebalance_domains
> 0.7213 trace_hardirqs_off     0.8041 _raw_spin_lock
> 0.6836 __change_page_attr_set_clr 0.7301 cache_alloc_debugcheck_after
> 0.6450 aio_complete           0.6905 __module_address
> 0.6365 qla2x00_process_completed_request 0.6630 kmem_cache_alloc
> 0.6330 delay_tsc              0.6240 memset_c
> 0.6248 blk_queue_end_tag      0.5501 rwbase_run_test
> 0.5568 delay_tsc              0.5146 __module_text_address
> 0.5279 trace_hardirqs_off     0.5064 apic_timer_interrupt
> 0.5215 scsi_softirq_done      0.4919 cache_free_debugcheck
> 
> However, I do notice that profiling report generated is not consistent all
> the time. Not sure, if I am missing something in my setup. Sometimes, I do
> see following type of error messages popping up while running opreport.
> warning: [vdso] (tgid:30873 range:0x7fff6a9fe000-0x7fff6a9ff000) could not
> be found.
> 
> I was wondering if your kernel config is quite different from mine. I have
> attached my kernel config file.
> 
> Thanks,
> Anirban
> 

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-05-04 15:54 Styner, Douglas W
@ 2009-05-06  6:29 ` Anirban Chakraborty
  2009-05-06 15:53   ` Wilcox, Matthew R
  2009-05-06 18:19   ` Styner, Douglas W
  0 siblings, 2 replies; 122+ messages in thread
From: Anirban Chakraborty @ 2009-05-06  6:29 UTC (permalink / raw)
  To: Styner, Douglas W, linux-kernel
  Cc: Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong, Nueckel,
	Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun, Prickett,
	Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K, Chilukuri,
	Harita, chris.mason

[-- Attachment #1: Type: text/plain, Size: 5722 bytes --]




On 5/4/09 8:54 AM, "Styner, Douglas W" <douglas.w.styner@intel.com> wrote:

> <this time with subject line>
> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc4).
> 
> The regression for 2.6.30-rc4 against the baseline, 2.6.24.2 is 2.15%
> (2.6.30-rc3 regression was 1.91%).  Oprofile reports 70.1204% user, 29.874%
> system.
> 
> Linux OLTP Performance summary
> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%
> iowait%
> 2.6.24.2                1.000   22106   43709   75      24      0       0
> 2.6.30-rc4              0.978   30581   43034   75      25      0       0
> 
> Server configurations:
> Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
> 
> 
> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc4
> 74.8578 <database>                 67.8732 <database>
> 1.0500 qla24xx_start_scsi          1.1162 qla24xx_start_scsi
> 0.8089 schedule                    0.9888 qla24xx_intr_handler
> 0.5864 kmem_cache_alloc            0.8776 __schedule
> 0.4989 __blockdev_direct_IO        0.7401 kmem_cache_alloc
> 0.4357 __sigsetjmp                 0.4914 read_hpet
> 0.4152 copy_user_generic_string    0.4792 __sigsetjmp
> 0.3953 qla24xx_intr_handler        0.4368 __blockdev_direct_IO
> 0.3850 memcpy                      0.3822 task_rq_lock
> 0.3596 scsi_request_fn             0.3781 __switch_to
> 0.3188 __switch_to                 0.3620 __list_add
> 0.2889 lock_timer_base             0.3377 rb_get_reader_page
> 0.2750 memmove                     0.3336 copy_user_generic_string
> 0.2519 task_rq_lock                0.3195 try_to_wake_up
> 0.2474 aio_complete                0.3114 scsi_request_fn
> 0.2460 scsi_alloc_sgtable          0.3114 ring_buffer_consume
> 0.2445 generic_make_request        0.2932 aio_complete
> 0.2263 qla2x00_process_completed_re0.2730 lock_timer_base
> 0.2118 blk_queue_end_tag           0.2588 memset_c
> 0.2085 dio_bio_complete            0.2588 mod_timer
> 0.2021 e1000_xmit_frame            0.2447 generic_make_request
> 0.2006 __end_that_request_first    0.2426 qla2x00_process_completed_re
> 0.1954 generic_file_aio_read       0.2265 tcp_sendmsg
> 0.1949 kfree                       0.2184 memmove
> 0.1915 tcp_sendmsg                 0.2184 kfree
> 0.1901 try_to_wake_up              0.2103 scsi_device_unbusy
> 0.1895 kref_get                    0.2083 mempool_free
> 0.1864 __mod_timer                 0.1961 blk_queue_end_tag
> 0.1863 thread_return               0.1941 kmem_cache_free
> 0.1854 math_state_restore          0.1921 kref_get

I tried to replicate the scenario. I have used Orion (a database load
generator from Oracle) with following settings. The results do not show
significant difference in cycles.

Setup:
Xeon Quad core (7350), 4 sockets with 16GB memory, 1 qle2462 directly
connected to SanBlaze target with 255 luns.

ORION VERSION 11.1.0.7.0
-run advanced -testname test -num_disks 255 -num_streamIO 16 -write 100
-type seq -matrix point -size_large 1 -num_small 0 -num_large 16 -simulate
raid0 -cache_size 0
 
CPU: Core 2, speed 2933.45 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 80000
Counted L2_RQSTS events (number of L2 cache requests) with a unit mask of
0x41 (multiple flags) count 6000

2.6.30-rc4                      2.6.24.7
12.4062 tg_shares_up         11.4415 tg_shares_up
6.6774 cache_free_debugcheck  6.3950 check_poison_obj
5.2861 kernel_text_address    6.1896 pick_next_task_fair
4.2201 kernel_map_pages       4.4998 mwait_idle
3.9626 __module_address       3.1111 dequeue_entity
3.7923 _raw_spin_lock         2.8842 mwait_idle
3.1965  kmem_cache_free       2.2679 find_busiest_group
3.1494 __module_text_address  1.7949 _raw_spin_lock
2.5449 find_busiest_group     1.7488 qla24xx_start_scsi
2.4670 mwait_idle             1.5948 find_next_bit
2.2321 qla24xx_start_scsi     1.5433 memset_c
2.1065 kernel_map_pages       1.5265 find_busiest_group
1.9261 is_module_text_address 1.4750 compat_blkdev_ioctl
1.5905 _raw_spin_lock         1.1865 _raw_spin_lock
1.5206 find_next_bit          1.0938 qla24xx_intr_handler
1.2963  cache_alloc_debugcheck_after 0.9805 cache_free_debugcheck
1.2785 memset.c               0.9306 kernel_map_pages
0.9918 __aio_put_req          0.9104 kmem_cache_free
0.9916 check_poison_obj       0.9085 __setscheduler
0.9413 qla24xx_intr_handler   0.8982 sched_rt_handler
0.9081 kmem_cache_alloc       0.8847 kernel_text_address
0.7647 cache_flusharray       0.8634 run_rebalance_domains
0.7213 trace_hardirqs_off     0.8041 _raw_spin_lock
0.6836 __change_page_attr_set_clr 0.7301 cache_alloc_debugcheck_after
0.6450 aio_complete           0.6905 __module_address
0.6365 qla2x00_process_completed_request 0.6630 kmem_cache_alloc
0.6330 delay_tsc              0.6240 memset_c
0.6248 blk_queue_end_tag      0.5501 rwbase_run_test
0.5568 delay_tsc              0.5146 __module_text_address
0.5279 trace_hardirqs_off     0.5064 apic_timer_interrupt
0.5215 scsi_softirq_done      0.4919 cache_free_debugcheck

However, I do notice that profiling report generated is not consistent all
the time. Not sure, if I am missing something in my setup. Sometimes, I do
see following type of error messages popping up while running opreport.
warning: [vdso] (tgid:30873 range:0x7fff6a9fe000-0x7fff6a9ff000) could not
be found.

I was wondering if your kernel config is quite different from mine. I have
attached my kernel config file.

Thanks,
Anirban



[-- Attachment #2: config --]
[-- Type: application/octet-stream, Size: 34560 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30-rc4
# Mon May  4 17:36:35 2009
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_CLASSIC_RCU=y
# CONFIG_TREE_RCU is not set
# CONFIG_PREEMPT_RCU is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=15
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_RT_GROUP_SCHED is not set
CONFIG_USER_SCHED=y
# CONFIG_CGROUP_SCHED is not set
# CONFIG_CGROUPS is not set
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
# CONFIG_RELAY is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
CONFIG_OPROFILE=y
# CONFIG_OPROFILE_IBS is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_API_DEBUG=y
# CONFIG_SLOW_WORK is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_MODVERSIONS=y
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
CONFIG_BLK_DEV_BSG=y
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
CONFIG_DEFAULT_AS=y
# CONFIG_DEFAULT_DEADLINE is not set
# CONFIG_DEFAULT_CFQ is not set
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="anticipatory"
# CONFIG_FREEZER is not set

#
# Processor type and features
#
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
# CONFIG_MEMTEST is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
CONFIG_MPSC=y
# CONFIG_MCORE2 is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_P6_NOP=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
# CONFIG_X86_DS is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
# CONFIG_AMD_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
# CONFIG_IOMMU_API is not set
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=16
# CONFIG_SCHED_SMT is not set
# CONFIG_SCHED_MC is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
# CONFIG_X86_MCE_AMD is not set
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
# CONFIG_X86_CPU_DEBUG is not set
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
# CONFIG_NUMA is not set
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
# CONFIG_MEMORY_HOTPLUG is not set
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
# CONFIG_X86_RESERVE_LOW_64K is not set
CONFIG_MTRR=y
# CONFIG_MTRR_SANITIZER is not set
# CONFIG_X86_PAT is not set
# CONFIG_EFI is not set
# CONFIG_SECCOMP is not set
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_SCHED_HRTICK is not set
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
# CONFIG_COMPAT_VDSO is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_SUSPEND is not set
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
# CONFIG_ACPI_PROCFS is not set
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
# CONFIG_ACPI_PROC_EVENT is not set
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_BATTERY is not set
# CONFIG_ACPI_BUTTON is not set
# CONFIG_ACPI_FAN is not set
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
# CONFIG_ACPI_THERMAL is not set
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
# CONFIG_ACPI_SBS is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_DMAR is not set
# CONFIG_INTR_REMAP is not set
CONFIG_PCIEPORTBUS=y
# CONFIG_HOTPLUG_PCI_PCIE is not set
CONFIG_PCIEAER=y
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
CONFIG_PCI_DEBUG=y
# CONFIG_PCI_STUB is not set
CONFIG_HT_IRQ=y
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
# CONFIG_PCCARD is not set
CONFIG_HOTPLUG_PCI=y
# CONFIG_HOTPLUG_PCI_FAKE is not set
# CONFIG_HOTPLUG_PCI_ACPI is not set
# CONFIG_HOTPLUG_PCI_CPCI is not set
# CONFIG_HOTPLUG_PCI_SHPC is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_NET_SCHED is not set
# CONFIG_DCB is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_CONNECTOR is not set
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
# CONFIG_PNP_DEBUG_MESSAGES is not set

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_CPQ_DA is not set
CONFIG_BLK_CPQ_CISS_DA=y
# CONFIG_CISS_SCSI_TAPE is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_BLK_DEV_XIP is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_93CX6 is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_XFER_MODE=y
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_IDE_GD=y
CONFIG_IDE_GD_ATA=y
# CONFIG_IDE_GD_ATAPI is not set
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEACPI is not set
# CONFIG_IDE_TASK_IOCTL is not set
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_PLATFORM is not set
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
# CONFIG_BLK_DEV_IDEPNP is not set
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
CONFIG_BLK_DEV_RZ1000=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8172 is not set
# CONFIG_BLK_DEV_IT8213 is not set
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
CONFIG_BLK_DEV_SVWKS=y
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_BLK_DEV_TC86C001 is not set
CONFIG_BLK_DEV_IDEDMA=y

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_LOGGING=y
CONFIG_SCSI_SCAN_ASYNC=y
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_ATTRS is not set
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
CONFIG_SCSI_QLA_FC=y
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
# CONFIG_ATA is not set
# CONFIG_MD is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
# CONFIG_FIREWIRE is not set
# CONFIG_IEEE1394 is not set
# CONFIG_I2O is not set
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_COMPAT_NET_DEV_OPS=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_VETH is not set
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
# CONFIG_NET_ETHERNET is not set
CONFIG_NETDEV_1000=y
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_E1000E is not set
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_VIA_VELOCITY is not set
# CONFIG_TIGON3 is not set
CONFIG_BNX2=y
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=m
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
# CONFIG_DEVKMEM is not set
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_IPMI_HANDLER is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
# CONFIG_PC8736x_GPIO is not set
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=256
# CONFIG_HPET is not set
# CONFIG_HANGCHECK_TIMER is not set
# CONFIG_TCG_TPM is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
# CONFIG_I2C is not set
# CONFIG_SPI is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_HWMON is not set
CONFIG_THERMAL=y
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
# CONFIG_VIDEO_DEV is not set
# CONFIG_DVB_CORE is not set
# CONFIG_VIDEO_MEDIA is not set

#
# Multimedia drivers
#
# CONFIG_DAB is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
# CONFIG_AGP_INTEL is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
# CONFIG_DRM is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
# CONFIG_FB is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
# CONFIG_VGACON_SOFT_SCROLLBACK is not set
CONFIG_DUMMY_CONSOLE=y
# CONFIG_SOUND is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
# CONFIG_HID_DEBUG is not set
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=y
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set

#
# Special HID drivers
#
CONFIG_HID_A4TECH=y
CONFIG_HID_APPLE=y
CONFIG_HID_BELKIN=y
CONFIG_HID_CHERRY=y
CONFIG_HID_CHICONY=y
CONFIG_HID_CYPRESS=y
# CONFIG_DRAGONRISE_FF is not set
CONFIG_HID_EZKEY=y
CONFIG_HID_KYE=y
CONFIG_HID_GYRATION=y
CONFIG_HID_KENSINGTON=y
CONFIG_HID_LOGITECH=y
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
CONFIG_HID_MICROSOFT=y
CONFIG_HID_MONTEREY=y
CONFIG_HID_NTRIG=y
CONFIG_HID_PANTHERLORD=y
# CONFIG_PANTHERLORD_FF is not set
CONFIG_HID_PETALYNX=y
CONFIG_HID_SAMSUNG=y
CONFIG_HID_SONY=y
CONFIG_HID_SUNPLUS=y
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_TOPSEED=y
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_ZEROPLUS_FF is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_DEVICE_CLASS is not set
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
# CONFIG_USB_MON is not set
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
# CONFIG_USB_OHCI_HCD is not set
CONFIG_USB_UHCI_HCD=y
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_BERRY_CHARGE is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_TEST is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_EDAC is not set
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
# CONFIG_DMIID is not set
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
CONFIG_EXT2_FS_XIP=y
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y
# CONFIG_EXT4_FS is not set
CONFIG_FS_XIP=y
CONFIG_JBD=y
CONFIG_JBD_DEBUG=y
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_REISERFS_FS_XATTR is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_BTRFS_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
# CONFIG_CONFIGFS_FS is not set
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_PRINTK_TIME=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
# CONFIG_SCHED_DEBUG is not set
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_DEBUG_SLAB=y
# CONFIG_DEBUG_SLAB_LEAK is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_TRACER_MAX_TRACE=y
CONFIG_RING_BUFFER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y

#
# Tracers
#
# CONFIG_FUNCTION_TRACER is not set
CONFIG_IRQSOFF_TRACER=y
CONFIG_SYSPROF_TRACER=y
# CONFIG_SCHED_TRACER is not set
CONFIG_CONTEXT_SWITCH_TRACER=y
# CONFIG_EVENT_TRACER is not set
# CONFIG_FTRACE_SYSCALLS is not set
# CONFIG_BOOT_TRACER is not set
# CONFIG_TRACE_BRANCH_PROFILING is not set
# CONFIG_POWER_TRACER is not set
# CONFIG_STACK_TRACER is not set
# CONFIG_KMEMTRACE is not set
# CONFIG_WORKQUEUE_TRACER is not set
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_FTRACE_STARTUP_TEST is not set
# CONFIG_MMIOTRACE is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
# CONFIG_DEBUG_RODATA is not set
# CONFIG_DEBUG_NX_TEST is not set
# CONFIG_IOMMU_DEBUG is not set
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
# CONFIG_IO_DELAY_0X80 is not set
CONFIG_IO_DELAY_0XED=y
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=1
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_IMA is not set
# CONFIG_CRYPTO is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
# CONFIG_VIRTUALIZATION is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC_T10DIF=y
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Mainline kernel OLTP performance update
@ 2009-05-04 15:54 Styner, Douglas W
  2009-05-06  6:29 ` Anirban Chakraborty
  0 siblings, 1 reply; 122+ messages in thread
From: Styner, Douglas W @ 2009-05-04 15:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong, Nueckel,
	Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun, Prickett,
	Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K, Chilukuri,
	Harita, chris.mason

<this time with subject line>
Summary: Measured the mainline kernel from kernel.org (2.6.30-rc4). 

The regression for 2.6.30-rc4 against the baseline, 2.6.24.2 is 2.15% (2.6.30-rc3 regression was 1.91%).  Oprofile reports 70.1204% user, 29.874% system. 

Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%   iowait%
2.6.24.2                1.000   22106   43709   75      24      0       0
2.6.30-rc4              0.978   30581   43034   75      25      0       0

Server configurations:
Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)


======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2                   Cycles% 2.6.30-rc4
74.8578 <database>                 67.8732 <database>
1.0500 qla24xx_start_scsi          1.1162 qla24xx_start_scsi
0.8089 schedule                    0.9888 qla24xx_intr_handler
0.5864 kmem_cache_alloc            0.8776 __schedule
0.4989 __blockdev_direct_IO        0.7401 kmem_cache_alloc
0.4357 __sigsetjmp                 0.4914 read_hpet
0.4152 copy_user_generic_string    0.4792 __sigsetjmp
0.3953 qla24xx_intr_handler        0.4368 __blockdev_direct_IO
0.3850 memcpy                      0.3822 task_rq_lock
0.3596 scsi_request_fn             0.3781 __switch_to
0.3188 __switch_to                 0.3620 __list_add
0.2889 lock_timer_base             0.3377 rb_get_reader_page
0.2750 memmove                     0.3336 copy_user_generic_string
0.2519 task_rq_lock                0.3195 try_to_wake_up
0.2474 aio_complete                0.3114 scsi_request_fn
0.2460 scsi_alloc_sgtable          0.3114 ring_buffer_consume
0.2445 generic_make_request        0.2932 aio_complete
0.2263 qla2x00_process_completed_re0.2730 lock_timer_base
0.2118 blk_queue_end_tag           0.2588 memset_c
0.2085 dio_bio_complete            0.2588 mod_timer
0.2021 e1000_xmit_frame            0.2447 generic_make_request
0.2006 __end_that_request_first    0.2426 qla2x00_process_completed_re
0.1954 generic_file_aio_read       0.2265 tcp_sendmsg
0.1949 kfree                       0.2184 memmove
0.1915 tcp_sendmsg                 0.2184 kfree
0.1901 try_to_wake_up              0.2103 scsi_device_unbusy
0.1895 kref_get                    0.2083 mempool_free
0.1864 __mod_timer                 0.1961 blk_queue_end_tag
0.1863 thread_return               0.1941 kmem_cache_free
0.1854 math_state_restore          0.1921 kref_get

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-04-29 18:06           ` Pallipadi, Venkatesh
@ 2009-04-29 18:25             ` Styner, Douglas W
  0 siblings, 0 replies; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-29 18:25 UTC (permalink / raw)
  To: Pallipadi, Venkatesh, Chris Mason
  Cc: Peter Zijlstra, Andrew Morton, linux-kernel, Tripathi, Sharad C,
	arjan, Wilcox, Matthew R, Kleen, Andi, Siddha, Suresh B, Ma,
	Chinang, Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F,
	Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, Ingo Molnar

Pallipadi, Venkatesh writes:
>Output of
># grep . /sys/devices/system/cpu/cpu0/cpufreq/*
>can tell us whether P-state software coordination is the reason behind
>excessive resched IPIs. Look for ondemand being the current_governor and
>affected_cpus containing more than one CPU in it.

On these setups, we are disabling frequency scaling at the bios. 

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-29 17:46         ` Chris Mason
@ 2009-04-29 18:06           ` Pallipadi, Venkatesh
  2009-04-29 18:25             ` Styner, Douglas W
  0 siblings, 1 reply; 122+ messages in thread
From: Pallipadi, Venkatesh @ 2009-04-29 18:06 UTC (permalink / raw)
  To: Chris Mason
  Cc: Peter Zijlstra, Andrew Morton, Styner, Douglas W, linux-kernel,
	Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong, Nueckel,
	Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun, Prickett,
	Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K, Chilukuri,
	Harita, Ingo Molnar

On Wed, 2009-04-29 at 10:46 -0700, Chris Mason wrote:
> On Wed, 2009-04-29 at 18:25 +0200, Peter Zijlstra wrote:
> > On Wed, 2009-04-29 at 09:07 -0700, Andrew Morton wrote:
> > > On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <douglas.w.styner@intel.com> wrote:
> > > 
> > > > >-----Original Message-----
> > > > >From: Andrew Morton [mailto:akpm@linux-foundation.org]
> > > > >Sent: Wednesday, April 29, 2009 12:30 AM
> > > > >To: Styner, Douglas W
> > > > >Cc: linux-kernel@vger.kernel.org; Tripathi, Sharad C;
> > > > >arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> > > > >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> > > > >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> > > > >Anil K; Chilukuri, Harita; chris.mason@oracle.com
> > > > >Subject: Re: Mainline kernel OLTP performance update
> > > > >
> > > > >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> > > > ><douglas.w.styner@intel.com> wrote:
> > > > >
> > > > >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> > > > >>
> > > > >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> > > > >Oprofile reports 71.1626% user, 28.8295% system.
> > > > >>
> > > > >> Linux OLTP Performance summary
> > > > >> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%
> > > > >iowait%
> > > > >> 2.6.24.2                1.000   22106   43709   75      24      0       0
> > > > >> 2.6.30-rc3              0.981   30645   43027   75      25      0       0
> > > > >
> > > > >The main difference there is the interrupt frequency.  Do we know which
> > > > >interrupt source(s) caused this?
> > > >
> > > > Our analysis of the interrupts shows that rescheduling interrupts are
> > > > up 2.2x from 2.6.24.2 --> 2.6.30-rc3.  Qla2xxx interrupts are roughly
> > > > the same.
> > > 
> > > (top-posting repaired)
> > > 
> > > OK, thanks.  Seems odd that the rescheduling interrupt rate increased
> > > while the context-switch rate actually fell a couple of percent.
> > > 
> > > This came up a few weeks ago and iirc Peter was mainly involved, and I
> > > don't believe that anything conclusive ended up happening.  Peter,
> > > could you please remind us of (and summarise) the story here?
> > 
> > I've had several reports about the resched-ipi going in overdrive, but
> > nobody bothered to bisect it, nor have I yet done so -- no clear ideas
> > on why it is doing so.
> > 
> > I'll put it somewhere higher on the todo list.
> > 
> 
> One cause of them in the past was the ondemand cpufreq module.  It got
> fixed up for my laptop workload at least starting w/2.6.29, but it might
> make sense to try without ondemand if you're running it.
> 

Output of
# grep . /sys/devices/system/cpu/cpu0/cpufreq/*
can tell us whether P-state software coordination is the reason behind
excessive resched IPIs. Look for ondemand being the current_governor and
affected_cpus containing more than one CPU in it.

Thanks,
Venki


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-04-29 16:25       ` Peter Zijlstra
  2009-04-29 17:46         ` Chris Mason
@ 2009-04-29 17:52         ` Styner, Douglas W
  1 sibling, 0 replies; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-29 17:52 UTC (permalink / raw)
  To: Peter Zijlstra, Andrew Morton
  Cc: linux-kernel, Tripathi, Sharad C, arjan, Wilcox, Matthew R,
	Kleen, Andi, Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong,
	Nueckel, Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun,
	Prickett, Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K,
	Chilukuri, Harita, chris.mason, Ingo Molnar

Peter Ziljstra writes:
>
>I've had several reports about the resched-ipi going in overdrive, but
>nobody bothered to bisect it, nor have I yet done so -- no clear ideas
>on why it is doing so.
>
>I'll put it somewhere higher on the todo list.

FWIW, here is the interrupt data I was referring to.  The kernel delta refers to the difference in /proc/interrupts between the start and end of the run.  All database processes are running SCHED_RR.

                               2.6.24.2   2.6.30-rc3 
                                 delta       delta    % change
PCI-MSI-edge    qla2xxx         5270060     6118088      16.1%
PCI-MSI-edge    qla2xxx         5630742     5439656      -3.4%
PCI-MSI-edge    qla2xxx         5836425     5938014       1.7%
PCI-MSI-edge    qla2xxx         5774269     6007126       4.0%
PCI-MSI-edge    qla2xxx         5239457     5774888      10.2%
PCI-MSI-edge    qla2xxx         5965193     5424013      -9.1%
PCI-MSI-edge    eth0           31404141    32443614       3.3%
PCI-MSI-edge    eth1               1754        1453     -17.2%
Non-maskable interrupts        14041623    12980424      -7.6%
Local timer interrupts         27948168    28911532       3.4%
Rescheduling interrupts         1905119     4226516     121.9%
Function call interrupts            210          49     -76.7%
TLB shootdowns                      684        1455     112.7%


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-29 16:25       ` Peter Zijlstra
@ 2009-04-29 17:46         ` Chris Mason
  2009-04-29 18:06           ` Pallipadi, Venkatesh
  2009-04-29 17:52         ` Styner, Douglas W
  1 sibling, 1 reply; 122+ messages in thread
From: Chris Mason @ 2009-04-29 17:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Styner, Douglas W, linux-kernel, Tripathi,
	Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi, Siddha,
	Suresh B, Ma, Chinang, Wang, Peter Xihong, Nueckel, Hubert,
	Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O,
	Shunmuganathan, Rajalakshmi, Garg, Anil K, Chilukuri, Harita,
	Ingo Molnar

On Wed, 2009-04-29 at 18:25 +0200, Peter Zijlstra wrote:
> On Wed, 2009-04-29 at 09:07 -0700, Andrew Morton wrote:
> > On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <douglas.w.styner@intel.com> wrote:
> > 
> > > >-----Original Message-----
> > > >From: Andrew Morton [mailto:akpm@linux-foundation.org]
> > > >Sent: Wednesday, April 29, 2009 12:30 AM
> > > >To: Styner, Douglas W
> > > >Cc: linux-kernel@vger.kernel.org; Tripathi, Sharad C;
> > > >arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> > > >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> > > >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> > > >Anil K; Chilukuri, Harita; chris.mason@oracle.com
> > > >Subject: Re: Mainline kernel OLTP performance update
> > > >
> > > >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> > > ><douglas.w.styner@intel.com> wrote:
> > > >
> > > >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> > > >>
> > > >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> > > >Oprofile reports 71.1626% user, 28.8295% system.
> > > >>
> > > >> Linux OLTP Performance summary
> > > >> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%
> > > >iowait%
> > > >> 2.6.24.2                1.000   22106   43709   75      24      0       0
> > > >> 2.6.30-rc3              0.981   30645   43027   75      25      0       0
> > > >
> > > >The main difference there is the interrupt frequency.  Do we know which
> > > >interrupt source(s) caused this?
> > >
> > > Our analysis of the interrupts shows that rescheduling interrupts are
> > > up 2.2x from 2.6.24.2 --> 2.6.30-rc3.  Qla2xxx interrupts are roughly
> > > the same.
> > 
> > (top-posting repaired)
> > 
> > OK, thanks.  Seems odd that the rescheduling interrupt rate increased
> > while the context-switch rate actually fell a couple of percent.
> > 
> > This came up a few weeks ago and iirc Peter was mainly involved, and I
> > don't believe that anything conclusive ended up happening.  Peter,
> > could you please remind us of (and summarise) the story here?
> 
> I've had several reports about the resched-ipi going in overdrive, but
> nobody bothered to bisect it, nor have I yet done so -- no clear ideas
> on why it is doing so.
> 
> I'll put it somewhere higher on the todo list.
> 

One cause of them in the past was the ondemand cpufreq module.  It got
fixed up for my laptop workload at least starting w/2.6.29, but it might
make sense to try without ondemand if you're running it.

-chris



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-29 16:07     ` Andrew Morton
@ 2009-04-29 16:25       ` Peter Zijlstra
  2009-04-29 17:46         ` Chris Mason
  2009-04-29 17:52         ` Styner, Douglas W
  0 siblings, 2 replies; 122+ messages in thread
From: Peter Zijlstra @ 2009-04-29 16:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Styner, Douglas W, linux-kernel, Tripathi, Sharad C, arjan,
	Wilcox, Matthew R, Kleen, Andi, Siddha, Suresh B, Ma, Chinang,
	Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F, Nelson,
	Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason,
	Ingo Molnar

On Wed, 2009-04-29 at 09:07 -0700, Andrew Morton wrote:
> On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <douglas.w.styner@intel.com> wrote:
> 
> > >-----Original Message-----
> > >From: Andrew Morton [mailto:akpm@linux-foundation.org]
> > >Sent: Wednesday, April 29, 2009 12:30 AM
> > >To: Styner, Douglas W
> > >Cc: linux-kernel@vger.kernel.org; Tripathi, Sharad C;
> > >arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> > >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> > >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> > >Anil K; Chilukuri, Harita; chris.mason@oracle.com
> > >Subject: Re: Mainline kernel OLTP performance update
> > >
> > >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> > ><douglas.w.styner@intel.com> wrote:
> > >
> > >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> > >>
> > >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> > >Oprofile reports 71.1626% user, 28.8295% system.
> > >>
> > >> Linux OLTP Performance summary
> > >> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%
> > >iowait%
> > >> 2.6.24.2                1.000   22106   43709   75      24      0       0
> > >> 2.6.30-rc3              0.981   30645   43027   75      25      0       0
> > >
> > >The main difference there is the interrupt frequency.  Do we know which
> > >interrupt source(s) caused this?
> >
> > Our analysis of the interrupts shows that rescheduling interrupts are
> > up 2.2x from 2.6.24.2 --> 2.6.30-rc3.  Qla2xxx interrupts are roughly
> > the same.
> 
> (top-posting repaired)
> 
> OK, thanks.  Seems odd that the rescheduling interrupt rate increased
> while the context-switch rate actually fell a couple of percent.
> 
> This came up a few weeks ago and iirc Peter was mainly involved, and I
> don't believe that anything conclusive ended up happening.  Peter,
> could you please remind us of (and summarise) the story here?

I've had several reports about the resched-ipi going in overdrive, but
nobody bothered to bisect it, nor have I yet done so -- no clear ideas
on why it is doing so.

I'll put it somewhere higher on the todo list.


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-29 16:06       ` Wilcox, Matthew R
@ 2009-04-29 16:19         ` Andi Kleen
  0 siblings, 0 replies; 122+ messages in thread
From: Andi Kleen @ 2009-04-29 16:19 UTC (permalink / raw)
  To: Wilcox, Matthew R
  Cc: Styner, Douglas W, Andi Kleen, Andrew Morton, linux-kernel,
	Tripathi, Sharad C, arjan, Kleen, Andi, Siddha, Suresh B, Ma,
	Chinang, Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F,
	Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason

On Wed, Apr 29, 2009 at 10:06:44AM -0600, Wilcox, Matthew R wrote:
> Is it possible that's simply 'oprofile has a 4% overhead'?

We would expect that overhead to be spread between kernel and
database then, not only database. Maybe it's partly, but it's
probably not the complete answer.

Also at least the user space/context part of oprofile is profiled by 
oprofile too, just not the nmi handler, so if these parts are 
expensive it should be visible.

That the lowering of the period made a difference was interesting.
It might be that oprofile is just getting more and more inaccurate.

-Andi


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-29 15:48   ` Styner, Douglas W
@ 2009-04-29 16:07     ` Andrew Morton
  2009-04-29 16:25       ` Peter Zijlstra
  0 siblings, 1 reply; 122+ messages in thread
From: Andrew Morton @ 2009-04-29 16:07 UTC (permalink / raw)
  To: Styner, Douglas W
  Cc: linux-kernel, Tripathi, Sharad C, arjan, Wilcox, Matthew R,
	Kleen, Andi, Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong,
	Nueckel, Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun,
	Prickett, Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K,
	Chilukuri, Harita, chris.mason, Peter Zijlstra

On Wed, 29 Apr 2009 08:48:19 -0700 "Styner, Douglas W" <douglas.w.styner@intel.com> wrote:

> >-----Original Message-----
> >From: Andrew Morton [mailto:akpm@linux-foundation.org]
> >Sent: Wednesday, April 29, 2009 12:30 AM
> >To: Styner, Douglas W
> >Cc: linux-kernel@vger.kernel.org; Tripathi, Sharad C;
> >arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> >Anil K; Chilukuri, Harita; chris.mason@oracle.com
> >Subject: Re: Mainline kernel OLTP performance update
> >
> >On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
> ><douglas.w.styner@intel.com> wrote:
> >
> >> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
> >>
> >> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
> >Oprofile reports 71.1626% user, 28.8295% system.
> >>
> >> Linux OLTP Performance summary
> >> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%
> >iowait%
> >> 2.6.24.2                1.000   22106   43709   75      24      0       0
> >> 2.6.30-rc3              0.981   30645   43027   75      25      0       0
> >
> >The main difference there is the interrupt frequency.  Do we know which
> >interrupt source(s) caused this?
>
> Our analysis of the interrupts shows that rescheduling interrupts are
> up 2.2x from 2.6.24.2 --> 2.6.30-rc3.  Qla2xxx interrupts are roughly
> the same.

(top-posting repaired)

OK, thanks.  Seems odd that the rescheduling interrupt rate increased
while the context-switch rate actually fell a couple of percent.

This came up a few weeks ago and iirc Peter was mainly involved, and I
don't believe that anything conclusive ended up happening.  Peter,
could you please remind us of (and summarise) the story here?


^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-04-29 16:00     ` Styner, Douglas W
@ 2009-04-29 16:06       ` Wilcox, Matthew R
  2009-04-29 16:19         ` Andi Kleen
  0 siblings, 1 reply; 122+ messages in thread
From: Wilcox, Matthew R @ 2009-04-29 16:06 UTC (permalink / raw)
  To: Styner, Douglas W, Andi Kleen, Andrew Morton
  Cc: linux-kernel, Tripathi, Sharad C, arjan, Kleen, Andi, Siddha,
	Suresh B, Ma, Chinang, Wang, Peter Xihong, Nueckel, Hubert,
	Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun, Prickett, Terry O,
	Shunmuganathan, Rajalakshmi, Garg, Anil K, Chilukuri, Harita,
	chris.mason

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 2327 bytes --]

Is it possible that's simply 'oprofile has a 4% overhead'?

> -----Original Message-----
> From: Styner, Douglas W
> Sent: Wednesday, April 29, 2009 9:00 AM
> To: Andi Kleen; Andrew Morton
> Cc: linux-kernel@vger.kernel.org; Tripathi, Sharad C;
> arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
> Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
> Anil K; Chilukuri, Harita; chris.mason@oracle.com
> Subject: RE: Mainline kernel OLTP performance update
> 
> What we showed was that vmstat and vtune agreed wrt system/user time.
> Oprofile is off by ~4% (4% too low for user. 4% too high for system)
> 
> >-----Original Message-----
> >From: Andi Kleen [mailto:andi@firstfloor.org]
> >Sent: Wednesday, April 29, 2009 1:28 AM
> >To: Andrew Morton
> >Cc: Styner, Douglas W; linux-kernel@vger.kernel.org; Tripathi, Sharad C;
> >arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
> >Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F;
> Nelson,
> >Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi;
> Garg,
> >Anil K; Chilukuri, Harita; chris.mason@oracle.com
> >Subject: Re: Mainline kernel OLTP performance update
> >
> >Andrew Morton <akpm@linux-foundation.org> writes:
> >
> >>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> >>> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc3
> >>> 74.8578 <database>                 69.1925 <database>
> >>
> >> ouch, that's a large drop in userspace CPU occupancy.  It seems
> >> inconsistent with the 1.91% above.
> >
> >That was determined to be an oprofile artifact/regression (see Doug's
> >other email+thread) The 2.6.30 oprofile seems to be less accurate than
> >the one in 2.6.24. Of course the question is if it can't get
> >the user space right, is the kernel data accurate. But I believe
> >Doug verified with vtune that the kernel data is roughly correct,
> >just user space profiling was slightly bogus (right, Doug, or
> >do I misrepresent that?)
> >
> >-Andi
> >
> >--
> >ak@linux.intel.com -- Speaking for myself only.
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-04-29  8:28   ` Andi Kleen
@ 2009-04-29 16:00     ` Styner, Douglas W
  2009-04-29 16:06       ` Wilcox, Matthew R
  0 siblings, 1 reply; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-29 16:00 UTC (permalink / raw)
  To: Andi Kleen, Andrew Morton
  Cc: linux-kernel, Tripathi, Sharad C, arjan, Wilcox, Matthew R,
	Kleen, Andi, Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong,
	Nueckel, Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun,
	Prickett, Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K,
	Chilukuri, Harita, chris.mason

What we showed was that vmstat and vtune agreed wrt system/user time.  Oprofile is off by ~4% (4% too low for user. 4% too high for system)

>-----Original Message-----
>From: Andi Kleen [mailto:andi@firstfloor.org]
>Sent: Wednesday, April 29, 2009 1:28 AM
>To: Andrew Morton
>Cc: Styner, Douglas W; linux-kernel@vger.kernel.org; Tripathi, Sharad C;
>arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
>Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
>Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
>Anil K; Chilukuri, Harita; chris.mason@oracle.com
>Subject: Re: Mainline kernel OLTP performance update
>
>Andrew Morton <akpm@linux-foundation.org> writes:
>
>>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
>>> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc3
>>> 74.8578 <database>                 69.1925 <database>
>>
>> ouch, that's a large drop in userspace CPU occupancy.  It seems
>> inconsistent with the 1.91% above.
>
>That was determined to be an oprofile artifact/regression (see Doug's
>other email+thread) The 2.6.30 oprofile seems to be less accurate than
>the one in 2.6.24. Of course the question is if it can't get
>the user space right, is the kernel data accurate. But I believe
>Doug verified with vtune that the kernel data is roughly correct,
>just user space profiling was slightly bogus (right, Doug, or
>do I misrepresent that?)
>
>-Andi
>
>--
>ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-04-29  7:29 ` Andrew Morton
  2009-04-29  8:28   ` Andi Kleen
@ 2009-04-29 15:48   ` Styner, Douglas W
  2009-04-29 16:07     ` Andrew Morton
  1 sibling, 1 reply; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-29 15:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Tripathi, Sharad C, arjan, Wilcox, Matthew R,
	Kleen, Andi, Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong,
	Nueckel, Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun,
	Prickett, Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K,
	Chilukuri, Harita, chris.mason

Our analysis of the interrupts shows that rescheduling interrupts are up 2.2x from 2.6.24.2 --> 2.6.30-rc3.  Qla2xxx interrupts are roughly the same.

Doug

>-----Original Message-----
>From: Andrew Morton [mailto:akpm@linux-foundation.org]
>Sent: Wednesday, April 29, 2009 12:30 AM
>To: Styner, Douglas W
>Cc: linux-kernel@vger.kernel.org; Tripathi, Sharad C;
>arjan@linux.intel.com; Wilcox, Matthew R; Kleen, Andi; Siddha, Suresh B;
>Ma, Chinang; Wang, Peter Xihong; Nueckel, Hubert; Recalde, Luis F; Nelson,
>Doug; Cheng, Wu-sun; Prickett, Terry O; Shunmuganathan, Rajalakshmi; Garg,
>Anil K; Chilukuri, Harita; chris.mason@oracle.com
>Subject: Re: Mainline kernel OLTP performance update
>
>On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W"
><douglas.w.styner@intel.com> wrote:
>
>> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3).
>>
>> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.
>Oprofile reports 71.1626% user, 28.8295% system.
>>
>> Linux OLTP Performance summary
>> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%
>iowait%
>> 2.6.24.2                1.000   22106   43709   75      24      0       0
>> 2.6.30-rc3              0.981   30645   43027   75      25      0       0
>
>The main difference there is the interrupt frequency.  Do we know which
>interrupt source(s) caused this?
>
>> Server configurations:
>> Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
>> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
>>
>>
>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
>> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc3
>> 74.8578 <database>                 69.1925 <database>
>
>ouch, that's a large drop in userspace CPU occupancy.  It seems
>inconsistent with the 1.91% above.
>
>> 1.0500 qla24xx_start_scsi          1.1314 qla24xx_intr_handler
>> 0.8089 schedule                    1.0031 qla24xx_start_scsi
>> 0.5864 kmem_cache_alloc            0.8476 __schedule
>> 0.4989 __blockdev_direct_IO        0.6532 kmem_cache_alloc
>> 0.4357 __sigsetjmp                 0.4490 __blockdev_direct_IO
>> 0.4152 copy_user_generic_string    0.4199 __sigsetjmp
>> 0.3953 qla24xx_intr_handler        0.3946 __switch_to
>> 0.3850 memcpy                      0.3538 __list_add
>> 0.3596 scsi_request_fn             0.3499 task_rq_lock
>> 0.3188 __switch_to                 0.3402 scsi_request_fn
>> 0.2889 lock_timer_base             0.3382 rb_get_reader_page
>> 0.2750 memmove                     0.3363 copy_user_generic_string
>> 0.2519 task_rq_lock                0.3324 aio_complete
>> 0.2474 aio_complete                0.3110 try_to_wake_up
>> 0.2460 scsi_alloc_sgtable          0.2877 ring_buffer_consume
>> 0.2445 generic_make_request        0.2683 mod_timer
>> 0.2263 qla2x00_process_completed_re0.2605 qla2x00_process_completed_re
>> 0.2118 blk_queue_end_tag           0.2566 blk_queue_end_tag
>> 0.2085 dio_bio_complete            0.2566 generic_make_request
>> 0.2021 e1000_xmit_frame            0.2547 tcp_sendmsg
>> 0.2006 __end_that_request_first    0.2372 lock_timer_base
>> 0.1954 generic_file_aio_read       0.2333 memmove
>> 0.1949 kfree                       0.2294 memset_c
>> 0.1915 tcp_sendmsg                 0.2080 mempool_free
>> 0.1901 try_to_wake_up              0.2022 generic_file_aio_read
>> 0.1895 kref_get                    0.1963 scsi_device_unbusy
>> 0.1864 __mod_timer                 0.1963 plist_del
>> 0.1863 thread_return               0.1944 dequeue_rt_stack
>> 0.1854 math_state_restore          0.1924 e1000_xmit_frame


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-29  7:29 ` Andrew Morton
@ 2009-04-29  8:28   ` Andi Kleen
  2009-04-29 16:00     ` Styner, Douglas W
  2009-04-29 15:48   ` Styner, Douglas W
  1 sibling, 1 reply; 122+ messages in thread
From: Andi Kleen @ 2009-04-29  8:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Styner, Douglas W, linux-kernel, Tripathi, Sharad C, arjan,
	Wilcox, Matthew R, Kleen, Andi, Siddha, Suresh B, Ma, Chinang,
	Wang, Peter Xihong, Nueckel, Hubert, Recalde, Luis F, Nelson,
	Doug, Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan,
	Rajalakshmi, Garg, Anil K, Chilukuri, Harita, chris.mason

Andrew Morton <akpm@linux-foundation.org> writes:

>> ======oprofile CPU_CLK_UNHALTED for top 30 functions
>> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc3
>> 74.8578 <database>                 69.1925 <database>
>
> ouch, that's a large drop in userspace CPU occupancy.  It seems
> inconsistent with the 1.91% above.

That was determined to be an oprofile artifact/regression (see Doug's
other email+thread) The 2.6.30 oprofile seems to be less accurate than
the one in 2.6.24. Of course the question is if it can't get
the user space right, is the kernel data accurate. But I believe
Doug verified with vtune that the kernel data is roughly correct,
just user space profiling was slightly bogus (right, Doug, or
do I misrepresent that?)

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-28 17:08 Styner, Douglas W
@ 2009-04-29  7:29 ` Andrew Morton
  2009-04-29  8:28   ` Andi Kleen
  2009-04-29 15:48   ` Styner, Douglas W
  0 siblings, 2 replies; 122+ messages in thread
From: Andrew Morton @ 2009-04-29  7:29 UTC (permalink / raw)
  To: Styner, Douglas W
  Cc: linux-kernel, Tripathi, Sharad C, arjan, Wilcox, Matthew R,
	Kleen, Andi, Siddha, Suresh B, Ma, Chinang, Wang, Peter Xihong,
	Nueckel, Hubert, Recalde, Luis F, Nelson, Doug, Cheng, Wu-sun,
	Prickett, Terry O, Shunmuganathan, Rajalakshmi, Garg, Anil K,
	Chilukuri, Harita, chris.mason

On Tue, 28 Apr 2009 10:08:22 -0700 "Styner, Douglas W" <douglas.w.styner@intel.com> wrote:

> Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3). 
> 
> The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.  Oprofile reports 71.1626% user, 28.8295% system.  
> 
> Linux OLTP Performance summary
> Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%   iowait%
> 2.6.24.2                1.000   22106   43709   75      24      0       0
> 2.6.30-rc3              0.981   30645   43027   75      25      0       0

The main difference there is the interrupt frequency.  Do we know which
interrupt source(s) caused this?

> Server configurations:
> Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
> 64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)
> 
> 
> ======oprofile CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc3
> 74.8578 <database>                 69.1925 <database>

ouch, that's a large drop in userspace CPU occupancy.  It seems
inconsistent with the 1.91% above.

> 1.0500 qla24xx_start_scsi          1.1314 qla24xx_intr_handler
> 0.8089 schedule                    1.0031 qla24xx_start_scsi
> 0.5864 kmem_cache_alloc            0.8476 __schedule
> 0.4989 __blockdev_direct_IO        0.6532 kmem_cache_alloc
> 0.4357 __sigsetjmp                 0.4490 __blockdev_direct_IO
> 0.4152 copy_user_generic_string    0.4199 __sigsetjmp
> 0.3953 qla24xx_intr_handler        0.3946 __switch_to
> 0.3850 memcpy                      0.3538 __list_add
> 0.3596 scsi_request_fn             0.3499 task_rq_lock
> 0.3188 __switch_to                 0.3402 scsi_request_fn
> 0.2889 lock_timer_base             0.3382 rb_get_reader_page
> 0.2750 memmove                     0.3363 copy_user_generic_string
> 0.2519 task_rq_lock                0.3324 aio_complete
> 0.2474 aio_complete                0.3110 try_to_wake_up
> 0.2460 scsi_alloc_sgtable          0.2877 ring_buffer_consume
> 0.2445 generic_make_request        0.2683 mod_timer
> 0.2263 qla2x00_process_completed_re0.2605 qla2x00_process_completed_re
> 0.2118 blk_queue_end_tag           0.2566 blk_queue_end_tag
> 0.2085 dio_bio_complete            0.2566 generic_make_request
> 0.2021 e1000_xmit_frame            0.2547 tcp_sendmsg
> 0.2006 __end_that_request_first    0.2372 lock_timer_base
> 0.1954 generic_file_aio_read       0.2333 memmove
> 0.1949 kfree                       0.2294 memset_c
> 0.1915 tcp_sendmsg                 0.2080 mempool_free
> 0.1901 try_to_wake_up              0.2022 generic_file_aio_read
> 0.1895 kref_get                    0.1963 scsi_device_unbusy
> 0.1864 __mod_timer                 0.1963 plist_del
> 0.1863 thread_return               0.1944 dequeue_rt_stack
> 0.1854 math_state_restore          0.1924 e1000_xmit_frame


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Mainline kernel OLTP performance update
@ 2009-04-28 17:22 Styner, Douglas W
  0 siblings, 0 replies; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-28 17:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Ma, Chinang, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, Recalde, Luis F, Nelson, Doug,
	Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan, Rajalakshmi,
	Garg, Anil K, Chilukuri, Harita, chris.mason

Summary: Measured the mainline kernel from kernel.org (2.6.29.2). 

The regression for 2.6.29.2 against the baseline, 2.6.24.2 is 2.07% (2.6.29.1 regression was 2.35%).  Oprofile reports 70.419% user, 29.5709% system. 2.6.29.1 -> 2.6.29.2 comparison is below. 

Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%   iowait%
2.6.24.2                1.000   22106   43709   75      24      0       0
2.6.29.2                0.979   30509   43139   75      25      0       0

Server configurations:
Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)

======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2                   Cycles% 2.6.29.2
74.8578 <database>                 67.7080 <database>
1.0500 qla24xx_start_scsi          0.9487 qla24xx_intr_handler
0.8089 schedule                    0.8117 schedule
0.5864 kmem_cache_alloc            0.6215 qla24xx_wrt_req_reg
0.4989 __blockdev_direct_IO        0.5439 kmem_cache_alloc
0.4357 __sigsetjmp                 0.4784 qla24xx_start_scsi
0.4152 copy_user_generic_string    0.4703 __blockdev_direct_IO
0.3953 qla24xx_intr_handler        0.4416 try_to_wake_up
0.3850 memcpy                      0.4253 __sigsetjmp
0.3596 scsi_request_fn             0.3803 scsi_request_fn
0.3188 __switch_to                 0.3701 __switch_to
0.2889 lock_timer_base             0.3619 copy_user_generic_string
0.2750 memmove                     0.3476 rb_get_reader_page
0.2519 task_rq_lock                0.3149 symbols)
0.2474 aio_complete                0.3006 aio_complete
0.2460 scsi_alloc_sgtable          0.2903 memset_c
0.2445 generic_make_request        0.2883 ring_buffer_consume
0.2263 qla2x00_process_completed_re0.2719 lock_timer_base
0.2118 blk_queue_end_tag           0.2699 __list_add
0.2085 dio_bio_complete            0.2474 blk_queue_end_tag
0.2021 e1000_xmit_frame            0.2453 memmove
0.2006 __end_that_request_first    0.2392 e1000_xmit_frame
0.1954 generic_file_aio_read       0.2290 ipc_lock
0.1949 kfree                       0.2290 task_rq_lock
0.1915 tcp_sendmsg                 0.2249 generic_make_request
0.1901 try_to_wake_up              0.2167 kref_get
0.1895 kref_get                    0.2147 tcp_sendmsg
0.1864 __mod_timer                 0.2045 qla2x00_process_completed_re
0.1863 thread_return               0.2045 pick_next_highest_task_rt

-- 2.6.29.1 vs. 2.6.29.2
Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%   iowait%
2.6.29.1                1.000   30570   42818   74      25      0       0
2.6.29.2                1.003   30509   43139   75      25      0       0

Server configurations:
Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)

======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.29.1                   Cycles% 2.6.29.2
64.5424 <database>                 67.7080 <database>
1.1571 qla24xx_intr_handler        0.9487 qla24xx_intr_handler
0.9209 schedule                    0.8117 schedule
0.6533 kmem_cache_alloc            0.6215 qla24xx_wrt_req_reg
0.5456 qla24xx_wrt_req_reg         0.5439 kmem_cache_alloc
0.5247 try_to_wake_up              0.4784 qla24xx_start_scsi
0.4858 qla24xx_start_scsi          0.4703 __blockdev_direct_IO
0.4485 __sigsetjmp                 0.4416 try_to_wake_up
0.3976 __blockdev_direct_IO        0.4253 __sigsetjmp
0.3857 __switch_to                 0.3803 scsi_request_fn
0.3692 copy_user_generic_string    0.3701 __switch_to
0.3648 aio_complete                0.3619 copy_user_generic_string
0.3633 scsi_request_fn             0.3476 rb_get_reader_page
0.3259 rb_get_reader_page          0.3149 symbols)
0.3109 ring_buffer_consume         0.3006 aio_complete
0.3050 memset_c                    0.2903 memset_c
0.2900 pick_next_highest_task_rt   0.2883 ring_buffer_consume
0.2885 page_fault                  0.2719 lock_timer_base
0.2855 task_rq_lock                0.2699 __list_add
0.2691 mwait_idle                  0.2474 blk_queue_end_tag
0.2661 lock_timer_base             0.2453 memmove
0.2616 symbols)                    0.2392 e1000_xmit_frame
0.2616 __list_add                  0.2290 ipc_lock
0.2541 tcp_sendmsg                 0.2290 task_rq_lock
0.2302 blk_queue_end_tag           0.2249 generic_make_request
0.2242 e1000_xmit_frame            0.2167 kref_get
0.2198 scsi_softirq_done           0.2147 tcp_sendmsg
0.2183 qla2x00_process_completed_re0.2045 qla2x00_process_completed_re
0.2168 memmove                     0.2045 pick_next_highest_task_rt
0.2138 cpupri_set                  0.2024 __mod_timer
0.2078 qla24xx_process_response_que0.2004 kmem_cache_free

^ permalink raw reply	[flat|nested] 122+ messages in thread

* RE: Mainline kernel OLTP performance update
  2009-04-28 17:15     ` James Bottomley
@ 2009-04-28 17:17       ` Styner, Douglas W
  0 siblings, 0 replies; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-28 17:17 UTC (permalink / raw)
  To: James Bottomley, Chuck Ebbert
  Cc: Andi Kleen, linux-kernel, linux-driver, linux-scsi, Ma, Chinang

Working on it as we speak...

>-----Original Message-----
>From: James Bottomley [mailto:James.Bottomley@HansenPartnership.com]
>Sent: Tuesday, April 28, 2009 10:16 AM
>To: Chuck Ebbert
>Cc: Andi Kleen; Styner, Douglas W; linux-kernel@vger.kernel.org; linux-
>driver@qlogic.com; linux-scsi@vger.kernel.org
>Subject: Re: Mainline kernel OLTP performance update
>
>On Tue, 2009-04-28 at 12:57 -0400, Chuck Ebbert wrote:
>> On Mon, 27 Apr 2009 09:02:40 +0200
>> Andi Kleen <andi@firstfloor.org> wrote:
>>
>> > "Styner, Douglas W" <douglas.w.styner@intel.com> writes:
>> >
>> > >
>> > > ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
>> > > Cycles% 2.6.24.2                   Cycles% 2.6.30-rc2
>> > > 74.8578 <database>                   67.6966 <database>
>> >
>> > The dip in database cycles is indeed worrying.
>> >
>> > > 1.0500 qla24xx_start_scsi          1.1724 qla24xx_start_scsi
>> > > 0.8089 schedule                    1.0578 qla24xx_intr_handler
>> > > 0.5864 kmem_cache_alloc            0.8259 __schedule
>> > > 0.4989 __blockdev_direct_IO        0.7451 kmem_cache_alloc
>> > > 0.4357 __sigsetjmp                 0.4872 __blockdev_direct_IO
>> > > 0.4152 copy_user_generic_string    0.4390 task_rq_lock
>> > > 0.3953 qla24xx_intr_handler        0.4338 __sigsetjmp
>> >
>> > And also why the qla24xx_intr_handler became ~2.5x as expensive.
>> > Cc linux-scsi and qla24xx maintainers.
>> >
>>
>> They are getting 31000 interrupts/sec vs. 22000/sec on older kernels.
>
>Should be fixed by:
>
>http://marc.info/?l=linux-scsi&m=124093712114937
>
>If someone could verify, I'd be grateful.
>
>Thanks,
>
>James
>


^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-28 16:57   ` Chuck Ebbert
@ 2009-04-28 17:15     ` James Bottomley
  2009-04-28 17:17       ` Styner, Douglas W
  0 siblings, 1 reply; 122+ messages in thread
From: James Bottomley @ 2009-04-28 17:15 UTC (permalink / raw)
  To: Chuck Ebbert
  Cc: Andi Kleen, Styner, Douglas W, linux-kernel, linux-driver, linux-scsi

On Tue, 2009-04-28 at 12:57 -0400, Chuck Ebbert wrote:
> On Mon, 27 Apr 2009 09:02:40 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
> 
> > "Styner, Douglas W" <douglas.w.styner@intel.com> writes:
> > 
> > >
> > > ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
> > > Cycles% 2.6.24.2                   Cycles% 2.6.30-rc2
> > > 74.8578 <database>                   67.6966 <database>
> > 
> > The dip in database cycles is indeed worrying.
> > 
> > > 1.0500 qla24xx_start_scsi          1.1724 qla24xx_start_scsi
> > > 0.8089 schedule                    1.0578 qla24xx_intr_handler
> > > 0.5864 kmem_cache_alloc            0.8259 __schedule
> > > 0.4989 __blockdev_direct_IO        0.7451 kmem_cache_alloc
> > > 0.4357 __sigsetjmp                 0.4872 __blockdev_direct_IO
> > > 0.4152 copy_user_generic_string    0.4390 task_rq_lock
> > > 0.3953 qla24xx_intr_handler        0.4338 __sigsetjmp
> > 
> > And also why the qla24xx_intr_handler became ~2.5x as expensive.
> > Cc linux-scsi and qla24xx maintainers.
> > 
> 
> They are getting 31000 interrupts/sec vs. 22000/sec on older kernels.

Should be fixed by:

http://marc.info/?l=linux-scsi&m=124093712114937

If someone could verify, I'd be grateful.

Thanks,

James



^ permalink raw reply	[flat|nested] 122+ messages in thread

* Mainline kernel OLTP performance update
@ 2009-04-28 17:08 Styner, Douglas W
  2009-04-29  7:29 ` Andrew Morton
  0 siblings, 1 reply; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-28 17:08 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Ma, Chinang, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, Recalde, Luis F, Nelson, Doug,
	Cheng, Wu-sun, Prickett, Terry O, Shunmuganathan, Rajalakshmi,
	Garg, Anil K, Chilukuri, Harita, chris.mason

Summary: Measured the mainline kernel from kernel.org (2.6.30-rc3). 

The regression for 2.6.30-rc3 against the baseline, 2.6.24.2 is 1.91%.  Oprofile reports 71.1626% user, 28.8295% system.  

Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%   iowait%
2.6.24.2                1.000   22106   43709   75      24      0       0
2.6.30-rc3              0.981   30645   43027   75      25      0       0

Server configurations:
Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)


======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2                   Cycles% 2.6.30-rc3
74.8578 <database>                 69.1925 <database>
1.0500 qla24xx_start_scsi          1.1314 qla24xx_intr_handler
0.8089 schedule                    1.0031 qla24xx_start_scsi
0.5864 kmem_cache_alloc            0.8476 __schedule
0.4989 __blockdev_direct_IO        0.6532 kmem_cache_alloc
0.4357 __sigsetjmp                 0.4490 __blockdev_direct_IO
0.4152 copy_user_generic_string    0.4199 __sigsetjmp
0.3953 qla24xx_intr_handler        0.3946 __switch_to
0.3850 memcpy                      0.3538 __list_add
0.3596 scsi_request_fn             0.3499 task_rq_lock
0.3188 __switch_to                 0.3402 scsi_request_fn
0.2889 lock_timer_base             0.3382 rb_get_reader_page
0.2750 memmove                     0.3363 copy_user_generic_string
0.2519 task_rq_lock                0.3324 aio_complete
0.2474 aio_complete                0.3110 try_to_wake_up
0.2460 scsi_alloc_sgtable          0.2877 ring_buffer_consume
0.2445 generic_make_request        0.2683 mod_timer
0.2263 qla2x00_process_completed_re0.2605 qla2x00_process_completed_re
0.2118 blk_queue_end_tag           0.2566 blk_queue_end_tag
0.2085 dio_bio_complete            0.2566 generic_make_request
0.2021 e1000_xmit_frame            0.2547 tcp_sendmsg
0.2006 __end_that_request_first    0.2372 lock_timer_base
0.1954 generic_file_aio_read       0.2333 memmove
0.1949 kfree                       0.2294 memset_c
0.1915 tcp_sendmsg                 0.2080 mempool_free
0.1901 try_to_wake_up              0.2022 generic_file_aio_read
0.1895 kref_get                    0.1963 scsi_device_unbusy
0.1864 __mod_timer                 0.1963 plist_del
0.1863 thread_return               0.1944 dequeue_rt_stack
0.1854 math_state_restore          0.1924 e1000_xmit_frame

Thanks
Doug

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-27  7:02 ` Andi Kleen
@ 2009-04-28 16:57   ` Chuck Ebbert
  2009-04-28 17:15     ` James Bottomley
  0 siblings, 1 reply; 122+ messages in thread
From: Chuck Ebbert @ 2009-04-28 16:57 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Styner, Douglas W, linux-kernel, linux-driver, linux-scsi

On Mon, 27 Apr 2009 09:02:40 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> "Styner, Douglas W" <douglas.w.styner@intel.com> writes:
> 
> >
> > ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
> > Cycles% 2.6.24.2                   Cycles% 2.6.30-rc2
> > 74.8578 <database>                   67.6966 <database>
> 
> The dip in database cycles is indeed worrying.
> 
> > 1.0500 qla24xx_start_scsi          1.1724 qla24xx_start_scsi
> > 0.8089 schedule                    1.0578 qla24xx_intr_handler
> > 0.5864 kmem_cache_alloc            0.8259 __schedule
> > 0.4989 __blockdev_direct_IO        0.7451 kmem_cache_alloc
> > 0.4357 __sigsetjmp                 0.4872 __blockdev_direct_IO
> > 0.4152 copy_user_generic_string    0.4390 task_rq_lock
> > 0.3953 qla24xx_intr_handler        0.4338 __sigsetjmp
> 
> And also why the qla24xx_intr_handler became ~2.5x as expensive.
> Cc linux-scsi and qla24xx maintainers.
> 

They are getting 31000 interrupts/sec vs. 22000/sec on older kernels.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Re: Mainline kernel OLTP performance update
  2009-04-23 16:49 Styner, Douglas W
@ 2009-04-27  7:02 ` Andi Kleen
  2009-04-28 16:57   ` Chuck Ebbert
  0 siblings, 1 reply; 122+ messages in thread
From: Andi Kleen @ 2009-04-27  7:02 UTC (permalink / raw)
  To: Styner, Douglas W; +Cc: linux-kernel, linux-driver, linux-scsi

"Styner, Douglas W" <douglas.w.styner@intel.com> writes:

>
> ======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
> Cycles% 2.6.24.2                   Cycles% 2.6.30-rc2
> 74.8578 <database>                   67.6966 <database>

The dip in database cycles is indeed worrying.

> 1.0500 qla24xx_start_scsi          1.1724 qla24xx_start_scsi
> 0.8089 schedule                    1.0578 qla24xx_intr_handler
> 0.5864 kmem_cache_alloc            0.8259 __schedule
> 0.4989 __blockdev_direct_IO        0.7451 kmem_cache_alloc
> 0.4357 __sigsetjmp                 0.4872 __blockdev_direct_IO
> 0.4152 copy_user_generic_string    0.4390 task_rq_lock
> 0.3953 qla24xx_intr_handler        0.4338 __sigsetjmp

And also why the qla24xx_intr_handler became ~2.5x as expensive.
Cc linux-scsi and qla24xx maintainers.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Mainline kernel OLTP performance update
@ 2009-04-23 16:49 Styner, Douglas W
  2009-04-27  7:02 ` Andi Kleen
  0 siblings, 1 reply; 122+ messages in thread
From: Styner, Douglas W @ 2009-04-23 16:49 UTC (permalink / raw)
  To: linux-kernel


Summary: Measured the mainline kernel from kernel.org (2.6.30-rc2). 

The regression for 2.6.30-rc2 against the baseline, 2.6.24.2 is 1.95%.  Note the dip in cycles for database compared to us% in summary.  

Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%     sys%    idle%   iowait%
2.6.24.2                1.000   22106   43709   75      24      0       0
2.6.30-rc2              0.981   30755   43072   75      25      0       0

Server configurations:
Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)


======oprofile 0.9.3 CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2                   Cycles% 2.6.30-rc2
74.8578 <database>                   67.6966 <database>
1.0500 qla24xx_start_scsi          1.1724 qla24xx_start_scsi
0.8089 schedule                    1.0578 qla24xx_intr_handler
0.5864 kmem_cache_alloc            0.8259 __schedule
0.4989 __blockdev_direct_IO        0.7451 kmem_cache_alloc
0.4357 __sigsetjmp                 0.4872 __blockdev_direct_IO
0.4152 copy_user_generic_string    0.4390 task_rq_lock
0.3953 qla24xx_intr_handler        0.4338 __sigsetjmp
0.3850 memcpy                      0.4195 __switch_to
0.3596 scsi_request_fn             0.3713 copy_user_generic_string
0.3188 __switch_to                 0.3608 __list_add
0.2889 lock_timer_base             0.3595 rb_get_reader_page
0.2750 memmove                     0.3309 ring_buffer_consume
0.2519 task_rq_lock                0.3152 scsi_request_fn
0.2474 aio_complete                0.3048 try_to_wake_up
0.2460 scsi_alloc_sgtable          0.2983 tcp_sendmsg
0.2445 generic_make_request        0.2931 lock_timer_base
0.2263 qla2x00_process_completed_re0.2840 aio_complete
0.2118 blk_queue_end_tag           0.2697 memset_c
0.2085 dio_bio_complete            0.2527 mod_timer
0.2021 e1000_xmit_frame            0.2462 qla2x00_process_completed_re
0.2006 __end_that_request_first    0.2449 memmove
0.1954 generic_file_aio_read       0.2358 blk_queue_end_tag
0.1949 kfree                       0.2241 generic_make_request
0.1915 tcp_sendmsg                 0.2215 scsi_device_unbusy
0.1901 try_to_wake_up              0.2162 mempool_free
0.1895 kref_get                    0.2097 e1000_xmit_frame
0.1864 __mod_timer                 0.2097 kmem_cache_free
0.1863 thread_return               0.2058 kfree
0.1854 math_state_restore          0.1993 sched_clock_cpu
 
Thanks,
Doug

^ permalink raw reply	[flat|nested] 122+ messages in thread

* Mainline kernel OLTP performance update
@ 2009-01-12 18:30 Ma, Chinang
  0 siblings, 0 replies; 122+ messages in thread
From: Ma, Chinang @ 2009-01-12 18:30 UTC (permalink / raw)
  To: linux-kernel
  Cc: Tripathi, Sharad C, arjan, Wilcox, Matthew R, Kleen, Andi,
	Siddha, Suresh B, Chilukuri, Harita, Styner, Douglas W, Wang,
	Peter Xihong, Nueckel, Hubert, Chris Mason

Here is the latest 2.6.28 kernel OLTP performance comparing to 2.6.24.2 and 2.6.27.2. 

Linux OLTP Performance summary
Kernel#            Speedup(x)   Intr/s  CtxSw/s us%  sys%  idle% iowait%
2.6.24.2                1.000   21969   43425   76   24    0     0
2.6.27.2                0.973   30402   43523   74   25    0     1
2.6.28                  0.967   30400   42640   74   25    0     0

Server configurations:
Intel Xeon Quad-core 2.0GHz  2 cpus/8 cores/8 threads
64GB memory, 3 qle2462 FC HBA, 450 spindles (30 logical units)

======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.24.2                   Cycles% 2.6.27.2
1.0500 qla24xx_start_scsi          1.2125 qla24xx_start_scsi
0.8089 schedule                    0.6962 kmem_cache_alloc
0.5864 kmem_cache_alloc            0.6209 qla24xx_intr_handler
0.4989 __blockdev_direct_IO        0.4895 copy_user_generic_string
0.4152 copy_user_generic_string    0.4591 __blockdev_direct_IO
0.3953 qla24xx_intr_handler        0.4409 __end_that_request_first
0.3596 scsi_request_fn             0.3729 __switch_to
0.3188 __switch_to                 0.3716 try_to_wake_up
0.2889 lock_timer_base             0.3531 lock_timer_base
0.2519 task_rq_lock                0.3393 scsi_request_fn
0.2474 aio_complete                0.3038 aio_complete
0.2460 scsi_alloc_sgtable          0.2989 memset_c
0.2445 generic_make_request        0.2633 qla2x00_process_completed_re
0.2263 qla2x00_process_completed_re0.2583 pick_next_highest_task_rt
0.2118 blk_queue_end_tag           0.2578 generic_make_request
0.2085 dio_bio_complete            0.2510 __list_add
0.2021 e1000_xmit_frame            0.2459 task_rq_lock
0.2006 __end_that_request_first    0.2322 kmem_cache_free
0.1954 generic_file_aio_read       0.2206 blk_queue_end_tag
0.1949 kfree                       0.2205 __mod_timer
0.1915 tcp_sendmsg                 0.2179 update_curr_rt
0.1901 try_to_wake_up              0.2164 sd_prep_fn
0.1895 kref_get                    0.2130 kref_get
0.1864 __mod_timer                 0.2075 dio_bio_complete
0.1863 thread_return               0.2066 push_rt_task
0.1854 math_state_restore          0.1974 qla24xx_msix_default
0.1775 __list_add                  0.1935 generic_file_aio_read
0.1721 memset_c                    0.1870 scsi_device_unbusy
0.1706 find_vma                    0.1861 tcp_sendmsg
0.1688 read_tsc                    0.1843 e1000_xmit_frame


======oprofile CPU_CLK_UNHALTED for top 30 functions
Cycles% 2.6.27.2                   Cycles% 2.6.28
1.2125 qla24xx_start_scsi          1.4257 qla24xx_start_scsi
0.6962 kmem_cache_alloc            0.8784 kmem_cache_alloc
0.6209 qla24xx_intr_handler        0.6876 qla24xx_intr_handler
0.4895 copy_user_generic_string    0.5834 copy_user_generic_string
0.4591 __blockdev_direct_IO        0.4945 scsi_request_fn
0.4409 __end_that_request_first    0.4846 __blockdev_direct_IO
0.3729 __switch_to                 0.4187 try_to_wake_up
0.3716 try_to_wake_up              0.3518 aio_complete
0.3531 lock_timer_base             0.3513 __end_that_request_first
0.3393 scsi_request_fn             0.3483 __switch_to
0.3038 aio_complete                0.3271 memset_c
0.2989 memset_c                    0.2976 qla2x00_process_completed_re
0.2633 qla2x00_process_completed_re0.2905 __list_add
0.2583 pick_next_highest_task_rt   0.2901 generic_make_request
0.2578 generic_make_request        0.2755 lock_timer_base
0.2510 __list_add                  0.2741 blk_queue_end_tag
0.2459 task_rq_lock                0.2593 kmem_cache_free
0.2322 kmem_cache_free             0.2445 disk_map_sector_rcu
0.2206 blk_queue_end_tag           0.2370 pick_next_highest_task_rt
0.2205 __mod_timer                 0.2323 scsi_device_unbusy
0.2179 update_curr_rt              0.2321 task_rq_lock
0.2164 sd_prep_fn                  0.2316 scsi_dispatch_cmd
0.2130 kref_get                    0.2239 kref_get
0.2075 dio_bio_complete            0.2237 dio_bio_complete
0.2066 push_rt_task                0.2194 push_rt_task
0.1974 qla24xx_msix_default        0.2145 __aio_get_req
0.1935 generic_file_aio_read       0.2143 kfree
0.1870 scsi_device_unbusy          0.2138 __mod_timer
0.1861 tcp_sendmsg                 0.2131 e1000_irq_enable
0.1843 e1000_xmit_frame            0.2091 scsi_softirq_done



^ permalink raw reply	[flat|nested] 122+ messages in thread

end of thread, other threads:[~2010-01-25 18:26 UTC | newest]

Thread overview: 122+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-01-13 21:10 Mainline kernel OLTP performance update Ma, Chinang
2009-01-13 22:44 ` Wilcox, Matthew R
2009-01-15  0:35   ` Andrew Morton
2009-01-15  1:21     ` Matthew Wilcox
2009-01-15  2:04       ` Andrew Morton
2009-01-15  2:27         ` Steven Rostedt
2009-01-15  7:11           ` Ma, Chinang
2009-01-19 18:04             ` Chris Mason
2009-01-19 18:37               ` Steven Rostedt
2009-01-19 18:55                 ` Chris Mason
2009-01-19 19:07                   ` Steven Rostedt
2009-01-19 23:40                 ` Ingo Molnar
2009-01-15  2:39         ` Andi Kleen
2009-01-15  2:47           ` Matthew Wilcox
2009-01-15  3:36             ` Andi Kleen
2009-01-20 13:27             ` Jens Axboe
     [not found]               ` <588992150B702C48B3312184F1B810AD03A497632C@azsmsx501.amr.corp.intel.com>
2009-01-22 11:29                 ` Jens Axboe
     [not found]                   ` <588992150B702C48B3312184F1B810AD03A4F59632@azsmsx501.amr.corp.intel.com>
2009-01-27  8:28                     ` Jens Axboe
2009-01-15  7:24         ` Nick Piggin
2009-01-15  9:46           ` Pekka Enberg
2009-01-15 13:52             ` Matthew Wilcox
2009-01-15 14:42               ` Pekka Enberg
2009-01-16 10:16               ` Pekka Enberg
2009-01-16 10:21                 ` Nick Piggin
2009-01-16 10:31                   ` Pekka Enberg
2009-01-16 10:42                     ` Nick Piggin
2009-01-16 10:55                       ` Pekka Enberg
2009-01-19  7:13                         ` Nick Piggin
2009-01-19  8:05                           ` Pekka Enberg
2009-01-19  8:33                             ` Nick Piggin
2009-01-19  8:42                               ` Nick Piggin
2009-01-19  8:47                                 ` Pekka Enberg
2009-01-19  8:57                                   ` Nick Piggin
2009-01-19  9:48                               ` Pekka Enberg
2009-01-19 10:03                                 ` Nick Piggin
2009-01-16 20:59                     ` Christoph Lameter
2009-01-16  0:27           ` Andrew Morton
2009-01-16  4:03             ` Nick Piggin
2009-01-16  4:12               ` Andrew Morton
2009-01-16  6:46                 ` Nick Piggin
2009-01-16  6:55                   ` Matthew Wilcox
2009-01-16  7:06                     ` Nick Piggin
2009-01-16  7:53                     ` Zhang, Yanmin
2009-01-16 10:20                       ` Andi Kleen
2009-01-20  5:16                         ` Zhang, Yanmin
2009-01-21 23:58                           ` Christoph Lameter
2009-01-22  8:36                             ` Zhang, Yanmin
2009-01-22  9:15                               ` Pekka Enberg
2009-01-22  9:28                                 ` Zhang, Yanmin
2009-01-22  9:47                                   ` Pekka Enberg
2009-01-23  3:02                                     ` Zhang, Yanmin
2009-01-23  6:52                                       ` Pekka Enberg
2009-01-23  8:06                                         ` Pekka Enberg
2009-01-23  8:30                                           ` Zhang, Yanmin
2009-01-23  8:40                                             ` Pekka Enberg
2009-01-23  9:46                                             ` Pekka Enberg
2009-01-23 15:22                                               ` Christoph Lameter
2009-01-23 15:31                                                 ` Pekka Enberg
2009-01-23 15:55                                                   ` Christoph Lameter
2009-01-23 16:01                                                     ` Pekka Enberg
2009-01-24  2:55                                                 ` Zhang, Yanmin
2009-01-24  7:36                                                   ` Pekka Enberg
2009-02-12  5:22                                                     ` Zhang, Yanmin
2009-02-12  5:47                                                       ` Zhang, Yanmin
2009-02-12 15:25                                                         ` Christoph Lameter
2009-02-12 16:07                                                           ` Pekka Enberg
2009-02-12 16:03                                                         ` Pekka Enberg
2009-01-26 17:36                                                   ` Christoph Lameter
2009-02-01  2:52                                                     ` Zhang, Yanmin
2009-01-23  8:33                                       ` Nick Piggin
2009-01-23  9:02                                         ` Zhang, Yanmin
2009-01-23 18:40                                           ` care and feeding of netperf (Re: Mainline kernel OLTP performance update) Rick Jones
2009-01-23 18:51                                             ` Grant Grundler
2009-01-24  3:03                                             ` Zhang, Yanmin
2009-01-26 18:26                                               ` Rick Jones
2009-01-16  7:00                   ` Mainline kernel OLTP performance update Andrew Morton
2009-01-16  7:25                     ` Nick Piggin
2009-01-16  8:59                     ` Nick Piggin
2009-01-16 18:11                   ` Rick Jones
2009-01-19  7:43                     ` Nick Piggin
2009-01-19 22:19                       ` Rick Jones
2009-01-15 14:12         ` James Bottomley
2009-01-15 17:44           ` Andrew Morton
2009-01-15 18:00             ` Matthew Wilcox
2009-01-15 18:14               ` Steven Rostedt
2009-01-15 18:44                 ` Gregory Haskins
2009-01-15 18:46                   ` Wilcox, Matthew R
2009-01-15 19:44                     ` Ma, Chinang
2009-01-16 18:14                       ` Gregory Haskins
2009-01-16 19:09                         ` Steven Rostedt
2009-01-20 12:45                         ` Gregory Haskins
2009-01-15 19:28                 ` Ma, Chinang
2009-01-15 16:48       ` Ma, Chinang
  -- strict thread matches above, loose matches on Subject: below --
2010-01-25 18:26 Ma, Chinang
2009-05-04 15:54 Styner, Douglas W
2009-05-06  6:29 ` Anirban Chakraborty
2009-05-06 15:53   ` Wilcox, Matthew R
2009-05-06 18:05     ` Styner, Douglas W
2009-05-06 18:12       ` Wilcox, Matthew R
2009-05-06 18:24         ` Anirban Chakraborty
2009-05-06 19:25           ` Wilcox, Matthew R
2009-05-06 18:19   ` Styner, Douglas W
2009-04-28 17:22 Styner, Douglas W
2009-04-28 17:08 Styner, Douglas W
2009-04-29  7:29 ` Andrew Morton
2009-04-29  8:28   ` Andi Kleen
2009-04-29 16:00     ` Styner, Douglas W
2009-04-29 16:06       ` Wilcox, Matthew R
2009-04-29 16:19         ` Andi Kleen
2009-04-29 15:48   ` Styner, Douglas W
2009-04-29 16:07     ` Andrew Morton
2009-04-29 16:25       ` Peter Zijlstra
2009-04-29 17:46         ` Chris Mason
2009-04-29 18:06           ` Pallipadi, Venkatesh
2009-04-29 18:25             ` Styner, Douglas W
2009-04-29 17:52         ` Styner, Douglas W
2009-04-23 16:49 Styner, Douglas W
2009-04-27  7:02 ` Andi Kleen
2009-04-28 16:57   ` Chuck Ebbert
2009-04-28 17:15     ` James Bottomley
2009-04-28 17:17       ` Styner, Douglas W
2009-01-12 18:30 Ma, Chinang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).