linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: Performance issue in 2.5.32+
@ 2002-09-08 13:19 rwhron
  0 siblings, 0 replies; 8+ messages in thread
From: rwhron @ 2002-09-08 13:19 UTC (permalink / raw)
  To: jmorris; +Cc: linux-kernel

> lmbench shows major changes in context switching latencies 
> on a dual celeron system 

Some lmbench tests vary a lot so I run it 25 times and 
average the results.  Some variations are only apparent on 
particular cpus (cache size differences, smp, etc).

The lmbench averaging script I use is:
http://home.earthlink.net/~rwhron/kernel/lmbsum

lmbench on a quad xeon and a bit on how to use lmbsum:
http://home.earthlink.net/~rwhron/kernel/bigbox.html#lmbench

Recent uniprocessor results:
http://home.earthlink.net/~rwhron/kernel/latest.html#lmbench

-- 
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Performance issue in 2.5.32+
  2002-09-08 18:09     ` Linus Torvalds
@ 2002-09-09 16:06       ` James Morris
  0 siblings, 0 replies; 8+ messages in thread
From: James Morris @ 2002-09-09 16:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Sun, 8 Sep 2002, Linus Torvalds wrote:

> I don't really see anythign very suspicious in the 31->32 changelogs. 
> Somebody else already mentioned TLS etc, but that shouldn't really be 
> noticeable...

I've now tracked the problem to the introduction of the new mtrr code, and
backing it out for 2.5.32 restores original performance.  From what I can 
tell, the new code is not handling a bios/hardware bug that the old code 
was.

The old mtrr code says this during bootup:
 mtrr: your CPUs had inconsistent fixed MTRR settings
 mtrr: your CPUs had inconsistent variable MTRR settings
 mtrr: probably your BIOS does not setup all CPUs

While the new code just says:
 mtrr: SMP support incomplete for this vendor

The peformance hit is basically the same as when the old mtrr code is 
disabled.  I'll have a poke around and see if I can find out why the new 
mtrr code isn't handling the problem.


- James
-- 
James Morris
<jmorris@intercode.com.au>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Performance issue in 2.5.32+
  2002-09-08 17:40   ` James Morris
@ 2002-09-08 18:09     ` Linus Torvalds
  2002-09-09 16:06       ` James Morris
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2002-09-08 18:09 UTC (permalink / raw)
  To: James Morris; +Cc: linux-kernel


On Mon, 9 Sep 2002, James Morris wrote:
> 
> I guess it could be an existing hardware problem which has been triggered 
> by some innocuous change.

Does HZ make any difference to you? That's easily tested by just editing 
include/asm-i386/param.h and changing HZ to 100 to get the old behaviour..

I think that change happened earlier than your performance problem, but 
since lmbench variations can be noticeable and you only had one run per 
kernel, it might be one of those "comes and goes" things..

Another thing that you might check out is to disable USB if you have it
on. The memory throughput issue makes me wonder whether there might be a
lot of DMA going on, and USB (by "design", and I use the term losely) has
this silly "DMA always"  approach, even if nothing really is happening on
the bus. 

I don't really see anythign very suspicious in the 31->32 changelogs. 
Somebody else already mentioned TLS etc, but that shouldn't really be 
noticeable...

		Linus


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Performance issue in 2.5.32+
  2002-09-08 15:45 ` Linus Torvalds
@ 2002-09-08 17:40   ` James Morris
  2002-09-08 18:09     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: James Morris @ 2002-09-08 17:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Sun, 8 Sep 2002, Linus Torvalds wrote:

> That "something else" may be some process that is constantly running one
> one CPU or something.
> 
> Maybe something got confused by the kernel change, and is now getting
> stuck in the background? 

Not as far as I can tell.  The system is RH 7.3 with errata, and I've run
the tests again after shutting down everything except console login with
no change.  No processes seem to be running abnormally.

I guess it could be an existing hardware problem which has been triggered 
by some innocuous change.


- James
-- 
James Morris
<jmorris@intercode.com.au>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Performance issue in 2.5.32+
  2002-09-08 11:46 James Morris
  2002-09-08 11:57 ` Andi Kleen
@ 2002-09-08 15:45 ` Linus Torvalds
  2002-09-08 17:40   ` James Morris
  1 sibling, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2002-09-08 15:45 UTC (permalink / raw)
  To: jmorris, linux-kernel

This looks ridiculous, a 1000 us latency is one millisecond, which would
seem to imply that something is getting synchronized with the timer
interrupt for some reason.

You also have a few other outliers (TCP latency and file delete), AND
your memory performance also went down a lot.

Quite frankly, the memory performance thing has nothing to do with the
OS, so there's something else going on on your machine.  That "something
else" may be some process that is constantly running one one CPU or
something. 

Maybe something got confused by the kernel change, and is now getting
stuck in the background? 

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Performance issue in 2.5.32+
  2002-09-08 11:57 ` Andi Kleen
@ 2002-09-08 12:55   ` James Morris
  0 siblings, 0 replies; 8+ messages in thread
From: James Morris @ 2002-09-08 12:55 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On 8 Sep 2002, Andi Kleen wrote:

> 2.5.32 added the TLS changes, which do rewrite the GDT in context switch.
> Does it go away when you comment out the call to load_TLS in 
> arch/i386/kernel/process.c:__switch_to() ?

Nope, the lmbench results look roughly the same.


- James
-- 
James Morris
<jmorris@intercode.com.au>



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Performance issue in 2.5.32+
  2002-09-08 11:46 James Morris
@ 2002-09-08 11:57 ` Andi Kleen
  2002-09-08 12:55   ` James Morris
  2002-09-08 15:45 ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2002-09-08 11:57 UTC (permalink / raw)
  To: James Morris; +Cc: linux-kernel

James Morris <jmorris@intercode.com.au> writes:

> I've noticed a significant performance hit on my system since 2.5.32.  
> lmbench shows major changes in context switching latencies for 2.5.32 and
> 2.5.33 (see below, and profiling results for lat_ctx after that). This is
> on a dual celeron system with a Gigabyte GA6BXD motherboard.  I can
> provide more hardware and configuration details if required.

2.5.32 added the TLS changes, which do rewrite the GDT in context switch.
Does it go away when you comment out the call to load_TLS in 
arch/i386/kernel/process.c:__switch_to() ? (change should be harmless
unless you use a development glibc which uses TLS) 

-Andi

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Performance issue in 2.5.32+
@ 2002-09-08 11:46 James Morris
  2002-09-08 11:57 ` Andi Kleen
  2002-09-08 15:45 ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: James Morris @ 2002-09-08 11:46 UTC (permalink / raw)
  To: linux-kernel

I've noticed a significant performance hit on my system since 2.5.32.  
lmbench shows major changes in context switching latencies for 2.5.32 and
2.5.33 (see below, and profiling results for lat_ctx after that). This is
on a dual celeron system with a Gigabyte GA6BXD motherboard.  I can
provide more hardware and configuration details if required.


                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------


Basic system parameters
----------------------------------------------------
Host                 OS Description              Mhz
                                                    
--------- ------------- ----------------------- ----
test1      Linux 2.4.19       i686-pc-linux-gnu  548
test1      Linux 2.5.31       i686-pc-linux-gnu  547
test1      Linux 2.5.32       i686-pc-linux-gnu  547
test1      Linux 2.5.33       i686-pc-linux-gnu  547

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                             call  I/O stat clos TCP   inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
test1      Linux 2.4.19  548 0.64 1.09 6.04 8.10  49.9 1.52 4.81 498. 1399 6546
test1      Linux 2.5.31  547 0.61 1.15 6.77 9.91  44.2 1.48 5.12 576. 1698 7381
test1      Linux 2.5.32  547 0.64 1.18 6.24 9.64       1.52 5.08 703. 1884 8014
test1      Linux 2.5.33  547 0.64 1.15 6.29 8.18       1.52 5.08 733. 1884 7965

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------
Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                        ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ----- ------ ------ ------ ------ ------- -------
test1      Linux 2.4.19 2.890   10.7   94.5   34.1  130.8    35.9   135.0
test1      Linux 2.5.31 7.160   12.8   64.4   21.8  131.7    38.4   133.6
test1      Linux 2.5.32 1.460 1085.4   95.1  333.1 1078.8   200.6   634.3
test1      Linux 2.5.33 2.080 9.4200   93.8   37.4 1096.0   212.3  1079.8

*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
test1      Linux 2.4.19 2.890  13.0 39.2              60.7       120.
test1      Linux 2.5.31 7.160 9.213 17.5              62.0       121.
test1      Linux 2.5.32 1.460  10.5 18.8              48.7       249.
test1      Linux 2.5.33 2.080  10.5 17.9             8018.       192.

File & VM system latencies in microseconds - smaller is better
--------------------------------------------------------------
Host                 OS   0K File      10K File      Mmap    Prot    Page	
                        Create Delete Create Delete  Latency Fault   Fault 
--------- ------------- ------ ------ ------ ------  ------- -----   ----- 
test1      Linux 2.4.19  117.2   36.4  316.4   73.6   1811.0 1.234 4.00000
test1      Linux 2.5.31  123.6   40.5  341.5   83.5   3120.0 1.316 5.00000
test1      Linux 2.5.32  125.4   41.4  25.6K 5988.0   3224.0 1.223 5.00000
test1      Linux 2.5.33  121.5   38.8  198.5   81.8   3078.0 1.327 5.00000

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
test1      Linux 2.4.19 164. 145. 78.8  216.1  443.3  155.3  154.5 443. 185.3
test1      Linux 2.5.31 215. 123. 84.9  235.9  440.5  158.8  157.9 440. 199.5
test1      Linux 2.5.32 203. 103. 6.75  199.5  388.8  143.7  142.0 387. 182.2
test1      Linux 2.5.33 130. 2.42 9.07  162.7  388.7  144.8  143.1 389. 182.8

Memory latencies in nanoseconds - smaller is better
    (WARNING - may not be correct, check graphs)
---------------------------------------------------
Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
--------- -------------  ---- ----- ------    --------    -------
test1      Linux 2.4.19   548 5.470  100.8  133.1
test1      Linux 2.5.31   547 5.478   87.0  133.5
test1      Linux 2.5.32   547 5.483  102.8  146.8
test1      Linux 2.5.33   547 5.483  110.8  146.5


* lat_ctx profiling data for 2.5.31:

"size=64k ovr=33.90
16 137.67
   868 default_idle                              13.5625
    24 system_call                                0.5455
    14 __wake_up                                  0.2917
    17 do_gettimeofday                            0.1181
     4 sys_gettimeofday                           0.0250
     1 pgd_alloc                                  0.0156
     2 pte_alloc_one                              0.0139
   938 total                                      0.0077
     6 schedule                                   0.0067
     1 do_page_fault                              0.0009
     1 copy_process                               0.0005

* lat_ctx profiling data for 2.5.32:

"size=64k ovr=33.85
16 629.19
  1329 default_idle                              20.7656
    22 system_call                                0.5000
    23 __wake_up                                  0.4792
     3 syscall_call                               0.2727
     1 syscall_exit                               0.0909
    13 do_gettimeofday                            0.0903
     8 sys_gettimeofday                           0.0500
     1 ret_from_intr                              0.0417
    33 schedule                                   0.0368
     2 remove_wait_queue                          0.0312
     1 schedule_tail                              0.0208
     2 pte_alloc_one                              0.0139
  1445 total                                      0.0117
     1 cpu_idle                                   0.0089
     1 session_of_pgrp                            0.0078
     1 do_softirq                                 0.0052
     1 sys_rt_sigaction                           0.0042
     1 do_sigaction                               0.0028
     1 copy_files                                 0.0014
     1 do_exit                                    0.0014



- James
-- 
James Morris
<jmorris@intercode.com.au>



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2002-09-09 16:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-08 13:19 Performance issue in 2.5.32+ rwhron
  -- strict thread matches above, loose matches on Subject: below --
2002-09-08 11:46 James Morris
2002-09-08 11:57 ` Andi Kleen
2002-09-08 12:55   ` James Morris
2002-09-08 15:45 ` Linus Torvalds
2002-09-08 17:40   ` James Morris
2002-09-08 18:09     ` Linus Torvalds
2002-09-09 16:06       ` James Morris

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).