* X11 performance regressions
@ 2011-05-08 18:22 Knut Petersen
  2011-05-09 16:53 ` Adam Jackson
  2011-05-09 21:43 ` Chris Wilson
  0 siblings, 2 replies; 14+ messages in thread
From: Knut Petersen @ 2011-05-08 18:22 UTC
  To: intel-gfx

I compared the performance of X11 on two otherwise idle machines.

Hardware
========
Both have
identical mainboards (AOpen i915GMm-hfs),
identical memory and identical BIOS setup.
Both CPUs are Intel Pentium M mobile (Dothan).
One runs at 1.86 GHz, the other at 2.00 GHz.

Software
=======
1.86 GHz system:
opensuse 11.2
X.Org X Server 1.6.5
Release Date: 2009-10-11
kernel 2.6.38.5

2.00 GHz system:
opensuse 11.4
X.Org X Server 1.10.99
git-tree, 2011-may-7
kernel 2.6.39-rc4-drm-intel-staging

x11perf results
===========

In each pair below, the first line is the result of the 2.00 GHz system running the current Xorg,
and the second line is the result of the 1.86 GHz system running Xorg 1.6.5. A few
representative examples:

10000000 trep @   0.0032 msec (309000.0/sec): Dot
40000000 trep @   0.0006 msec (1650000.0/sec): Dot

  45000 trep @   0.5973 msec (  1670.0/sec): 500x500 rectangle
 100000 trep @   0.4282 msec (  2340.0/sec): 500x500 rectangle

2000000 reps @   0.0034 msec (296000.0/sec): 1x1 stippled rectangle (8x8 stipple)
8000000 reps @   0.0007 msec (1420000.0/sec): 1x1 stippled rectangle (8x8 stipple)

   1500 trep @  22.4602 msec (    44.5/sec): 500x500 stippled rectangle (8x8 stipple)
   3000 trep @   9.2680 msec (   108.0/sec): 500x500 stippled rectangle (8x8 stipple)

 100000 trep @   0.4043 msec (  2470.0/sec): Fill 10x10 trapezoid
1000000 trep @   0.0336 msec ( 29700.0/sec): Fill 10x10 trapezoid

The old X on the PC with the slower CPU is always significantly faster than the current git code,
very often more than 5 times as fast, and a number of tests show 1.6.5 to be more than 12 times
faster than 1.10.99.
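To make the "N times faster" figures easy to recompute, here is a minimal Python sketch (not part of the original report; the helper names are made up). It assumes only the x11perf summary-line format shown above, `N trep @ T msec (R/sec): Name`, and divides the 1.6.5 rate by the 1.10.99 rate for the same test:

```python
import re

# Matches x11perf summary lines such as
# "10000000 trep @   0.0032 msec (309000.0/sec): Dot"
LINE = re.compile(
    r"^\s*\d+\s+(?:trep|reps)\s+@\s+[\d.]+\s+msec\s+\(\s*([\d.]+)/sec\):\s+(.+?)\s*$"
)

def parse_rate(line):
    """Return (test name, operations per second) for one x11perf summary line."""
    m = LINE.match(line)
    if m is None:
        raise ValueError(f"not an x11perf summary line: {line!r}")
    return m.group(2), float(m.group(1))

def speedup(new_line, old_line):
    """How many times faster the old server is than the new one on the same test."""
    new_name, new_rate = parse_rate(new_line)
    old_name, old_rate = parse_rate(old_line)
    if new_name != old_name:
        raise ValueError(f"different tests: {new_name!r} vs {old_name!r}")
    return new_name, old_rate / new_rate

# (1.10.99 line, 1.6.5 line) pairs copied from the results above
pairs = [
    ("10000000 trep @   0.0032 msec (309000.0/sec): Dot",
     "40000000 trep @   0.0006 msec (1650000.0/sec): Dot"),
    ("  45000 trep @   0.5973 msec (  1670.0/sec): 500x500 rectangle",
     " 100000 trep @   0.4282 msec (  2340.0/sec): 500x500 rectangle"),
    (" 100000 trep @   0.4043 msec (  2470.0/sec): Fill 10x10 trapezoid",
     "1000000 trep @   0.0336 msec ( 29700.0/sec): Fill 10x10 trapezoid"),
]

for new, old in pairs:
    name, factor = speedup(new, old)
    print(f"{name}: 1.6.5 is {factor:.1f}x faster")
```

For these three pairs it reports roughly 5.3x (Dot), 1.4x (500x500 rectangle) and 12.0x (Fill 10x10 trapezoid).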

I did not use any special configuration options at compile time;
1.10.99 was built using the following commands:

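# install prefix for the private X.Org build
export PREFIX=/home/knut/local
# make pkg-config, the shell, aclocal, the dynamic linker and Python
# search $PREFIX before the system locations
export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig
export PATH=$PREFIX/bin:$PATH
export ACLOCAL="aclocal -I $PREFIX/share/aclocal"
export LD_LIBRARY_PATH=$PREFIX/lib
export PYTHONPATH=$PREFIX/lib/python2.7/site-packages
# build and install the modular tree with the X.Org build script
util/modular/build.sh -g $PREFIX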

Could anybody please explain why the old server is so much faster?
Are there any compile time or runtime options that could/should be used?

cu,
 Knut

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: X11 performance regressions
  2011-05-08 18:22 X11 performance regressions Knut Petersen
@ 2011-05-09 16:53 ` Adam Jackson
  2011-05-09 21:43 ` Chris Wilson
  1 sibling, 0 replies; 14+ messages in thread
From: Adam Jackson @ 2011-05-09 16:53 UTC
  To: Knut Petersen; +Cc: intel-gfx

On 5/8/11 2:22 PM, Knut Petersen wrote:

> Software
> =======
> 1.86 GHz system:
> opensuse 11.2
> X.Org X Server 1.6.5
> Release Date: 2009-10-11
> kernel 2.6.38.5
>
> 2.00 GHz system:
> opensuse 11.4
> X.Org X Server 1.10.99
> git-tree, 2011-may-7
> kernel 2.6.39-rc4-drm-intel-staging

I'd start by suspecting differences in .config for the kernel between 
the two, particularly since...

> 10000000 trep @   0.0032 msec (309000.0/sec): Dot
> 40000000 trep @   0.0006 msec (1650000.0/sec): Dot

Dot dispatch is so completely CPU-dominated that I suspect you're simply
measuring CPU overhead somewhere else; for example, one of those
kernels might be built with spinlock debugging and the other not.
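One way to act on this is to diff the two kernels' configurations for lock-debugging options. The Python sketch below is illustrative only: the helper names and the short list of debug symbols are assumptions, not an exhaustive or official list. It parses `.config`-style text, including `# CONFIG_FOO is not set` lines, and reports the listed options whose values differ. The config text itself might come from `/proc/config.gz` (when the kernel has `CONFIG_IKCONFIG_PROC` enabled) or from a distribution's `/boot/config-<version>`, depending on the system.

```python
import re

# Hand-picked lock-debugging options that add CPU overhead (assumed list,
# not exhaustive).
DEBUG_OPTIONS = (
    "CONFIG_DEBUG_SPINLOCK",
    "CONFIG_DEBUG_MUTEXES",
    "CONFIG_DEBUG_LOCK_ALLOC",
    "CONFIG_PROVE_LOCKING",
    "CONFIG_LOCK_STAT",
)

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

def parse_kconfig(text):
    """Map each CONFIG_ symbol to its value; explicitly unset symbols map to 'n'."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if m := _SET.match(line):
            values[m.group(1)] = m.group(2)
        elif m := _UNSET.match(line):
            values[m.group(1)] = "n"
    return values

def debug_differences(config_a, config_b, options=DEBUG_OPTIONS):
    """Return {option: (value_a, value_b)} for the listed options that differ.

    A symbol missing from a config is treated as 'n'.
    """
    a, b = parse_kconfig(config_a), parse_kconfig(config_b)
    diffs = {}
    for opt in options:
        va, vb = a.get(opt, "n"), b.get(opt, "n")
        if va != vb:
            diffs[opt] = (va, vb)
    return diffs

# Made-up config fragments, for illustration only
config_old = "CONFIG_DEBUG_SPINLOCK=y\n# CONFIG_LOCK_STAT is not set\n"
config_new = "# CONFIG_DEBUG_SPINLOCK is not set\n"
print(debug_differences(config_old, config_new))  # {'CONFIG_DEBUG_SPINLOCK': ('y', 'n')}
```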

- ajax


* Re: X11 performance regressions
  2011-05-08 18:22 X11 performance regressions Knut Petersen
  2011-05-09 16:53 ` Adam Jackson
@ 2011-05-09 21:43 ` Chris Wilson
  2011-05-11 14:46   ` Knut Petersen
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2011-05-09 21:43 UTC
  To: Knut Petersen, intel-gfx

As a point of comparison, here are similar results with master of all
the various trees on my 1.6GHz N450 (Atom+PineView) [so not strictly an
apples-to-apples comparison: your CPU is about 4-5x faster, but PNV is
about 3-4x faster than the 915GM (clock-for-clock)]:

On Sun, 08 May 2011 20:22:21 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
> 10000000 trep @   0.0032 msec (309000.0/sec): Dot
> 40000000 trep @   0.0006 msec (1650000.0/sec): Dot
50000000 trep @   0.0005 msec (1830000.0/sec): Dot
*100000000 trep @   0.0003 msec (2900000.0/sec): Dot
 
>   45000 trep @   0.5973 msec (  1670.0/sec): 500x500 rectangle
>  100000 trep @   0.4282 msec (  2340.0/sec): 500x500 rectangle
100000 trep @   0.3210 msec (  3120.0/sec): 500x500 rectangle

> 2000000 reps @   0.0034 msec (296000.0/sec): 1x1 stippled rectangle (8x8 stipple)
> 8000000 reps @   0.0007 msec (1420000.0/sec): 1x1 stippled rectangle (8x8 stipple)
25000000 trep @   0.0011 msec (902000.0/sec): 1x1 stippled rectangle (8x8 stipple)
*30000000 trep @   0.0008 msec (1180000.0/sec): 1x1 stippled rectangle (8x8 stipple)

>    1500 trep @  22.4602 msec (    44.5/sec): 500x500 stippled rectangle (8x8 stipple)
>    3000 trep @   9.2680 msec (   108.0/sec): 500x500 stippled rectangle (8x8 stipple)
4000 trep @   6.8986 msec (   145.0/sec): 500x500 stippled rectangle (8x8 stipple)
*3500 trep @   7.0786 msec (   141.0/sec): 500x500 stippled rectangle (8x8 stipple)
 
>  100000 trep @   0.4043 msec (  2470.0/sec): Fill 10x10 trapezoid
> 1000000 trep @   0.0336 msec ( 29700.0/sec): Fill 10x10 trapezoid
2000000 trep @   0.0152 msec ( 65700.0/sec): Fill 10x10 trapezoid
*4000000 trep @   0.0064 msec (156000.0/sec): Fill 10x10 trapezoid

Hmm. My suspicion was that these were GEM-related regressions (the overhead
of the dynamic buffer manager and relocations), along with various
optimizations for the common cases affecting the benchmarks selected
above, which are dominated by software fallbacks. Whilst there may be some
element of that behind the regression you're observing, I don't think it
is the whole story, and Adam is right to suggest checking that the systems
are indeed configured identically (wrt debug and optimisation options).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: X11 performance regressions
  2011-05-09 21:43 ` Chris Wilson
@ 2011-05-11 14:46   ` Knut Petersen
  2011-05-11 17:52     ` Chris Wilson
  2011-05-11 19:49     ` Adam Jackson
  0 siblings, 2 replies; 14+ messages in thread
From: Knut Petersen @ 2011-05-11 14:46 UTC
  To: intel-gfx

Yes, I made some mistakes during my first measurements.

Below find better results. They are made on the same machine,
with the same kernel, at the same speed, with the same x11perf
program, absolutely nothing changed.

I used x11perfcomp -ro and sorted the output, worst results for
the current git code first.

I think the numbers below are quite interesting ...

-Knut

System
======
AOpen i915GMm-hfs
Pentium M 2.00 GHz (Dothan) running @ 2.00 GHz fixed frequency, no thermal throttling
2GB RAM

1: Xorg of openSuSE 11.2 (absolute numbers)
===========================================
X.Org X Server 1.6.5
Release Date: 2009-10-11
X Protocol Version 11, Revision 0
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux linux-iffr 2.6.38.5-kape #10 PREEMPT Fri May 6 17:41:06 CEST 2011 i686
Build Date: 23 September 2010  03:43:55PM
Binaries, as distributed by openSuSE

2: Xorg, fresh from git 10 May 2011 (relative performance)
==========================================================
X.Org X Server 1.10.99.1
Release Date: unreleased
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.39-rc4-drm-intel-staging+ i686
Current Operating System: Linux linux-iffr 2.6.38.5-kape #10 PREEMPT Fri May 6 17:41:06 CEST 2011 i686
Kernel command line: root=/dev/hda2 acpi_enforce_resources=lax drm.debug=0x0
Build Date: 10 May 2011  04:43:21PM
Compiled without any special options using build.sh

  1 (op/s) 2 (rel)   Operation
---------- -------   ---------
  965000.0   0.016   10x10 wide rectangle outline
  164000.0   0.033   Fill 1x1 equivalent triangle
  152000.0   0.034   Fill 1x1 trapezoid
  175000.0   0.061   Fill 1x1 stippled trapezoid (161x145 stipple)
  174000.0   0.062   Fill 1x1 opaque stippled trapezoid (161x145 stipple)
  173000.0   0.062   Fill 1x1 opaque stippled trapezoid (17x15 stipple)
  173000.0   0.062   Fill 1x1 opaque stippled trapezoid (8x8 stipple)
  173000.0   0.062   Fill 1x1 stippled trapezoid (17x15 stipple)
  173000.0   0.062   Fill 1x1 stippled trapezoid (8x8 stipple)
  138000.0   0.073   Fill 1x1 tiled trapezoid (17x15 tile)
  136000.0   0.074   Fill 1x1 tiled trapezoid (161x145 tile)
  136000.0   0.074   Fill 1x1 tiled trapezoid (216x208 tile)
  137000.0   0.074   Fill 1x1 tiled trapezoid (4x4 tile)
    2670.0   0.088   100-pixel double-dashed ellipse
    4170.0   0.092   100-pixel dashed ellipse
   85300.0    0.11   Fill 10x10 opaque stippled trapezoid (161x145 stipple)
   85800.0    0.11   Fill 10x10 stippled trapezoid (161x145 stipple)
   76400.0    0.12   Fill 10x10 opaque stippled trapezoid (17x15 stipple)
   74800.0    0.12   Fill 10x10 stippled trapezoid (17x15 stipple)
   68800.0    0.13   Fill 10x10 opaque stippled trapezoid (8x8 stipple)
   67200.0    0.13   Fill 10x10 stippled trapezoid (8x8 stipple)
34800000.0    0.14   1-pixel solid circle
   42300.0    0.15   Fill 10x10 tiled trapezoid (161x145 tile)
   41900.0    0.15   Fill 10x10 tiled trapezoid (216x208 tile)
    4080.0    0.16   100-pixel wide double-dashed ellipse
   26800.0    0.16   500x500 rectangle outline
   38100.0    0.16   Fill 10x10 tiled trapezoid (17x15 tile)
   36700.0    0.16   Fill 10x10 tiled trapezoid (4x4 tile)
24700000.0    0.17   1-pixel line
22200000.0    0.17   1-pixel line segment
   27500.0    0.18   Fill 10x10 equivalent triangle
   28300.0    0.18   Fill 10x10 trapezoid
  190000.0    0.20   100x100 wide rectangle outline
    5910.0    0.23   Fill 300x300 trapezoid
  553000.0    0.24   Copy 10x10 from pixmap to pixmap
  292000.0    0.25   100-pixel line segment (3 kids)
   54600.0    0.25   10x10 rectangle outline
  281000.0    0.26   100-pixel line segment (2 kids)
 4670000.0    0.26   10-pixel horizontal line segment
  114000.0    0.27   Fill 1x1 aa trap
  198000.0    0.27   ShmPutImage 10x10 square
  265000.0    0.28   100-pixel line segment (1 kid)
 2980000.0    0.28   10-pixel dashed line
 2220000.0    0.28   10-pixel dashed segment
 2840000.0    0.28   10-pixel line
 2010000.0    0.28   10-pixel line segment
   21400.0    0.28   500-pixel circle
     763.0    0.28   Fill 100x100 tiled trapezoid (161x145 tile)
     632.0    0.28   Fill 100x100 tiled trapezoid (17x15 tile)
     572.0    0.28   Fill 100x100 tiled trapezoid (4x4 tile)
   15300.0    0.28   Fill 100x100 trapezoid
 3960000.0    0.29   100-pixel horizontal line segment
  299000.0    0.30   100-pixel dashed line
  273000.0    0.30   100-pixel dashed segment
  247000.0    0.30   100-pixel double-dashed segment
  274000.0    0.30   100-pixel line
  248000.0    0.30   100-pixel line segment
  820000.0    0.30   1-pixel circle
    5410.0    0.30   500-pixel filled ellipse
    2840.0    0.30   500-pixel solid circle
  272000.0    0.31   100-pixel double-dashed line
  130000.0    0.31   10-pixel partial ellipse
  154000.0    0.31   PutImage 10x10 square
 1090000.0    0.32   10x10 tiled rectangle (161x145 tile)
 1120000.0    0.32   10x10 tiled rectangle (216x208 tile)
   12400.0    0.32   Fill 100x100 equivalent triangle
 1220000.0    0.33   1x1 tiled rectangle (161x145 tile)
 1220000.0    0.33   1x1 tiled rectangle (17x15 tile)
 1220000.0    0.33   1x1 tiled rectangle (216x208 tile)
 1220000.0    0.33   1x1 tiled rectangle (4x4 tile)
    3540.0    0.33   500-pixel wide ellipse
     792.0    0.33   Fill 100x100 tiled trapezoid (216x208 tile)
   87200.0    0.33   Fill 2x1 aa trap
  552000.0    0.34   10x10 tiled rectangle (17x15 tile)
  263000.0    0.34   Fill 1x1 aa trap with 1 bit alpha
      88.4    0.34   Fill 300x300 tiled trapezoid (161x145 tile)
  125000.0    0.36   10-pixel ellipse
      71.5    0.38   Fill 300x300 tiled trapezoid (17x15 tile)
 1680000.0    0.39   100-pixel vertical line segment
   54200.0    0.39   100x100 rectangle outline
      65.0    0.39   Fill 300x300 tiled trapezoid (4x4 tile)
   33900.0    0.40   100-pixel circle
  147000.0    0.40   10x10 tiled rectangle (4x4 tile)
     103.0    0.41   500x500 tiled rectangle (4x4 tile)
   35300.0    0.42   100-pixel partial circle
    1780.0    0.42   100-pixel wide dashed ellipse
    3520.0    0.42   100x100 tiled rectangle (4x4 tile)
   56200.0    0.42   500-pixel line
    5200.0    0.42   500x500 wide rectangle outline
   11300.0    0.42   GetImage 10x10 square
      90.5    0.44   Fill 300x300 tiled trapezoid (216x208 tile)
   12900.0    0.45   100-pixel wide ellipse
   50800.0    0.45   500-pixel line segment
 1820000.0    0.46   10x10 rectangle
 1450000.0    0.46   1x1 opaque stippled rectangle (8x8 stipple)
    1570.0    0.46   ShmPutImage 500x500 square
   23800.0    0.47   100x100 tiled rectangle (17x15 tile)
    5730.0    0.47   Fill 100x100 opaque stippled trapezoid (161x145 stipple)
    5210.0    0.47   Fill 100x100 stippled trapezoid (161x145 stipple)
  122000.0    0.48   10-pixel partial circle
   78600.0    0.49   100x100 rectangle
 1860000.0    0.49   10-pixel vertical line segment
   54300.0    0.49   10x1 wide horizontal line segment
   54400.0    0.49   10x1 wide vertical line segment
 1420000.0    0.50   1x1 opaque stippled rectangle (17x15 stipple)
 1440000.0    0.50   1x1 stippled rectangle (17x15 stipple)
 1450000.0    0.50   1x1 stippled rectangle (8x8 stipple)
 1420000.0    0.51   1x1 stippled rectangle (161x145 stipple)
     691.0    0.51   500x500 tiled rectangle (17x15 tile)
    3330.0    0.52   100-pixel dashed circle
 1400000.0    0.52   1x1 opaque stippled rectangle (161x145 stipple)
 1830000.0    0.52   1x1 rectangle
 2330000.0    0.52   500-pixel horizontal line segment
    4020.0    0.52   Fill 100x100 opaque stippled trapezoid (17x15 stipple)
    2190.0    0.53   100-pixel double-dashed circle
 2300000.0    0.53   500-pixel vertical line segment
    2540.0    0.53   500-pixel wide circle
 1810000.0    0.53   Dot
   15300.0    0.54   100-pixel partial ellipse
   26100.0    0.54   10-pixel wide partial ellipse
     182.0    0.54   500x500 opaque stippled rectangle (17x15 stipple)
    3060.0    0.54   Fill 100x100 stippled trapezoid (17x15 stipple)
   15400.0    0.54   GetProperty
   15500.0    0.54   QueryPointer
    4150.0    0.56   100-pixel wide double-dashed circle
  105000.0    0.56   10-pixel circle
   10200.0    0.60   100-pixel ellipse
   10200.0    0.60   500x50 wide vertical line segment
     705.0    0.60   Fill 300x300 stippled trapezoid (161x145 stipple)
 1480000.0    0.60   Unmap window via parent (50 kids)
   10300.0    0.61   500x50 wide horizontal line segment
    2530.0    0.61   Fill 100x100 opaque stippled trapezoid (8x8 stipple)
     848.0    0.61   Fill 300x300 opaque stippled trapezoid (161x145 stipple)
   21700.0    0.61   ShmPutImage 100x100 square
    2240.0    0.62   Fill 100x100 stippled trapezoid (8x8 stipple)
     386.0    0.62   Fill 300x300 stippled trapezoid (17x15 stipple)
     551.0    0.63   Fill 300x300 opaque stippled trapezoid (17x15 stipple)
  130000.0    0.64   Fill 10x10 aa trap with 1 bit alpha
    4080.0    0.65   100x100 opaque stippled rectangle (17x15 stipple)
     296.0    0.65   500x500 stippled rectangle (161x145 stipple)
    2200.0    0.66   500-pixel ellipse
     341.0    0.67   500x500 opaque stippled rectangle (161x145 stipple)
    4610.0    0.67   500x50 wide line
   15200.0    0.68   Fill 1x1 aa trap with 4 bit alpha
     325.0    0.69   Fill 300x300 opaque stippled trapezoid (8x8 stipple)
 1650000.0    0.70   Unmap window via parent (200 kids)
    6750.0    0.71   100x100 opaque stippled rectangle (161x145 stipple)
   54800.0    0.71   10-pixel fill chord partial ellipse
    6290.0    0.73   100x100 stippled rectangle (161x145 stipple)
     175.0    0.74   500x500 stippled rectangle (17x15 stipple)
   12700.0    0.74   Fill 10x10 aa trap
     275.0    0.75   Fill 300x300 stippled trapezoid (8x8 stipple)
     109.0    0.76   500x500 opaque stippled rectangle (8x8 stipple)
 1130000.0    0.76   Circulate Unmapped window (200 kids)
   14500.0    0.78   Fill 10x10 aa trapezoid
   15300.0    0.78   Fill 1x1 aa trapezoid
   10200.0    0.78   Fill 2x10 aa trap
    9180.0    0.78   PutImage 100x100 square
   15100.0    0.79   100-pixel solid circle
   48000.0    0.80   10-pixel fill slice partial ellipse
   33400.0    0.80   10x1 wide line
    2350.0    0.80   500x500 rectangle
       0.5    0.80   PutImage XY 500x500 square
       0.5    0.80   ShmPutImage XY 500x500 square
   18400.0    0.81   100-pixel filled ellipse
    2590.0    0.82   100x100 opaque stippled rectangle (8x8 stipple)
    3900.0    0.82   100x100 stippled rectangle (17x15 stipple)
    6930.0    0.82   Fill 10x10 aa trap with 4 bit alpha
     927.0    0.83   500x500 tiled rectangle (216x208 tile)
  219000.0    0.86   10x10 opaque stippled rectangle (161x145 stipple)
  140000.0    0.86   Copy 10x10 from window to pixmap
   16300.0    0.87   100-pixel fill slice partial circle
   28300.0    0.87   100x100 tiled rectangle (216x208 tile)
   30900.0    0.87   10-pixel wide ellipse
   23400.0    0.87   10-pixel wide partial circle
     859.0    0.87   500x500 tiled rectangle (161x145 tile)
     462.0    0.87   PutImage 500x500 square
   17600.0    0.88   100-pixel fill chord partial circle
  145000.0    0.88   10x10 opaque stippled rectangle (8x8 stipple)
  143000.0    0.88   Copy 10x10 from pixmap to window
    6530.0    0.89   100-pixel wide partial circle
   28100.0    0.89   100x100 tiled rectangle (161x145 tile)
  138000.0    0.89   Composite 10x10 from pixmap to window
    1470.0    0.90   Fill 100x100 aa trap
    1350.0    0.90   Fill 100x100 aa trap with 4 bit alpha
    1930.0    0.90   GetImage XY 10x10 square
   14200.0    0.92   100x10 wide vertical line segment
    4460.0    0.92   Fill 100x100 aa trapezoid
   41700.0    0.93   10-pixel filled ellipse
     463.0    0.93   Fill 300x300 aa trap with 4 bit alpha
 1350000.0    0.93   Move window via parent (200 kids)
   14300.0    0.94   100x10 wide horizontal line segment
    4810.0    0.94   Fill 300x300 aa pre-added trapezoid
     476.0    0.94   Fill 300x300 aa trap
     110.0    0.95   500x500 stippled rectangle (8x8 stipple)
   16800.0    0.96   Fill 100x100 aa pre-added trapezoid
    1140.0    0.96   PutImage XY 10x10 square
 1570000.0    0.96   Resize unmapped window (4 kids)
   22100.0    0.97   100-pixel fill chord partial ellipse
  155000.0    0.97   Fill 10x10 aa pre-added trapezoid
    1040.0    0.97   Fill 2x100 aa trap
 1660000.0    0.97   Moved unmapped window (16 kids)
 1660000.0    0.97   Moved unmapped window (25 kids)
 1190000.0    0.97   Move window via parent (100 kids)
      11.8    0.97   PutImage XY 100x100 square
      11.4    0.97   ShmPutImage XY 100x100 square
 1670000.0    0.97   Unmap window via parent (100 kids)
  173000.0    0.98   10x10 opaque stippled rectangle (17x15 stipple)
  926000.0    0.98   Fill 1x1 aa pre-added trapezoid
     346.0    0.98   Fill 2x300 aa trap
   57600.0    0.98   Hide/expose window via popup (4 kids)
 1630000.0    0.98   Moved unmapped window (100 kids)
 1630000.0    0.98   Moved unmapped window (200 kids)
 1650000.0    0.98   Moved unmapped window (4 kids)
 1640000.0    0.98   Moved unmapped window (50 kids)
 1640000.0    0.98   Moved unmapped window (75 kids)
    1210.0    0.98   Scroll 500x500 pixels
     574.0    0.99   Copy 100x100 n-bit deep plane
     867.0    0.99   Copy 500x500 from pixmap to pixmap
      23.3    0.99   Copy 500x500 n-bit deep plane
      24.7    0.99   GetImage XY 100x100 square
   16600.0    0.99   Move window (200 kids)
 1560000.0    0.99   Resize unmapped window (16 kids)
 1530000.0    0.99   Resize unmapped window (200 kids)
 1560000.0    0.99   Resize unmapped window (25 kids)
 1550000.0    0.99   Resize unmapped window (50 kids)
    1050.0    0.99   ShmPutImage XY 10x10 square
  266000.0    1.00   Char in 30-char aa line (Charter 24)
  265000.0    1.00   Char in 30-char a line (Charter 24)
  508000.0    1.00   Char in 30-char image line (TR 24)
     869.0    1.00   Composite 500x500 from window to window
     870.0    1.00   Copy 500x500 from window to window
       1.0    1.00   GetImage XY 500x500 square
 1530000.0    1.00   Resize unmapped window (100 kids)
   20000.0    1.01   100-pixel fill slice partial ellipse
  231000.0    1.01   10x10 stippled rectangle (161x145 stipple)
   19900.0    1.01   Composite 100x100 from pixmap to window
   19600.0    1.01   Composite 100x100 from window to window
     851.0    1.01   Composite 500x500 from pixmap to window
   19800.0    1.01   Copy 100x100 from pixmap to pixmap
   19900.0    1.01   Copy 100x100 from pixmap to window
   20000.0    1.01   Copy 100x100 from window to pixmap
   19600.0    1.01   Copy 100x100 from window to window
     851.0    1.01   Copy 500x500 from pixmap to window
 1530000.0    1.01   Resize unmapped window (75 kids)
   10300.0    1.02   100-pixel wide circle
  108000.0    1.02   Char in 80-char rgb core line (Charter 10)
     849.0    1.02   Copy 500x500 from window to pixmap
  169000.0    1.03   10x10 stippled rectangle (17x15 stipple)
   20700.0    1.03   Move window (100 kids)
   26900.0    1.03   Scroll 100x100 pixels
  255000.0    1.04   Char16 in 23-char line (k24)
 1720000.0    1.04   Char in 80-char image line (TR 10)
   37200.0    1.04   Circulate window (4 kids)
   25200.0    1.04   Move window (25 kids)
   23200.0    1.04   Move window (50 kids)
   21900.0    1.04   Move window (75 kids)
    2540.0    1.05   100x100 stippled rectangle (8x8 stipple)
   41700.0    1.05   Copy 10x10 n-bit deep plane
   25800.0    1.05   Move window (16 kids)
  119000.0    1.06   Char in 80-char a core line (Charter 10)
 2010000.0    1.07   Circulate Unmapped window (100 kids)
 2250000.0    1.07   Circulate Unmapped window (75 kids)
   27600.0    1.07   Move window (4 kids)
  377000.0    1.07   Move window via parent (16 kids)
 1050000.0    1.07   Move window via parent (75 kids)
  626000.0    1.08   Char16 in 40-char line (k14)
  118000.0    1.08   Char in 80-char aa core line (Charter 10)
  534000.0    1.08   Move window via parent (25 kids)
  108000.0    1.08   Move window via parent (4 kids)
  471000.0    1.09   Char16 in 40-char image line (k14)
   23700.0    1.09   Resize window (200 kids)
 1030000.0    1.10   Char in 60-char image line (9x15)
 1400000.0    1.10   Char in 80-char image line (6x13)
 2610000.0    1.10   Circulate Unmapped window (50 kids)
     756.0    1.10   Fill 300x300 aa trapezoid
   30000.0    1.10   Resize window (100 kids)
  134000.0    1.11   10x10 stippled rectangle (8x8 stipple)
  833000.0    1.11   Char in 30-char line (TR 24)
   34800.0    1.11   Resize window (50 kids)
   32100.0    1.11   Resize window (75 kids)
  308000.0    1.11   Unmap window via parent (4 kids)
  176000.0    1.12   Char16 in 23-char image line (k24)
 1200000.0    1.12   Char in 70-char image line (8x13)
  634000.0    1.12   Char in 80-char rgb line (Courier 12)
  902000.0    1.12   Unmap window via parent (16 kids)
    5790.0    1.13   100-pixel wide partial ellipse
  314000.0    1.13   Char16 in 7/14/7 line (k14, k24)
 1570000.0    1.13   Char in 60-char line (9x15)
   39800.0    1.13   Resize window (16 kids)
   21100.0    1.14   Char in 30-char rgb core line (Charter 24)
 1940000.0    1.14   Char in 80-char line (6x13)
 3010000.0    1.14   Circulate Unmapped window (25 kids)
   37900.0    1.14   Resize window (25 kids)
   80700.0    1.15   Char in 80-char rgb core line (Courier 12)
   74000.0    1.15   Map window via parent (4 kids)
   42800.0    1.15   Resize window (4 kids)
 1140000.0    1.15   Unmap window via parent (25 kids)
 1610000.0    1.15   Unmap window via parent (75 kids)
   10700.0    1.16   100x10 wide line
 1750000.0    1.16   Char in 70-char line (8x13)
  822000.0    1.16   Move window via parent (50 kids)
   38100.0    1.17   10-pixel fill chord partial circle
   87400.0    1.18   Char in 80-char aa core line (Courier 12)
 2160000.0    1.18   Char in 80-char line (TR 10)
   19500.0    1.18   Circulate window (200 kids)
   22600.0    1.19   Char in 30-char aa core line (Charter 24)
   22600.0    1.19   Char in 30-char a core line (Charter 24)
  172000.0    1.19   Char in 30-char rgb line (Charter 24)
  185000.0    1.19   Destroy window via parent (4 kids)
   95200.0    1.19   Hide/expose window via popup (16 kids)
   35300.0    1.20   10-pixel fill slice partial circle
   21900.0    1.20   Circulate window (100 kids)
   25000.0    1.20   Circulate window (16 kids)
   87500.0    1.21   Char in 80-char a core line (Courier 12)
 3190000.0    1.21   Circulate Unmapped window (16 kids)
   24100.0    1.21   Circulate window (25 kids)
 8150000.0    1.21   X protocol NoOperation
  411000.0    1.22   Destroy window via parent (25 kids)
 1620000.0    1.23   Char in 20/40/20 line (6x13, TR 10)
   23100.0    1.23   Circulate window (50 kids)
   22500.0    1.23   Circulate window (75 kids)
 3430000.0    1.25   Circulate Unmapped window (4 kids)
  465000.0    1.25   Destroy window via parent (50 kids)
    3550.0    1.26   100x10 wide double-dashed line
   13500.0    1.27   Fill 100x100 64-gon (Convex)
     759.0    1.27   GetImage 100x100 square
  374000.0    1.28   Destroy window via parent (16 kids)
  774000.0    1.30   Char in 80-char a line (Courier 12)
  771000.0    1.32   Char in 80-char aa line (Courier 12)
  513000.0    1.32   Destroy window via parent (100 kids)
  108000.0    1.33   Hide/expose window via popup (50 kids)
  109000.0    1.33   Map window via parent (16 kids)
  761000.0    1.34   Char in 80-char rgb line (Charter 10)
   97400.0    1.34   Hide/expose window via popup (25 kids)
    1380.0    1.36   100-pixel wide dashed circle
  494000.0    1.38   Destroy window via parent (200 kids)
  490000.0    1.38   Destroy window via parent (75 kids)
  115000.0    1.38   Hide/expose window via popup (100 kids)
  112000.0    1.38   Hide/expose window via popup (75 kids)
    3030.0    1.39   100x10 wide dashed line
   12100.0    1.41   Fill 100x100 64-gon (Complex)
  117000.0    1.42   Hide/expose window via popup (200 kids)
  103000.0    1.43   Create and map subwindows (100 kids)
   80900.0    1.43   Create and map subwindows (4 kids)
  101000.0    1.47   Create and map subwindows (16 kids)
    9970.0    1.48   Fill 100x100 equivalent complex polygons
  120000.0    1.48   Map window via parent (50 kids)
  105000.0    1.49   Create and map subwindows (75 kids)
   29700.0    1.50   10-pixel solid circle
  139000.0    1.50   Change graphics context
  515000.0    1.50   Create unmapped window (200 kids)
  509000.0    1.50   Create unmapped window (50 kids)
  514000.0    1.50   Create unmapped window (75 kids)
  507000.0    1.51   Create unmapped window (16 kids)
  126000.0    1.51   Map window via parent (100 kids)
  110000.0    1.51   Map window via parent (25 kids)
   81700.0    1.52   Copy 10x10 from window to window
  102000.0    1.52   Create and map subwindows (25 kids)
  103000.0    1.52   Create and map subwindows (50 kids)
   81000.0    1.53   Composite 10x10 from window to window
  101000.0    1.53   Create and map subwindows (200 kids)
  126000.0    1.53   Map window via parent (200 kids)
   81300.0    1.53   Scroll 10x10 pixels
   27200.0    1.54   10-pixel wide circle
  122000.0    1.56   Map window via parent (75 kids)
   28300.0    1.57   Fill 100x100 aa trap with 1 bit alpha
  515000.0    1.59   Create unmapped window (100 kids)
  488000.0    1.66   Create unmapped window (25 kids)
  773000.0    1.75   Char in 80-char a line (Charter 10)
  766000.0    1.76   Char in 80-char aa line (Charter 10)
      29.4    1.89   GetImage 500x500 square
  413000.0    1.91   Create unmapped window (4 kids)
  107000.0    2.20   Copy 10x10 1-bit deep plane
     392.0    2.48   Copy 500x500 1-bit deep plane
    7040.0    2.59   Copy 100x100 1-bit deep plane
   25500.0    3.22   Fill 10x10 64-gon (Complex)
   25900.0    3.39   Fill 10x10 64-gon (Convex)
   26100.0    3.87   Fill 10x10 equivalent complex polygon
    5040.0    4.82   Fill 300x300 aa trap with 1 bit alpha
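Column 2 is a ratio relative to column 1, so the absolute rate of the git server for a row is roughly column 1 times column 2; x11perfcomp rounds the ratio, so the product is only approximate. A small Python sketch, using a few rows copied from the table above and made-up helper names, that recovers those absolute rates and sorts the rows the same way (worst ratio first):

```python
from typing import NamedTuple

class Row(NamedTuple):
    old_rate: float  # column 1: Xorg 1.6.5, operations per second
    ratio: float     # column 2: git server relative to column 1
    name: str

    @property
    def new_rate(self):
        """Approximate absolute rate of the git server (the ratio is rounded)."""
        return self.old_rate * self.ratio

def parse_row(line):
    old, ratio, name = line.split(maxsplit=2)
    return Row(float(old), float(ratio), name.strip())

# A few rows copied from the x11perfcomp table
TABLE = """\
  965000.0   0.016   10x10 wide rectangle outline
   15400.0    0.54   GetProperty
    5040.0    4.82   Fill 300x300 aa trap with 1 bit alpha
    2350.0    0.80   500x500 rectangle
"""

rows = sorted((parse_row(l) for l in TABLE.splitlines()), key=lambda r: (r.ratio, r.name))
for r in rows:
    print(f"{r.name:40} {r.old_rate:>12.1f} -> {r.new_rate:>12.1f} /sec")
```

For the worst row this gives roughly 15,400 rectangle outlines per second for the git server.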


* Re: X11 performance regressions
  2011-05-11 14:46   ` Knut Petersen
@ 2011-05-11 17:52     ` Chris Wilson
  2011-05-12  7:19       ` Knut Petersen
  2011-05-11 19:49     ` Adam Jackson
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2011-05-11 17:52 UTC
  To: Knut Petersen, intel-gfx

On Wed, 11 May 2011 16:46:12 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
> Yes, I made some mistakes during my first measurements.
> 
> Below find better results. They are made on the same machine,
> with the same kernel, at the same speed, with the same x11perf
> program, absolutely nothing changed.
> 
> I used x11perfcomp -ro and sorted the output, worst results for
> the current git code first.
> 
> I think the numbers below are quite interesting ...

>      1        2    Operation
> --------  ------   ---------
>   965000.0   0.016   10x10 wide rectangle outline

Something is still not quite right here. This should be mostly CPU bound,
and even my Atom gets 734k.

Can you check (a) that it is CPU bound and (b) what the worst offenders
are according to the system profiler of your choice (e.g. perf)?
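A rough way to answer (a) without a profiler is to sample the server's CPU time from /proc while x11perf runs. The Python sketch below (Linux-specific, helper names made up) reads the utime and stime fields, fields 14 and 15 of `/proc/<pid>/stat` in clock ticks, at the start and end of an interval and reports the fraction of one CPU consumed. It is only an approximation; a profiler is still needed for (b).

```python
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second used by /proc

def cpu_ticks_from_stat(stat_text):
    """utime + stime, in clock ticks, from the text of /proc/<pid>/stat."""
    # Field 2 (the command name) is in parentheses and may contain spaces,
    # so split what follows the last ')'. Field 3 (state) becomes index 0.
    fields = stat_text.rpartition(")")[2].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return utime + stime

def read_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())

def cpu_fraction(pid, interval=5.0):
    """Approximate fraction of one CPU used by `pid` over `interval` seconds."""
    ticks0, t0 = read_ticks(pid), time.monotonic()
    time.sleep(interval)
    ticks1, t1 = read_ticks(pid), time.monotonic()
    return (ticks1 - ticks0) / CLK_TCK / (t1 - t0)
```

For example, `cpu_fraction(xorg_pid)` while x11perf is running, with `xorg_pid` taken from `pidof Xorg`; a result close to 1.0 points at a CPU-bound server. For (b), something like `perf record -g -p <pid>` followed by `perf report` shows where that time goes.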

Thanks for doing this investigation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: X11 performance regressions
  2011-05-11 14:46   ` Knut Petersen
  2011-05-11 17:52     ` Chris Wilson
@ 2011-05-11 19:49     ` Adam Jackson
  2011-05-11 21:22       ` Knut Petersen
  1 sibling, 1 reply; 14+ messages in thread
From: Adam Jackson @ 2011-05-11 19:49 UTC
  To: Knut Petersen; +Cc: intel-gfx



On Wed, 2011-05-11 at 16:46 +0200, Knut Petersen wrote:
> Yes, I made some mistakes during my first measurements.
> 
> Below find better results. They are made on the same machine,
> with the same kernel, at the same speed, with the same x11perf
> program, absolutely nothing changed.

You don't mention whether the 2D driver varies; I assume it does, at
least to the extent of being rebuilt for the new ABI.  Likewise libdrm,
although that's really a 1% kind of thing.

> I think the numbers below are quite interesting ...

I still wager they're more about the environment than about the driver
proper; there are just too many weird things going on in your results. For
example:

>   198000.0    0.27   ShmPutImage 10x10 square
>     1570.0    0.46   ShmPutImage 500x500 square
>    21700.0    0.61   ShmPutImage 100x100 square

This is essentially a memcpy benchmark.  Something has to be very wrong
for that much variation to happen, and my guess would be something like
failing to inline memcpy or failing to pick sufficiently macho optimized
versions. I'd be interested to see what your CFLAGS from build.sh ended
up being, relative to what openSUSE gives for 'rpm --eval "%{optflags}"'.
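To compare the flags, one option is to pull the `CFLAGS = ...` assignment out of a Makefile generated by configure (automake-generated Makefiles contain such a line) and diff it against the distribution's flags, for example the output of `rpm --eval "%{optflags}"`. The Python sketch below assumes that Makefile format; its helper names and sample flag strings are made up for illustration and are not openSUSE's actual flags.

```python
import re
import shlex

def makefile_cflags(makefile_text):
    """Return the CFLAGS assigned in a configure-generated Makefile, as a list."""
    m = re.search(r"^CFLAGS\s*=\s*(.*)$", makefile_text, re.MULTILINE)
    return shlex.split(m.group(1)) if m else []

def compare_flags(ours, theirs):
    """Return (flags only in our build, flags only in the distribution build)."""
    ours_set, theirs_set = set(ours), set(theirs)
    only_ours = [f for f in ours if f not in theirs_set]
    only_theirs = [f for f in theirs if f not in ours_set]
    return only_ours, only_theirs

# Hypothetical inputs, for illustration only
built = makefile_cflags("CC = gcc\nCFLAGS = -g -O2 -Wall\n")
distro = shlex.split("-O2 -march=i686 -g")
print(compare_flags(built, distro))
```

In practice the Makefile text would be read from the server's build directory and the distribution string captured from rpm.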

One cool thing you can do from memcpy benchmarks like this is
extrapolate a bandwidth number. Your fast (1.6.5) numbers are (small
tests to big) 75.5, 828, and 1497 MB/s. Normally one expects some growth
in those numbers for bigger tests, but typically the jump from 10x10 to
100x100 is only a bit larger than the jump from 100x100 to 500x500; here
it is about 11x versus about 1.8x.
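The bandwidth figures can be reproduced from the 1.6.5 ShmPutImage rates in the table. They appear to assume 4 bytes per pixel and "MB" meaning 2^20 bytes; under those assumptions this Python sketch (helper name made up) gives 75.5, 828 and 1497, and it also applies the git server's ratios to estimate its bandwidth:

```python
BYTES_PER_PIXEL = 4   # assumption: 32 bits per pixel
MIB = 1024 * 1024     # assumption: "MB" means 2**20 bytes

def put_image_bandwidth(side, rate_per_sec):
    """MiB/s moved by ShmPutImage of a side x side square at the given rate."""
    return side * side * BYTES_PER_PIXEL * rate_per_sec / MIB

# (square side, Xorg 1.6.5 rate, git server ratio) from the x11perfcomp table
shm_put_image = [(10, 198000.0, 0.27), (100, 21700.0, 0.61), (500, 1570.0, 0.46)]

for side, rate, ratio in shm_put_image:
    old = put_image_bandwidth(side, rate)
    print(f"{side}x{side}: 1.6.5 {old:7.1f} MiB/s, git ~{old * ratio:6.1f} MiB/s")
```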

So that hints that small-work tests are being choked somehow. Recall
that x11perf does a 1-pixel GetImage periodically in order to guarantee
that results actually hit the framebuffer instead of just being queued
in the command stream, so round-trip performance with the X server does
actually matter. More than that, small-work requests (which take less
time) would be more strongly dominated by round-trip speed than
large-work requests. Given that:
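To make the "small requests are dominated by round-trip cost" argument concrete, here is a toy model (not how x11perf actually times anything): each rep costs a fixed per-request overhead plus a per-pixel cost, and only the fixed overhead differs between two hypothetical servers. The parameter values are invented for illustration, not measured.

```python
def reps_per_sec(pixels: int, overhead_us: float, per_pixel_ns: float) -> float:
    """Toy model: time per rep = fixed overhead + work proportional to pixels."""
    seconds = overhead_us * 1e-6 + pixels * per_pixel_ns * 1e-9
    return 1.0 / seconds

# Hypothetical parameters: server B pays twice the fixed per-request overhead
# of server A, while the per-pixel cost is identical.
PER_PIXEL_NS = 1.0
OVERHEAD_A_US = 2.0
OVERHEAD_B_US = 4.0

ratios = {}
for side in (10, 100, 500):
    pixels = side * side
    rate_a = reps_per_sec(pixels, OVERHEAD_A_US, PER_PIXEL_NS)
    rate_b = reps_per_sec(pixels, OVERHEAD_B_US, PER_PIXEL_NS)
    ratios[side] = rate_b / rate_a
    print(f"{side}x{side}: B/A = {ratios[side]:.2f}")
```

With these made-up numbers the ratio is about 0.51 at 10x10, 0.86 at 100x100 and 0.99 at 500x500: the same fixed penalty nearly halves the small test and barely touches the large one.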

>    15400.0    0.54   GetProperty
>    15500.0    0.54   QueryPointer

is very telling. Those requests do essentially no work, but they are
round-trips, and their throughput is thus bounded mostly by how long it
takes the scheduler to ping-pong between x11perf and the server. A
drop by a factor of ~2 would lead me to suspect something like one kernel
scheduling the processes on different cores, and the other scheduling both
on the same core; two processes splitting one CPU's time with maybe a
little cache warmth between them would intuitively be about half as fast
as two processes each with their own CPU.

Empirical evidence: On the Ironlake laptop on my desk (kernel
2.6.38.3-18.fc15), if I use taskset to bind the X server to CPU0,
running "x11perf -prop -pointer" bound to CPU0 gives:

 300000 trep @   0.0322 msec ( 31100.0/sec): QueryPointer
 300000 trep @   0.0321 msec ( 31200.0/sec): GetProperty

x11perf bound to CPU3 gives:

 600000 trep @   0.0193 msec ( 51900.0/sec): QueryPointer
 600000 trep @   0.0192 msec ( 52200.0/sec): GetProperty

And running it unbound (letting the scheduler decide) gives:

 600000 trep @   0.0198 msec ( 50600.0/sec): QueryPointer
 600000 trep @   0.0208 msec ( 48000.0/sec): GetProperty

I'd be curious to see how you fare with experimenting with taskset.
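The effect can be quantified by parsing the x11perf result lines above. A minimal sketch, assuming the `reps @ msec (rate/sec): name` line format shown in this thread; the regular expression and helper are hypothetical, not part of x11perf.

```python
import re

# Matches result lines such as:
#   " 300000 trep @   0.0322 msec ( 31100.0/sec): QueryPointer"
LINE_RE = re.compile(
    r"^\s*(\d+)\s+t?reps?\s+@\s+([\d.]+)\s+msec\s+\(\s*([\d.]+)/sec\):\s+(.+?)\s*$"
)

def parse_x11perf(text: str) -> dict[str, float]:
    """Map test name -> reps/sec for each x11perf result line in text."""
    rates = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rates[m.group(4)] = float(m.group(3))
    return rates

# Results quoted above: server on CPU0, x11perf on CPU0 vs. x11perf on CPU3.
same_cpu = parse_x11perf("""\
 300000 trep @   0.0322 msec ( 31100.0/sec): QueryPointer
 300000 trep @   0.0321 msec ( 31200.0/sec): GetProperty
""")
separate_cpus = parse_x11perf("""\
 600000 trep @   0.0193 msec ( 51900.0/sec): QueryPointer
 600000 trep @   0.0192 msec ( 52200.0/sec): GetProperty
""")

for name, rate in same_cpu.items():
    print(f"{name}: {rate / separate_cpus[name]:.2f}x the separate-CPU rate")
```

Both requests come out at about 0.60x when x11perf shares CPU0 with the server.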

One set of results that's a little confusing, and thus probably in the
end most enlightening:

>   553000.0    0.24   Copy 10x10 from pixmap to pixmap
>   140000.0    0.86   Copy 10x10 from window to pixmap
>   143000.0    0.88   Copy 10x10 from pixmap to window
>      867.0    0.99   Copy 500x500 from pixmap to pixmap
>      870.0    1.00   Copy 500x500 from window to window
>    19800.0    1.01   Copy 100x100 from pixmap to pixmap
>    19900.0    1.01   Copy 100x100 from pixmap to window
>    20000.0    1.01   Copy 100x100 from window to pixmap
>    19600.0    1.01   Copy 100x100 from window to window
>      851.0    1.01   Copy 500x500 from pixmap to window
>      849.0    1.02   Copy 500x500 from window to pixmap
>    81700.0    1.52   Copy 10x10 from window to window

This _mostly_ makes sense. These are all just varying calls to
XCopyArea, which does not have a reply. The medium and large ops are
approximately identical before and after. The 0.8x results make sense in
the context of scheduling funniness for small-work requests. But the two
outliers are perplexing. I would guess that copywinwin10 got faster due
to some optimization surrounding buffer reuse or flush reduction (you're
always working on the same buffer, so you can do less work), and that
copypixpix10 is operating wholly in host memory for some reason and
therefore hitting the same kind of memcpy issue as in your ShmPutImage
results.
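Picking out such outliers can be done mechanically from x11perfcomp output. A small sketch, assuming the `rate ratio name` column layout quoted above, where the ratio is the second run's rate divided by the first's; the 0.75 to 1.25 band is an arbitrary, hypothetical threshold.

```python
import re

# "<rate of run 1>  <run 2 rate / run 1 rate>  <test name>"; requiring a
# decimal point in both numbers skips the "1  2  Operation" header line.
ROW_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\S.*?)\s*$")

def parse_comp(text: str) -> list[tuple[str, float, float]]:
    """Return (name, rate, ratio) for each x11perfcomp data row."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append((m.group(3), float(m.group(1)), float(m.group(2))))
    return rows

def outliers(rows, low: float = 0.75, high: float = 1.25):
    """Return (name, ratio) for rows whose ratio lies outside [low, high]."""
    return [(name, ratio) for name, _rate, ratio in rows
            if not low <= ratio <= high]

copy_rows = parse_comp("""\
  553000.0    0.24   Copy 10x10 from pixmap to pixmap
  140000.0    0.86   Copy 10x10 from window to pixmap
  143000.0    0.88   Copy 10x10 from pixmap to window
     867.0    0.99   Copy 500x500 from pixmap to pixmap
     870.0    1.00   Copy 500x500 from window to window
   19800.0    1.01   Copy 100x100 from pixmap to pixmap
   19900.0    1.01   Copy 100x100 from pixmap to window
   20000.0    1.01   Copy 100x100 from window to pixmap
   19600.0    1.01   Copy 100x100 from window to window
     851.0    1.01   Copy 500x500 from pixmap to window
     849.0    1.02   Copy 500x500 from window to pixmap
   81700.0    1.52   Copy 10x10 from window to window
""")

for name, ratio in outliers(copy_rows):
    print(f"{ratio:.2f}  {name}")
```

With this band only the two 10x10 cases discussed above (pixmap to pixmap and window to window) are flagged; the ~0.8x rows stay inside it.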

I'll also note that the paths where you're losing hardest are, in the
majority, things that the driver makes no attempt to accelerate
(anything with the word "tiled" or "stippled" involved, for example). I
would tend to chalk that up to something like gcc -O0 before anything
else since you're primarily measuring the efficiency of the software
renderer. I'm actually pretty pleased with the results you've shown, 10%
or better speedup for basically all text ops, about half of window
management ops, and almost all window exposure ops. 

- ajax

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

* Re: X11 performance regressions
  2011-05-11 19:49     ` Adam Jackson
@ 2011-05-11 21:22       ` Knut Petersen
  2011-05-12 13:42         ` Adam Jackson
  0 siblings, 1 reply; 14+ messages in thread
From: Knut Petersen @ 2011-05-11 21:22 UTC (permalink / raw)
  To: Adam Jackson; +Cc: intel-gfx

As I only have a few minutes now, a few comments:

1. The complete trees are compared: all modules and libraries are either old or new. No debug versions.

2. Speculating about cores is wrong here: the Pentium M Dothan is a single-core CPU.

3. There is often a "choked most" (1x1), "choked least" (10x10), "choked a bit more again" (100x100, 500x500)
     pattern:

   1450000.0    0.50   1x1 stippled rectangle (8x8 stipple)
    134000.0    1.11   10x10 stippled rectangle (8x8 stipple)
      2540.0    1.05   100x100 stippled rectangle (8x8 stipple)
       110.0    0.95   500x500 stippled rectangle (8x8 stipple)

    A heavy per-call impact of some factor A on those small requests, and a light impact of a factor B that grows with request size?
    A = compiler / library overhead?

Yes, there is
>    15400.0    0.54   GetProperty
>    15500.0    0.54   QueryPointer
but we also see
 
     8150000.0    1.21   X protocol NoOperation


4. No, it's not the kernel. I did
    a) boot
    b) x11perf on old X
    c) x11perf on new X
    d) reboot
    e) x11perf on new X
    f) x11perf on old X
    and saw only very marginal differences between those two runs.

5. Yes, I agree with that:
>  I'm actually pretty pleased with the results you've shown, 10%
> or better speedup for basically all text ops, about half of window
> management ops, and almost all window exposure ops. 
6. More later.
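One way to probe the "factor A / factor B" idea from point 3 is to turn each rate and ratio into an absolute per-request time difference. A sketch, assuming the second column is the new server's rate divided by the old server's rate, as x11perfcomp prints it; the helper name is hypothetical.

```python
def per_call_delta_us(old_rate: float, ratio: float) -> float:
    """Extra microseconds per request on the new server (negative = faster)."""
    old_us = 1e6 / old_rate
    new_us = 1e6 / (old_rate * ratio)
    return new_us - old_us

# (size, old server reps/sec, new/old ratio) from the stippled rectangle lines
stippled = [
    ("1x1", 1450000.0, 0.50),
    ("10x10", 134000.0, 1.11),
    ("100x100", 2540.0, 1.05),
    ("500x500", 110.0, 0.95),
]

deltas = {}
for size, rate, ratio in stippled:
    deltas[size] = per_call_delta_us(rate, ratio)
    print(f"{size:>8}: {deltas[size]:+9.2f} us per request")
```

Read this way, the 1x1 loss is about +0.69 us per request, the 10x10 gain about -0.74 us, and the 500x500 loss about +478 us; under this simple reading the small and large cases are losing time to different things rather than to one fixed per-call cost.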

cu,
 Knut

* Re: X11 performance regressions
  2011-05-11 17:52     ` Chris Wilson
@ 2011-05-12  7:19       ` Knut Petersen
  2011-05-12  7:38         ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Knut Petersen @ 2011-05-12  7:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


>>      1        2    Operation
>> --------  ------   ---------
>>   965000.0   0.016   10x10 wide rectangle outline
> Something is still not quite right here. This should be mostly CPU bound,
> and even my Atom gets 734k.
>
> Can you check that (a) it is CPU bound and (b) the worst offenders
> according to the system profiler of your choice (e.g. perf)?
>

734k would be nice ;-)

With current git Xorg it's 10300 reps at 800 MHz and 16300 reps at 2000 MHz.
Increasing cpu clock by a factor of 2.5 increases reps by a factor of 1.58.
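The scaling figure can be expressed as a fraction of ideal frequency scaling. A sketch of that arithmetic, under the simplifying assumption that a purely CPU-bound loop would scale linearly with the core clock (memory and bus speeds do not scale with it, so this is only a first approximation):

```python
low_mhz, high_mhz = 800, 2000
low_reps, high_reps = 10300, 16300

clock_gain = high_mhz / low_mhz             # 2.5
throughput_gain = high_reps / low_reps      # about 1.58
efficiency = throughput_gain / clock_gain   # fraction of ideal linear scaling

print(f"clock x{clock_gain:.2f}, reps x{throughput_gain:.2f}, "
      f"{efficiency:.0%} of linear scaling")
```

That is roughly 63% of linear scaling.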

cu,
 knut

* Re: X11 performance regressions
  2011-05-12  7:19       ` Knut Petersen
@ 2011-05-12  7:38         ` Chris Wilson
  2011-05-12  8:24           ` Knut Petersen
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2011-05-12  7:38 UTC (permalink / raw)
  To: Knut Petersen; +Cc: intel-gfx

On Thu, 12 May 2011 09:19:39 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
> 
> >>      1        2    Operation
> >> --------  ------   ---------
> >>   965000.0   0.016   10x10 wide rectangle outline
> > Something is still not quite right here. This should be mostly CPU bound,
> > and even my Atom gets 734k.
> >
> > Can you check that (a) it is CPU bound and (b) the worst offenders
> > according to the system profiler of your choice (e.g. perf)?
> >
> 
> 734k would be nice ;-)
> 
> With current git Xorg it's 10300 reps at 800 MHz and 16300 reps at 2000 MHz.
> Increasing cpu clock by a factor of 2.5 increases reps by a factor of 1.58.

Please do something like 'perf record -f -g -a x11perf -d :0 -worect10;
perf report | head -150' and paste the output.
-Chris
> 
> cu,
>  knut

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: X11 performance regressions
  2011-05-12  7:38         ` Chris Wilson
@ 2011-05-12  8:24           ` Knut Petersen
  2011-05-12  8:55             ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Knut Petersen @ 2011-05-12  8:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> Please do something like 'perf record -f -g -a x11perf -d :0 -worect10;
> perf report | head -150' and paste the output.
> -Chris
>
Attached find the perf log

Knut

[-- Attachment #2: perflog --]
[-- Type: text/plain, Size: 10526 bytes --]

# Events: 19K cycles
#
# Overhead          Command                    Shared Object  Symbol
# ........  ...............  ...............................  ......
#
    32.09%             Xorg  libpixman-1.so.0.23.1            [.] pixman_op
                       |
                       --- pixman_op
                          |          
                          |--99.80%-- pixman_region_union
                          |          |          
                          |          |--99.95%-- damageRegionAppend
                          |          |          damageDamageBox
                          |          |          damagePolyRectangle
                          |          |          ProcPolyRectangle
                          |          |          Dispatch
                          |          |          main
                          |          |          __libc_start_main
                          |           --0.05%-- [...]
                           --0.20%-- [...]

     5.98%             Xorg  libc-2.11.3.so                   [.] __GI_memmove
                       |
                       --- __GI_memmove
                          |          
                          |--93.46%-- pixman_region_union
                          |          damageRegionAppend
                          |          damageDamageBox
                          |          damagePolyRectangle
                          |          ProcPolyRectangle
                          |          Dispatch
                          |          main
                          |          __libc_start_main
                          |          
                          |--5.14%-- Dispatch
                          |          main
                          |          __libc_start_main
                          |          
                          |--1.22%-- WriteEventsToClient
                          |          DamageExtNotify
                          |          .L312
                          |          damageRegionProcessPending
                          |          damagePolyRectangle
                          |          ProcPolyRectangle
                          |          Dispatch
                          |          main
                          |          __libc_start_main
                           --0.18%-- [...]

     3.25%             Xorg  [kernel.kallsyms]                [k] __lock_acquire
                       |
                       --- __lock_acquire
                          |          
                          |--98.72%-- lock_acquire
                          |          |          
                          |          |--48.51%-- _raw_spin_lock_irqsave
                          |          |          |          
                          |          |          |--45.74%-- add_wait_queue
                          |          |          |          __pollwait
                          |          |          |          |          
                          |          |          |          |--89.24%-- unix_poll
                          |          |          |          |          sock_poll
                          |          |          |          |          do_select
                          |          |          |          |          core_sys_select
                          |          |          |          |          sys_select
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |          |--4.40%-- n_tty_poll
                          |          |          |          |          tty_poll
                          |          |          |          |          do_select
                          |          |          |          |          core_sys_select
                          |          |          |          |          sys_select
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |          |--3.56%-- datagram_poll
                          |          |          |          |          sock_poll
                          |          |          |          |          do_select
                          |          |          |          |          core_sys_select
                          |          |          |          |          sys_select
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |           --2.81%-- drm_poll
                          |          |          |                     do_select
                          |          |          |                     core_sys_select
                          |          |          |                     sys_select
                          |          |          |                     sysenter_do_call
                          |          |          |                     0xb76ed424
                          |          |          |                     Dispatch
                          |          |          |                     main
                          |          |          |                     __libc_start_main
                          |          |          |          
                          |          |          |--31.44%-- remove_wait_queue
                          |          |          |          poll_freewait
                          |          |          |          do_select
                          |          |          |          core_sys_select
                          |          |          |          sys_select
                          |          |          |          sysenter_do_call
                          |          |          |          0xb76ed424
                          |          |          |          Dispatch
                          |          |          |          main
                          |          |          |          __libc_start_main
                          |          |          |          
                          |          |          |--6.96%-- skb_dequeue
                          |          |          |          unix_stream_recvmsg
                          |          |          |          sock_aio_read
                          |          |          |          do_sync_read
                          |          |          |          vfs_read
                          |          |          |          sys_read
                          |          |          |          sysenter_do_call
                          |          |          |          0xb76ed424
                          |          |          |          _XSERVTransRead
                          |          |          |          ReadRequestFromClient
                          |          |          |          Dispatch
                          |          |          |          main
                          |          |          |          __libc_start_main
                          |          |          |          
                          |          |          |--6.55%-- __wake_up_sync_key
                          |          |          |          |          
                          |          |          |          |--79.99%-- unix_write_space
                          |          |          |          |          sock_wfree
                          |          |          |          |          unix_destruct_scm
                          |          |          |          |          skb_release_head_state
                          |          |          |          |          __kfree_skb
                          |          |          |          |          consume_skb
                          |          |          |          |          unix_stream_recvmsg
                          |          |          |          |          sock_aio_read
                          |          |          |          |          do_sync_read
                          |          |          |          |          vfs_read
                          |          |          |          |          sys_read
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          _XSERVTransRead
                          |          |          |          |          ReadRequestFromClient
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |           --20.01%-- sock_def_readable
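A quick way to summarize a call-graph report like this is to keep only the top-level entries (the lines that start with a percentage) and rank them by overhead. A minimal sketch, assuming the `perf report` text layout shown above; it deliberately ignores the call-graph branches, and the helper name is hypothetical.

```python
import re

# Top-level entries look like:
#   "    32.09%   Xorg  libpixman-1.so.0.23.1   [.] pixman_op"
ENTRY_RE = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[[.k]\]\s+(\S+)")

def top_entries(report: str) -> list[tuple[float, str, str, str]]:
    """Return (overhead %, command, shared object, symbol), largest first."""
    entries = []
    for line in report.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            pct, comm, dso, sym = m.groups()
            entries.append((float(pct), comm, dso, sym))
    return sorted(entries, reverse=True)

sample = """\
    32.09%             Xorg  libpixman-1.so.0.23.1            [.] pixman_op
                       |
                       --- pixman_op
     5.98%             Xorg  libc-2.11.3.so                   [.] __GI_memmove
     3.25%             Xorg  [kernel.kallsyms]                [k] __lock_acquire
"""

for pct, comm, dso, sym in top_entries(sample):
    print(f"{pct:6.2f}%  {comm}  {sym}  ({dso})")
```

In this log the call chains under the two largest entries, pixman_op and __GI_memmove, run mostly through damageRegionAppend and damagePolyRectangle.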

* Re: X11 performance regressions
  2011-05-12  8:24           ` Knut Petersen
@ 2011-05-12  8:55             ` Chris Wilson
  2011-05-12  9:34               ` Knut Petersen
  2011-05-13  9:24               ` Knut Petersen
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Wilson @ 2011-05-12  8:55 UTC (permalink / raw)
  To: Knut Petersen; +Cc: intel-gfx

On Thu, 12 May 2011 10:24:00 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
> 
> > Please do something like 'perf record -f -g -a x11perf -d :0 -worect10;
> > perf report | head -150' and paste the output.
> > -Chris
> >
> Attached find the perf log

Oh, damage. A compositing WM? If you turn off compositing, do you see
similar performance levels to xorg-1.6?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: X11 performance regressions
  2011-05-12  8:55             ` Chris Wilson
@ 2011-05-12  9:34               ` Knut Petersen
  2011-05-13  9:24               ` Knut Petersen
  1 sibling, 0 replies; 14+ messages in thread
From: Knut Petersen @ 2011-05-12  9:34 UTC (permalink / raw)
  To: intel-gfx


> Oh, damage. A compositing WM? If you turn off compositing, do you see
> similar performance levels to xorg-1.6?
> -Chris
>

That makes a difference: 16,300 reps speed up to 1,280,000 reps, 78.5 times faster.

I think I will rerun the tests.

cu,
 Knut

* Re: X11 performance regressions
  2011-05-11 21:22       ` Knut Petersen
@ 2011-05-12 13:42         ` Adam Jackson
  0 siblings, 0 replies; 14+ messages in thread
From: Adam Jackson @ 2011-05-12 13:42 UTC (permalink / raw)
  To: Knut Petersen; +Cc: intel-gfx


On Wed, 2011-05-11 at 23:22 +0200, Knut Petersen wrote:

> Yes, there is
> >    15400.0    0.54   GetProperty
> >    15500.0    0.54   QueryPointer
> but we also see
>  
>      8150000.0    1.21   X protocol NoOperation

NoOp isn't a round trip; it does not generate a reply.  That test
measures how fast the X server can zip around its own main loop, not how
fast it can interact with clients.
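The distinction can be illustrated outside X with a Unix socket pair: a client that only writes (like NoOperation) is limited by how fast the peer drains its socket, while a client that waits for every reply pays a full ping-pong between the two sides per request. A minimal sketch; the 4-byte message and the echo "server" are hypothetical stand-ins, not the X server's dispatch loop, and absolute rates will vary widely between machines.

```python
import socket
import threading
import time

MSG = b"ping"  # 4-byte stand-in for a request (X requests are multiples of 4 bytes)

def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly size bytes, or fewer if the peer closed the connection."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

def one_way(n: int) -> tuple[int, float]:
    """Send n requests without waiting for replies; return (delivered, seconds)."""
    client, server = socket.socketpair()
    delivered = 0

    def drain() -> None:
        nonlocal delivered
        want, got = n * len(MSG), 0
        while got < want:
            chunk = server.recv(65536)
            if not chunk:
                break
            got += len(chunk)
        delivered = got // len(MSG)

    reader = threading.Thread(target=drain)
    start = time.perf_counter()
    reader.start()
    for _ in range(n):
        client.sendall(MSG)
    reader.join()
    elapsed = time.perf_counter() - start
    client.close()
    server.close()
    return delivered, elapsed

def round_trip(n: int) -> tuple[int, float]:
    """Send n requests, waiting for an echoed reply after each one."""
    client, server = socket.socketpair()

    def echo() -> None:
        for _ in range(n):
            req = recv_exact(server, len(MSG))
            if not req:
                break
            server.sendall(req)

    replier = threading.Thread(target=echo)
    start = time.perf_counter()
    replier.start()
    completed = 0
    for _ in range(n):
        client.sendall(MSG)
        if recv_exact(client, len(MSG)) == MSG:
            completed += 1
    replier.join()
    elapsed = time.perf_counter() - start
    client.close()
    server.close()
    return completed, elapsed

if __name__ == "__main__":
    for label, fn in (("one-way", one_way), ("round trip", round_trip)):
        done, secs = fn(10000)
        print(f"{label:>10}: {done / secs:12.0f} requests/sec")
```

On most machines the one-way loop should report a much higher rate, for the same reason NoOperation can run far faster than GetProperty or QueryPointer.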

- ajax

* Re: X11 performance regressions
  2011-05-12  8:55             ` Chris Wilson
  2011-05-12  9:34               ` Knut Petersen
@ 2011-05-13  9:24               ` Knut Petersen
  1 sibling, 0 replies; 14+ messages in thread
From: Knut Petersen @ 2011-05-13  9:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

> Oh, damage. A compositing WM? If you turn off compositing, do you see
> similar performance levels to xorg-1.6?
> -Chris
>

If "Composite" is disabled, the current X scores much better than the 1.6.5 server
in most cases. But there are a few exceptions ... for the worst of those cases, I
also attached a perf log.
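With a comparison table this long, a short script helps pick out the exceptions. A sketch, assuming the `rate ratio name` layout of the attached x11perfcomp output (ratio = second run's rate divided by the first's, so values below 1.0 mean the newer server is slower); the 0.75 cutoff is an arbitrary choice for illustration and the excerpt is a few rows copied from the table.

```python
import re

# Requiring a decimal point in both numbers skips the header and dashed lines.
ROW_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\S.*?)\s*$")

def summarize(table: str, cutoff: float = 0.75):
    """Count faster/slower tests and list those below cutoff, worst first."""
    rows = []
    for line in table.splitlines():
        m = ROW_RE.match(line)
        if m:
            rows.append((float(m.group(2)), m.group(3)))
    faster = sum(1 for ratio, _ in rows if ratio > 1.0)
    slower = sum(1 for ratio, _ in rows if ratio < 1.0)
    worst = sorted(row for row in rows if row[0] < cutoff)
    return faster, slower, worst

excerpt = """\
       1        2    Operation
  --------  ------   ---------
    2630.0    0.12   100-pixel double-dashed ellipse
    4180.0    0.14   100-pixel dashed ellipse
  575000.0    0.23   Copy 10x10 from pixmap to pixmap
    2440.0    1.00   500x500 rectangle
  714000.0    1.26   Create unmapped window (200 kids)
"""

faster, slower, worst = summarize(excerpt)
print(f"faster: {faster}, slower: {slower}")
for ratio, name in worst:
    print(f"  {ratio:.2f}  {name}")
```

Run over the full table instead of the excerpt, the same function gives the overall win/loss count and the list of regressions worth profiling.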

- Knut

[-- Attachment #2: x11perfcomp --]
[-- Type: text/plain, Size: 20937 bytes --]

  1: x11perf-10605000-nocomposite
  2: x11perf-11099001-nocomposite
       1        2    Operation
  --------  ------   ---------
    2630.0    0.12   100-pixel double-dashed ellipse 
    4180.0    0.14   100-pixel dashed ellipse 
  575000.0    0.23   Copy 10x10 from pixmap to pixmap 
    5850.0    0.34   500-pixel filled ellipse 
    2970.0    0.35   500-pixel solid circle 
    6250.0    0.35   Fill 300x300 trapezoid 
  149000.0    0.41   PutImage 10x10 square 
    3930.0    0.44   100-pixel wide double-dashed ellipse 
  189000.0    0.44   ShmPutImage 10x10 square 
    1570.0    0.46   ShmPutImage 500x500 square 
    9610.0    0.49   GetImage 10x10 square 
   21700.0    0.51   ShmPutImage 100x100 square 
   12600.0    0.63   QueryPointer 
   12600.0    0.65   GetProperty 
  220000.0    0.67   100x100 wide rectangle outline 
   83400.0    0.68   100x100 rectangle 
     477.0    0.69   PutImage 500x500 square 
    9100.0    0.71   PutImage 100x100 square 
   28700.0    0.73   500x500 rectangle outline 
    5570.0    0.75   500x500 wide rectangle outline 
    2140.0    0.79   100-pixel double-dashed circle 
    2550.0    0.81   500-pixel wide circle 
 1690000.0    0.82   100-pixel vertical line segment 
    3500.0    0.82   500-pixel wide ellipse 
    3430.0    0.85   100-pixel dashed circle 
  163000.0    0.85   Fill 1x1 equivalent triangle 
  152000.0    0.86   Fill 1x1 trapezoid 
  139000.0    0.88   Copy 10x10 from window to pixmap 
  137000.0    0.91   Composite 10x10 from pixmap to window 
    1930.0    0.91   GetImage XY 10x10 square 
   21300.0    0.93   500-pixel circle 
  138000.0    0.93   Copy 10x10 from pixmap to window 
 1300000.0    0.93   Move window via parent (100 kids) 
 1370000.0    0.93   Move window via parent (200 kids) 
  130000.0    0.95   10-pixel partial ellipse 
  107000.0    0.95   Char in 80-char rgb core line (Charter 10) 
  831000.0    0.96   1-pixel circle 
   16700.0    0.96   Fill 100x100 aa pre-added trapezoid 
    1590.0    0.96   Fill 100x100 aa trap 
    1460.0    0.96   Fill 100x100 aa trap with 4 bit alpha 
     513.0    0.96   Fill 300x300 aa trap 
     499.0    0.96   Fill 300x300 aa trap with 4 bit alpha 
      74.9    0.96   Fill 300x300 tiled trapezoid (17x15 tile) 
  153000.0    0.97   Fill 10x10 aa pre-added trapezoid 
    4780.0    0.97   Fill 300x300 aa pre-added trapezoid 
      12.0    0.97   ShmPutImage XY 100x100 square 
     783.0    0.98   Fill 100x100 tiled trapezoid (161x145 tile) 
  927000.0    0.98   Fill 1x1 aa pre-added trapezoid 
      12.4    0.98   PutImage XY 100x100 square 
    1230.0    0.98   PutImage XY 10x10 square 
    1110.0    0.98   ShmPutImage XY 10x10 square 
   29000.0    0.99   100x100 tiled rectangle (161x145 tile) 
   23900.0    0.99   100x100 tiled rectangle (17x15 tile) 
   30100.0    0.99   100x100 tiled rectangle (216x208 tile) 
34800000.0    0.99   1-pixel solid circle 
     885.0    0.99   500x500 tiled rectangle (161x145 tile) 
     691.0    0.99   500x500 tiled rectangle (17x15 tile) 
     960.0    0.99   500x500 tiled rectangle (216x208 tile) 
  274000.0    0.99   Char in 30-char aa line (Charter 24) 
  275000.0    0.99   Char in 30-char a line (Charter 24) 
   20400.0    0.99   Copy 100x100 from pixmap to pixmap 
     599.0    0.99   Copy 100x100 n-bit deep plane 
     870.0    0.99   Copy 500x500 from pixmap to pixmap 
      24.3    0.99   Copy 500x500 n-bit deep plane 
  120000.0    0.99   Fill 1x1 aa trap 
    1090.0    0.99   Fill 2x100 aa trap 
   10700.0    0.99   Fill 2x10 aa trap 
   91200.0    0.99   Fill 2x1 aa trap 
  322000.0    1.00   100-pixel dashed line 
  307000.0    1.00   100-pixel double-dashed line 
  275000.0    1.00   100-pixel double-dashed segment 
  307000.0    1.00   100-pixel line 
  277000.0    1.00   100-pixel line segment 
  309000.0    1.00   100-pixel line segment (2 kids) 
24600000.0    1.00   1-pixel line 
 2430000.0    1.00   500-pixel horizontal line segment 
   56500.0    1.00   500-pixel line segment 
 2400000.0    1.00   500-pixel vertical line segment 
    2440.0    1.00   500x500 rectangle 
   20400.0    1.00   Composite 100x100 from pixmap to window 
   20100.0    1.00   Composite 100x100 from window to window 
     866.0    1.00   Composite 500x500 from pixmap to window 
     875.0    1.00   Composite 500x500 from window to window 
   20500.0    1.00   Copy 100x100 from pixmap to window 
   20600.0    1.00   Copy 100x100 from window to pixmap 
   20100.0    1.00   Copy 100x100 from window to window 
     866.0    1.00   Copy 500x500 from pixmap to window 
     872.0    1.00   Copy 500x500 from window to pixmap 
     875.0    1.00   Copy 500x500 from window to window 
     661.0    1.00   Fill 100x100 tiled trapezoid (17x15 tile) 
      67.1    1.00   Fill 300x300 tiled trapezoid (4x4 tile) 
      25.9    1.00   GetImage XY 100x100 square 
       1.0    1.00   GetImage XY 500x500 square 
       0.5    1.00   PutImage XY 500x500 square 
    1240.0    1.00   Scroll 500x500 pixels 
       0.5    1.00   ShmPutImage XY 500x500 square 
  289000.0    1.01   100-pixel dashed segment 
  292000.0    1.01   100-pixel line segment (1 kid) 
   13400.0    1.01   Fill 10x10 aa trap 
   28500.0    1.01   Scroll 100x100 pixels 
 3230000.0    1.02   10-pixel line 
  193000.0    1.02   Char16 in 23-char image line (k24) 
  271000.0    1.02   Char16 in 23-char line (k24) 
    7270.0    1.02   Fill 10x10 aa trap with 4 bit alpha 
 2350000.0    1.03   10-pixel dashed segment 
 2200000.0    1.03   10-pixel line segment 
   61800.0    1.03   500-pixel line 
  497000.0    1.03   Char16 in 40-char image line (k14) 
  118000.0    1.03   Char in 80-char aa core line (Charter 10) 
  118000.0    1.03   Char in 80-char a core line (Charter 10) 
     353.0    1.03   Fill 2x300 aa trap 
 1130000.0    1.03   Move window via parent (75 kids) 
  320000.0    1.04   100-pixel line segment (3 kids) 
  649000.0    1.04   Char16 in 40-char line (k14) 
  507000.0    1.04   Char in 30-char image line (TR 24) 
 1480000.0    1.04   Char in 80-char image line (6x13) 
     568.0    1.04   Fill 100x100 tiled trapezoid (4x4 tile) 
   15400.0    1.04   Fill 1x1 aa trap with 4 bit alpha 
 1090000.0    1.05   Char in 60-char image line (9x15) 
 1290000.0    1.05   Char in 70-char image line (8x13) 
 1720000.0    1.05   Char in 80-char image line (TR 10) 
   58900.0    1.05   Hide/expose window via popup (4 kids) 
 1810000.0    1.05   Moved unmapped window (100 kids) 
 1700000.0    1.05   Resize unmapped window (200 kids) 
  327000.0    1.06   Char16 in 7/14/7 line (k14, k24) 
   43200.0    1.06   Copy 10x10 n-bit deep plane 
 3160000.0    1.07   10-pixel dashed line 
 2050000.0    1.07   Char in 80-char line (6x13) 
  270000.0    1.07   Fill 1x1 aa trap with 1 bit alpha 
 1830000.0    1.07   Moved unmapped window (50 kids) 
21600000.0    1.08   1-pixel line segment 
   21200.0    1.08   Char in 30-char rgb core line (Charter 24) 
 1850000.0    1.08   Moved unmapped window (16 kids) 
 1830000.0    1.08   Moved unmapped window (4 kids) 
 1820000.0    1.08   Moved unmapped window (75 kids) 
  839000.0    1.09   Char in 30-char line (TR 24) 
 1620000.0    1.09   Char in 60-char line (9x15) 
 1870000.0    1.09   Char in 70-char line (8x13) 
 2340000.0    1.09   Char in 80-char line (TR 10) 
 1810000.0    1.09   Moved unmapped window (200 kids) 
   25600.0    1.09   Move window (25 kids) 
 1720000.0    1.09   Resize unmapped window (16 kids) 
 1720000.0    1.09   Resize unmapped window (25 kids) 
 1730000.0    1.09   Resize unmapped window (4 kids) 
 1220000.0    1.10   1x1 tiled rectangle (161x145 tile) 
 1220000.0    1.10   1x1 tiled rectangle (17x15 tile) 
 1210000.0    1.10   1x1 tiled rectangle (4x4 tile) 
     725.0    1.10   Fill 300x300 aa trapezoid 
   26200.0    1.10   Move window (16 kids) 
   16600.0    1.10   Move window (200 kids) 
 1700000.0    1.10   Resize unmapped window (75 kids) 
 1970000.0    1.10   Unmap window via parent (200 kids) 
 1750000.0    1.10   Unmap window via parent (50 kids) 
 1120000.0    1.11   10x10 tiled rectangle (216x208 tile) 
  973000.0    1.11   Circulate Unmapped window (200 kids) 
 1780000.0    1.11   Moved unmapped window (25 kids) 
   20900.0    1.11   Move window (100 kids) 
   22100.0    1.11   Move window (75 kids) 
   80500.0    1.12   Char in 80-char rgb core line (Courier 12) 
  502000.0    1.12   Destroy window via parent (200 kids) 
 1650000.0    1.12   Resize unmapped window (100 kids) 
  125000.0    1.13   10-pixel ellipse 
 4780000.0    1.13   10-pixel horizontal line segment 
 1070000.0    1.13   10x10 tiled rectangle (161x145 tile) 
 1180000.0    1.13   1x1 tiled rectangle (216x208 tile) 
   23500.0    1.13   Move window (50 kids) 
 1670000.0    1.13   Resize unmapped window (50 kids) 
  547000.0    1.14   10x10 tiled rectangle (17x15 tile) 
   37600.0    1.14   Circulate window (4 kids) 
   22600.0    1.15   Char in 30-char aa core line (Charter 24) 
 2270000.0    1.15   Circulate Unmapped window (75 kids) 
   37400.0    1.15   Fill 10x10 tiled trapezoid (4x4 tile) 
   28100.0    1.15   Move window (4 kids) 
   87600.0    1.16   Char in 80-char aa core line (Courier 12) 
 1970000.0    1.16   Circulate Unmapped window (100 kids) 
 3180000.0    1.16   Circulate Unmapped window (25 kids) 
 2710000.0    1.16   Circulate Unmapped window (50 kids) 
   42800.0    1.16   Fill 10x10 tiled trapezoid (216x208 tile) 
 1980000.0    1.16   Unmap window via parent (100 kids) 
    3490.0    1.17   100x100 tiled rectangle (4x4 tile) 
  144000.0    1.17   10x10 tiled rectangle (4x4 tile) 
   10200.0    1.17   500x50 wide vertical line segment 
  711000.0    1.17   Char in 80-char rgb line (Courier 12) 
      90.5    1.17   Fill 300x300 tiled trapezoid (161x145 tile) 
  557000.0    1.17   Move window via parent (25 kids) 
  110000.0    1.17   Move window via parent (4 kids) 
  904000.0    1.17   Move window via parent (50 kids) 
     102.0    1.18   500x500 tiled rectangle (4x4 tile) 
   21900.0    1.18   Char in 30-char a core line (Charter 24) 
   87400.0    1.18   Char in 80-char a core line (Courier 12) 
    4350.0    1.18   Fill 100x100 aa trapezoid 
   42200.0    1.18   Fill 10x10 tiled trapezoid (161x145 tile) 
   39500.0    1.18   Fill 10x10 tiled trapezoid (17x15 tile) 
  388000.0    1.18   Move window via parent (16 kids) 
   23400.0    1.18   Resize window (200 kids) 
 1910000.0    1.18   Unmap window via parent (75 kids) 
   33800.0    1.19   100-pixel circle 
     807.0    1.19   Fill 100x100 tiled trapezoid (216x208 tile) 
   15100.0    1.19   Fill 100x100 trapezoid 
   76000.0    1.19   Map window via parent (4 kids) 
    4140.0    1.20   100-pixel wide double-dashed circle 
    2200.0    1.20   500-pixel ellipse 
 3370000.0    1.20   Circulate Unmapped window (16 kids) 
 1690000.0    1.21   Char in 20/40/20 line (6x13, TR 10) 
 8160000.0    1.21   X protocol NoOperation 
    1750.0    1.22   100-pixel wide dashed ellipse 
  191000.0    1.22   Char in 30-char rgb line (Charter 24) 
   32400.0    1.22   Resize window (75 kids) 
   10200.0    1.23   100-pixel ellipse 
   15300.0    1.23   100-pixel partial ellipse 
    4630.0    1.23   500x50 wide line 
   30100.0    1.23   Resize window (100 kids) 
      92.0    1.24   Fill 300x300 tiled trapezoid (216x208 tile) 
     777.0    1.25   GetImage 100x100 square 
   96800.0    1.25   Hide/expose window via popup (16 kids) 
  714000.0    1.26   Create unmapped window (200 kids) 
   34900.0    1.26   Resize window (50 kids) 
   40100.0    1.27   Resize window (16 kids) 
 1290000.0    1.27   Unmap window via parent (25 kids) 
   10000.0    1.29   500x50 wide horizontal line segment 
   38100.0    1.29   Resize window (25 kids) 
 4030000.0    1.30   100-pixel horizontal line segment 
 3530000.0    1.30   Circulate Unmapped window (4 kids) 
   25300.0    1.30   Circulate window (16 kids) 
  981000.0    1.30   Unmap window via parent (16 kids) 
   22200.0    1.31   Circulate window (100 kids) 
   19100.0    1.31   Circulate window (200 kids) 
   43700.0    1.31   Resize window (4 kids) 
  972000.0    1.32   10x10 wide rectangle outline 
   23200.0    1.32   Circulate window (50 kids) 
   22600.0    1.32   Circulate window (75 kids) 
   13500.0    1.32   Fill 100x100 64-gon (Convex) 
   24200.0    1.33   Circulate window (25 kids) 
  365000.0    1.33   Destroy window via parent (16 kids) 
   12200.0    1.34   Fill 100x100 equivalent triangle 
  775000.0    1.35   Char in 80-char aa line (Courier 12) 
  777000.0    1.35   Char in 80-char a line (Courier 12) 
   26100.0    1.36   10-pixel wide partial ellipse 
 1840000.0    1.36   10x10 rectangle 
  711000.0    1.36   Create unmapped window (100 kids) 
  698000.0    1.37   Create unmapped window (50 kids) 
   35300.0    1.38   100-pixel partial circle 
  112000.0    1.38   Map window via parent (16 kids) 
  689000.0    1.39   Create unmapped window (25 kids) 
  500000.0    1.39   Destroy window via parent (75 kids) 
   12700.0    1.40   Fill 100x100 64-gon (Complex) 
   98700.0    1.40   Hide/expose window via popup (25 kids) 
  109000.0    1.40   Hide/expose window via popup (50 kids) 
  114000.0    1.41   Hide/expose window via popup (75 kids) 
   15600.0    1.43   100-pixel solid circle 
  112000.0    1.44   Create and map subwindows (200 kids) 
  113000.0    1.44   Create and map subwindows (50 kids) 
  114000.0    1.44   Create and map subwindows (75 kids) 
  671000.0    1.44   Create unmapped window (75 kids) 
  123000.0    1.45   10-pixel partial circle 
  112000.0    1.45   Create and map subwindows (25 kids) 
  182000.0    1.45   Destroy window via parent (4 kids) 
  113000.0    1.45   Hide/expose window via popup (100 kids) 
  118000.0    1.45   Hide/expose window via popup (200 kids) 
  322000.0    1.45   Unmap window via parent (4 kids) 
  114000.0    1.46   Create and map subwindows (100 kids) 
  109000.0    1.46   Create and map subwindows (16 kids) 
  476000.0    1.46   Destroy window via parent (50 kids) 
  655000.0    1.47   Create unmapped window (16 kids) 
  763000.0    1.48   Char in 80-char rgb line (Charter 10) 
  127000.0    1.48   Map window via parent (75 kids) 
  140000.0    1.51   Change graphics context 
   54600.0    1.52   10x1 wide vertical line segment 
  128000.0    1.52   Map window via parent (100 kids) 
   81500.0    1.53   Copy 10x10 from window to window 
  481000.0    1.53   Destroy window via parent (100 kids) 
   81600.0    1.53   Scroll 10x10 pixels 
   86200.0    1.54   Create and map subwindows (4 kids) 
   12500.0    1.55   100-pixel wide ellipse 
   80800.0    1.55   Composite 10x10 from window to window 
  121000.0    1.55   Map window via parent (50 kids) 
  394000.0    1.56   Destroy window via parent (25 kids) 
  138000.0    1.56   Fill 1x1 tiled trapezoid (17x15 tile) 
  111000.0    1.56   Map window via parent (25 kids) 
  137000.0    1.57   Fill 1x1 tiled trapezoid (4x4 tile) 
  130000.0    1.57   Map window via parent (200 kids) 
  551000.0    1.58   Create unmapped window (4 kids) 
  136000.0    1.58   Fill 1x1 tiled trapezoid (161x145 tile) 
    9850.0    1.60   Fill 100x100 equivalent complex polygons 
   53400.0    1.61   10x1 wide horizontal line segment 
  132000.0    1.63   Fill 1x1 tiled trapezoid (216x208 tile) 
   23500.0    1.69   10-pixel wide partial circle 
  105000.0    1.71   10-pixel circle 
 1470000.0    1.72   1x1 stippled rectangle (8x8 stipple) 
 1420000.0    1.73   1x1 opaque stippled rectangle (161x145 stipple) 
   53400.0    1.76   100x100 rectangle outline 
 1430000.0    1.77   1x1 stippled rectangle (161x145 stipple) 
 1430000.0    1.77   1x1 stippled rectangle (17x15 stipple) 
 1420000.0    1.78   1x1 opaque stippled rectangle (17x15 stipple) 
  773000.0    1.80   Char in 80-char a line (Charter 10) 
  768000.0    1.81   Char in 80-char aa line (Charter 10) 
 1400000.0    1.82   1x1 opaque stippled rectangle (8x8 stipple) 
     185.0    1.83   500x500 opaque stippled rectangle (17x15 stipple) 
   14000.0    1.86   Fill 10x10 aa trapezoid 
  174000.0    1.90   Fill 1x1 stippled trapezoid (17x15 stipple) 
  173000.0    1.92   Fill 1x1 opaque stippled trapezoid (8x8 stipple) 
  173000.0    1.93   Fill 1x1 opaque stippled trapezoid (161x145 stipple) 
  173000.0    1.94   Fill 1x1 opaque stippled trapezoid (17x15 stipple) 
    4140.0    1.95   100x100 opaque stippled rectangle (17x15 stipple) 
  134000.0    1.96   Fill 10x10 aa trap with 1 bit alpha 
  172000.0    1.96   Fill 1x1 stippled trapezoid (8x8 stipple) 
 1840000.0    1.97   10-pixel vertical line segment 
  173000.0    1.97   Fill 1x1 stippled trapezoid (161x145 stipple) 
    5780.0    1.99   100-pixel wide partial ellipse 
 1830000.0    1.99   1x1 rectangle 
   86400.0    1.99   Fill 10x10 stippled trapezoid (161x145 stipple) 
   86000.0    2.01   Fill 10x10 opaque stippled trapezoid (161x145 stipple) 
   14700.0    2.03   Fill 1x1 aa trapezoid 
 1840000.0    2.04   Dot 
   67700.0    2.04   Fill 10x10 stippled trapezoid (8x8 stipple) 
   77700.0    2.05   Fill 10x10 opaque stippled trapezoid (17x15 stipple) 
   74600.0    2.05   Fill 10x10 stippled trapezoid (17x15 stipple) 
    5250.0    2.11   Fill 100x100 stippled trapezoid (161x145 stipple) 
      28.1    2.11   GetImage 500x500 square 
    5760.0    2.12   Fill 100x100 opaque stippled trapezoid (161x145 stipple) 
     570.0    2.12   Fill 300x300 opaque stippled trapezoid (17x15 stipple) 
   69000.0    2.13   Fill 10x10 opaque stippled trapezoid (8x8 stipple) 
   18500.0    2.16   100-pixel filled ellipse 
    4040.0    2.18   Fill 100x100 opaque stippled trapezoid (17x15 stipple) 
   17600.0    2.19   100-pixel fill chord partial circle 
     708.0    2.19   Fill 300x300 stippled trapezoid (161x145 stipple) 
    2980.0    2.20   Fill 100x100 stippled trapezoid (17x15 stipple) 
     384.0    2.21   Fill 300x300 stippled trapezoid (17x15 stipple) 
   53800.0    2.23   10x10 rectangle outline 
     301.0    2.24   500x500 stippled rectangle (161x145 stipple) 
     869.0    2.24   Fill 300x300 opaque stippled trapezoid (161x145 stipple) 
   16300.0    2.30   100-pixel fill slice partial circle 
    7100.0    2.31   100x100 opaque stippled rectangle (161x145 stipple) 
     114.0    2.31   500x500 opaque stippled rectangle (8x8 stipple) 
    6560.0    2.32   100x100 stippled rectangle (161x145 stipple) 
     345.0    2.33   500x500 opaque stippled rectangle (161x145 stipple) 
  106000.0    2.33   Copy 10x10 1-bit deep plane 
    2260.0    2.33   Fill 100x100 stippled trapezoid (8x8 stipple) 
     404.0    2.35   Copy 500x500 1-bit deep plane 
    2570.0    2.44   Fill 100x100 opaque stippled trapezoid (8x8 stipple) 
    1370.0    2.45   100-pixel wide dashed circle 
     328.0    2.47   Fill 300x300 opaque stippled trapezoid (8x8 stipple) 
     280.0    2.51   Fill 300x300 stippled trapezoid (8x8 stipple) 
   53800.0    2.57   10-pixel fill chord partial ellipse 
   33300.0    2.57   10x1 wide line 
    7010.0    2.61   Copy 100x100 1-bit deep plane 
    2540.0    2.63   100x100 opaque stippled rectangle (8x8 stipple) 
     172.0    2.63   500x500 stippled rectangle (17x15 stipple) 
    6520.0    2.64   100-pixel wide partial circle 
   48400.0    2.64   10-pixel fill slice partial ellipse 
    3910.0    2.79   100x100 stippled rectangle (17x15 stipple) 
  224000.0    2.89   10x10 opaque stippled rectangle (161x145 stipple) 
   30900.0    2.92   10-pixel wide ellipse 
    9940.0    2.98   100-pixel wide circle 
     113.0    3.03   500x500 stippled rectangle (8x8 stipple) 
   14200.0    3.06   100x10 wide vertical line segment 
   25600.0    3.11   Fill 10x10 64-gon (Complex) 
   14200.0    3.15   100x10 wide horizontal line segment 
   41600.0    3.22   10-pixel filled ellipse 
    2560.0    3.25   100x100 stippled rectangle (8x8 stipple) 
  176000.0    3.30   10x10 opaque stippled rectangle (17x15 stipple) 
   35100.0    3.36   10-pixel fill slice partial circle 
   38100.0    3.39   10-pixel fill chord partial circle 
   25400.0    3.44   Fill 10x10 64-gon (Convex) 
  143000.0    3.47   10x10 opaque stippled rectangle (8x8 stipple) 
   22100.0    3.49   100-pixel fill chord partial ellipse 
  231000.0    3.56   10x10 stippled rectangle (161x145 stipple) 
   20100.0    3.68   100-pixel fill slice partial ellipse 
    3550.0    3.80   100x10 wide double-dashed line 
  156000.0    3.83   10x10 stippled rectangle (17x15 stipple) 
   26100.0    3.87   Fill 10x10 equivalent complex polygon 
   10700.0    3.93   100x10 wide line 
    3040.0    4.01   100x10 wide dashed line 
   27200.0    4.15   Fill 10x10 equivalent triangle 
   28200.0    4.22   Fill 10x10 trapezoid 
   27100.0    4.24   10-pixel wide circle 
   29500.0    4.44   10-pixel solid circle 
  114000.0    4.49   10x10 stippled rectangle (8x8 stipple) 
   29300.0    4.78   Fill 100x100 aa trap with 1 bit alpha 
    5270.0   11.33   Fill 300x300 aa trap with 1 bit alpha 
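
To re-sort or filter the listing above, here is a minimal Python sketch. It assumes each data row is whitespace-separated as `rate ratio test-name` and uses the second column exactly as printed. `parse_rows` and `count_at_least` are hypothetical helper names and are not part of x11perf or x11perfcomp.

```python
from typing import List, Tuple

# One parsed row: (rate per second, ratio column as printed, x11perf test name).
Row = Tuple[float, float, str]


def parse_rows(text: str) -> List[Row]:
    """Parse rows such as '   5270.0   11.33   Fill 300x300 aa trap with 1 bit alpha'.

    Hypothetical helper: assumes the column layout is rate, ratio, test name.
    """
    rows: List[Row] = []
    for line in text.splitlines():
        # Split off the first two fields; the remainder is the test name.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or short line
        try:
            rate, ratio = float(parts[0]), float(parts[1])
        except ValueError:
            continue  # header, perf output or other non-numeric line
        rows.append((rate, ratio, parts[2].strip()))
    return rows


def count_at_least(rows: List[Row], threshold: float) -> int:
    """Count rows whose ratio column is >= threshold."""
    return sum(1 for _, ratio, _ in rows if ratio >= threshold)


if __name__ == "__main__":
    sample = """\
  1840000.0    2.04   Dot
    28200.0    4.22   Fill 10x10 trapezoid
     5270.0   11.33   Fill 300x300 aa trap with 1 bit alpha
"""
    rows = parse_rows(sample)
    worst = max(rows, key=lambda r: r[1])
    print(len(rows), count_at_least(rows, 4.0), worst[2])
```

When run as a script, the sample prints the number of parsed rows, the number of rows with a ratio of at least 4.0, and the name of the test with the largest ratio.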
  

[-- Attachment #3: perflog-ddellipse100 --]
[-- Type: text/plain, Size: 10526 bytes --]

# Events: 19K cycles
#
# Overhead          Command                    Shared Object  Symbol
# ........  ...............  ...............................  ......
#
    32.09%             Xorg  libpixman-1.so.0.23.1            [.] pixman_op
                       |
                       --- pixman_op
                          |          
                          |--99.80%-- pixman_region_union
                          |          |          
                          |          |--99.95%-- damageRegionAppend
                          |          |          damageDamageBox
                          |          |          damagePolyRectangle
                          |          |          ProcPolyRectangle
                          |          |          Dispatch
                          |          |          main
                          |          |          __libc_start_main
                          |           --0.05%-- [...]
                           --0.20%-- [...]

     5.98%             Xorg  libc-2.11.3.so                   [.] __GI_memmove
                       |
                       --- __GI_memmove
                          |          
                          |--93.46%-- pixman_region_union
                          |          damageRegionAppend
                          |          damageDamageBox
                          |          damagePolyRectangle
                          |          ProcPolyRectangle
                          |          Dispatch
                          |          main
                          |          __libc_start_main
                          |          
                          |--5.14%-- Dispatch
                          |          main
                          |          __libc_start_main
                          |          
                          |--1.22%-- WriteEventsToClient
                          |          DamageExtNotify
                          |          .L312
                          |          damageRegionProcessPending
                          |          damagePolyRectangle
                          |          ProcPolyRectangle
                          |          Dispatch
                          |          main
                          |          __libc_start_main
                           --0.18%-- [...]

     3.25%             Xorg  [kernel.kallsyms]                [k] __lock_acquire
                       |
                       --- __lock_acquire
                          |          
                          |--98.72%-- lock_acquire
                          |          |          
                          |          |--48.51%-- _raw_spin_lock_irqsave
                          |          |          |          
                          |          |          |--45.74%-- add_wait_queue
                          |          |          |          __pollwait
                          |          |          |          |          
                          |          |          |          |--89.24%-- unix_poll
                          |          |          |          |          sock_poll
                          |          |          |          |          do_select
                          |          |          |          |          core_sys_select
                          |          |          |          |          sys_select
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |          |--4.40%-- n_tty_poll
                          |          |          |          |          tty_poll
                          |          |          |          |          do_select
                          |          |          |          |          core_sys_select
                          |          |          |          |          sys_select
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |          |--3.56%-- datagram_poll
                          |          |          |          |          sock_poll
                          |          |          |          |          do_select
                          |          |          |          |          core_sys_select
                          |          |          |          |          sys_select
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |           --2.81%-- drm_poll
                          |          |          |                     do_select
                          |          |          |                     core_sys_select
                          |          |          |                     sys_select
                          |          |          |                     sysenter_do_call
                          |          |          |                     0xb76ed424
                          |          |          |                     Dispatch
                          |          |          |                     main
                          |          |          |                     __libc_start_main
                          |          |          |          
                          |          |          |--31.44%-- remove_wait_queue
                          |          |          |          poll_freewait
                          |          |          |          do_select
                          |          |          |          core_sys_select
                          |          |          |          sys_select
                          |          |          |          sysenter_do_call
                          |          |          |          0xb76ed424
                          |          |          |          Dispatch
                          |          |          |          main
                          |          |          |          __libc_start_main
                          |          |          |          
                          |          |          |--6.96%-- skb_dequeue
                          |          |          |          unix_stream_recvmsg
                          |          |          |          sock_aio_read
                          |          |          |          do_sync_read
                          |          |          |          vfs_read
                          |          |          |          sys_read
                          |          |          |          sysenter_do_call
                          |          |          |          0xb76ed424
                          |          |          |          _XSERVTransRead
                          |          |          |          ReadRequestFromClient
                          |          |          |          Dispatch
                          |          |          |          main
                          |          |          |          __libc_start_main
                          |          |          |          
                          |          |          |--6.55%-- __wake_up_sync_key
                          |          |          |          |          
                          |          |          |          |--79.99%-- unix_write_space
                          |          |          |          |          sock_wfree
                          |          |          |          |          unix_destruct_scm
                          |          |          |          |          skb_release_head_state
                          |          |          |          |          __kfree_skb
                          |          |          |          |          consume_skb
                          |          |          |          |          unix_stream_recvmsg
                          |          |          |          |          sock_aio_read
                          |          |          |          |          do_sync_read
                          |          |          |          |          vfs_read
                          |          |          |          |          sys_read
                          |          |          |          |          sysenter_do_call
                          |          |          |          |          0xb76ed424
                          |          |          |          |          _XSERVTransRead
                          |          |          |          |          ReadRequestFromClient
                          |          |          |          |          Dispatch
                          |          |          |          |          main
                          |          |          |          |          __libc_start_main
                          |          |          |          |          
                          |          |          |           --20.01%-- sock_def_readable

[-- Attachment #4: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Thread overview: 14+ messages
2011-05-08 18:22 X11 performance regressions Knut Petersen
2011-05-09 16:53 ` Adam Jackson
2011-05-09 21:43 ` Chris Wilson
2011-05-11 14:46   ` Knut Petersen
2011-05-11 17:52     ` Chris Wilson
2011-05-12  7:19       ` Knut Petersen
2011-05-12  7:38         ` Chris Wilson
2011-05-12  8:24           ` Knut Petersen
2011-05-12  8:55             ` Chris Wilson
2011-05-12  9:34               ` Knut Petersen
2011-05-13  9:24               ` Knut Petersen
2011-05-11 19:49     ` Adam Jackson
2011-05-11 21:22       ` Knut Petersen
2011-05-12 13:42         ` Adam Jackson