* X11 performance regressions
@ 2011-05-08 18:22 Knut Petersen
  2011-05-09 16:53 ` Adam Jackson
  2011-05-09 21:43 ` Chris Wilson
  0 siblings, 2 replies; 14+ messages in thread
From: Knut Petersen @ 2011-05-08 18:22 UTC (permalink / raw)
  To: intel-gfx

I compared the performance of X11 on two otherwise idle machines.

Hardware
========

Both have identical mainboards (AOpen i915GMm-hfs), identical memory
and identical BIOS setup. Both CPUs are Intel Pentium M mobile (Dothan).
One runs at 1.86 GHz, the other at 2.00 GHz.

Software
========

1.86 GHz system:
    opensuse 11.2
    X.Org X Server 1.6.5
    Release Date: 2009-10-11
    kernel 2.6.38.5

2.00 GHz system:
    opensuse 11.4
    X.Org X Server 1.10.99
    git-tree, 2011-may-7
    kernel 2.6.39-rc4-drm-intel-staging

x11perf results
===============

In each pair, the first line is the result from the 2.00 GHz system
with the current Xorg, and the second line is the result from the
1.86 GHz system with Xorg 1.6.5. A few representative examples:

10000000 trep @ 0.0032 msec (309000.0/sec): Dot
40000000 trep @ 0.0006 msec (1650000.0/sec): Dot

45000 trep @ 0.5973 msec ( 1670.0/sec): 500x500 rectangle
100000 trep @ 0.4282 msec ( 2340.0/sec): 500x500 rectangle

2000000 reps @ 0.0034 msec (296000.0/sec): 1x1 stippled rectangle (8x8 stipple)
8000000 reps @ 0.0007 msec (1420000.0/sec): 1x1 stippled rectangle (8x8 stipple)

1500 trep @ 22.4602 msec ( 44.5/sec): 500x500 stippled rectangle (8x8 stipple)
3000 trep @ 9.2680 msec ( 108.0/sec): 500x500 stippled rectangle (8x8 stipple)

100000 trep @ 0.4043 msec ( 2470.0/sec): Fill 10x10 trapezoid
1000000 trep @ 0.0336 msec ( 29700.0/sec): Fill 10x10 trapezoid

The old X on the PC with the slower CPU is always significantly faster
than the current git code, very often more than 5 times as fast, and a
number of tests show 1.6.5 to be more than 12 times faster than 1.10.99.

I did not use any special configuration options at compile time;
1.10.99 was built using the following commands:

    export PREFIX=/home/knut/local
    export PKG_CONFIG_PATH=$PREFIX/lib/pkgconfig
    export PATH=$PREFIX/bin:$PATH
    export ACLOCAL="aclocal -I $PREFIX/share/aclocal"
    export LD_LIBRARY_PATH=$PREFIX/lib
    export PYTHONPATH=$PREFIX/lib/python2.7/site-packages
    util/modular/build.sh -g $PREFIX

Could anybody please explain why the old server is so much faster?
Are there any compile time or runtime options that could/should be used?

cu,
Knut

^ permalink raw reply	[flat|nested] 14+ messages in thread
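The "more than 5 times" and "more than 12 times" figures above can be checked directly from the quoted x11perf lines. The following is a minimal sketch, not part of the original message: the regular expression and the function names `parse_result` and `old_over_new` are illustrative assumptions, and only three of the five quoted pairs are included.

```python
import re

# One x11perf summary line, for example
#   "10000000 trep @ 0.0032 msec (309000.0/sec): Dot"
# Both "reps" and "trep" appear in the results quoted in the message.
RESULT_RE = re.compile(
    r"^\s*\d+\s+(?:reps|trep)\s+@\s+[\d.]+\s+msec\s+\(\s*([\d.]+)/sec\):\s*(.+?)\s*$"
)


def parse_result(line):
    """Return (operation, rate_per_second) for one x11perf result line."""
    match = RESULT_RE.match(line)
    if match is None:
        raise ValueError(f"not an x11perf result line: {line!r}")
    return match.group(2), float(match.group(1))


def old_over_new(pairs):
    """Map operation -> (rate on 1.6.5) / (rate on 1.10.99).

    Each pair is (line from the current server, line from the 1.6.5 server),
    in the same order as the message lists them.
    """
    ratios = {}
    for new_line, old_line in pairs:
        op_new, rate_new = parse_result(new_line)
        op_old, rate_old = parse_result(old_line)
        if op_new != op_old:
            raise ValueError(f"mismatched operations: {op_new!r} vs {op_old!r}")
        ratios[op_new] = rate_old / rate_new
    return ratios


# Lines copied from the message above.
PAIRS = [
    ("10000000 trep @ 0.0032 msec (309000.0/sec): Dot",
     "40000000 trep @ 0.0006 msec (1650000.0/sec): Dot"),
    ("45000 trep @ 0.5973 msec ( 1670.0/sec): 500x500 rectangle",
     "100000 trep @ 0.4282 msec ( 2340.0/sec): 500x500 rectangle"),
    ("100000 trep @ 0.4043 msec ( 2470.0/sec): Fill 10x10 trapezoid",
     "1000000 trep @ 0.0336 msec ( 29700.0/sec): Fill 10x10 trapezoid"),
]

RATIOS = old_over_new(PAIRS)

if __name__ == "__main__":
    for operation, ratio in RATIOS.items():
        print(f"{operation}: 1.6.5 runs at {ratio:.2f}x the rate of 1.10.99")
```

On these three pairs the ratio ranges from about 1.4 (500x500 rectangle) to about 12 (Fill 10x10 trapezoid), so the size of the gap depends strongly on the operation.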
* Re: X11 performance regressions
  2011-05-08 18:22 X11 performance regressions Knut Petersen
@ 2011-05-09 16:53 ` Adam Jackson
  2011-05-09 21:43 ` Chris Wilson
  1 sibling, 0 replies; 14+ messages in thread
From: Adam Jackson @ 2011-05-09 16:53 UTC (permalink / raw)
  To: Knut Petersen; +Cc: intel-gfx

On 5/8/11 2:22 PM, Knut Petersen wrote:

> Software
> =======
> 1.86 GHz system:
> opensuse 11.2
> X.Org X Server 1.6.5
> Release Date: 2009-10-11
> kernel 2.6.38.5
>
> 2.00 GHz system:
> opensuse 11.4
> X.Org X Server 1.10.99
> git-tree, 2011-may-7
> kernel 2.6.39-rc4-drm-intel-staging

I'd start by suspecting differences in .config for the kernel between
the two, particularly since...

> 10000000 trep @ 0.0032 msec (309000.0/sec): Dot
> 40000000 trep @ 0.0006 msec (1650000.0/sec): Dot

Dot dispatch is so completely CPU-dominated that I suspect you're simply
measuring CPU overhead somewhere else. For example, if one of those
kernels is built with spinlock debugging and the other isn't.

- ajax

^ permalink raw reply	[flat|nested] 14+ messages in thread
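One way to act on the suggestion above is to compare the two kernels' .config files and list only the debug and locking options that differ. This is a rough sketch under stated assumptions: the two config excerpts are hypothetical and were not taken from the machines in this thread, and the helper names and the keyword filter (`DEBUG`, `LOCK`) are illustrative choices.

```python
def parse_kconfig(text):
    """Parse kernel .config text into {option: value}; unset options map to "n"."""
    options = {}
    suffix = " is not set"
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO disabled
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def differing_options(config_a, config_b, keywords=("DEBUG", "LOCK")):
    """Return {option: (value_a, value_b)} for options whose values differ
    and whose name contains one of the keywords. Missing options count as "n"."""
    a, b = parse_kconfig(config_a), parse_kconfig(config_b)
    diff = {}
    for name in sorted(set(a) | set(b)):
        if not any(keyword in name for keyword in keywords):
            continue
        value_a, value_b = a.get(name, "n"), b.get(name, "n")
        if value_a != value_b:
            diff[name] = (value_a, value_b)
    return diff


# Hypothetical excerpts, NOT the real configs of the two kernels discussed here.
CONFIG_A = """\
CONFIG_PREEMPT=y
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_PROVE_LOCKING is not set
"""
CONFIG_B = """\
CONFIG_PREEMPT=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_PROVE_LOCKING=y
"""

if __name__ == "__main__":
    for name, (va, vb) in differing_options(CONFIG_A, CONFIG_B).items():
        print(f"{name}: {va} -> {vb}")
```

In practice the two strings would be read from the actual config files of the kernels being compared; where those files live depends on the distribution and build setup.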
* Re: X11 performance regressions
  2011-05-08 18:22 X11 performance regressions Knut Petersen
  2011-05-09 16:53 ` Adam Jackson
@ 2011-05-09 21:43 ` Chris Wilson
  2011-05-11 14:46   ` Knut Petersen
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2011-05-09 21:43 UTC (permalink / raw)
  To: Knut Petersen, intel-gfx

As a point of comparison, here are the similar results with master of all
the various trees on my 1.6GHz N450 (Atom+PineView) [so not strictly an
apples-to-apples comparison, your CPU is about 4-5x faster, but PNV is
about 3-4x faster than the 915GM (clock-for-clock)]:

On Sun, 08 May 2011 20:22:21 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
> 10000000 trep @ 0.0032 msec (309000.0/sec): Dot
> 40000000 trep @ 0.0006 msec (1650000.0/sec): Dot
50000000 trep @ 0.0005 msec (1830000.0/sec): Dot
*100000000 trep @ 0.0003 msec (2900000.0/sec): Dot

> 45000 trep @ 0.5973 msec ( 1670.0/sec): 500x500 rectangle
> 100000 trep @ 0.4282 msec ( 2340.0/sec): 500x500 rectangle
100000 trep @ 0.3210 msec ( 3120.0/sec): 500x500 rectangle

> 2000000 reps @ 0.0034 msec (296000.0/sec): 1x1 stippled rectangle (8x8 stipple)
> 8000000 reps @ 0.0007 msec (1420000.0/sec): 1x1 stippled rectangle (8x8 stipple)
25000000 trep @ 0.0011 msec (902000.0/sec): 1x1 stippled rectangle (8x8 stipple)
*30000000 trep @ 0.0008 msec (1180000.0/sec): 1x1 stippled rectangle (8x8 stipple)

> 1500 trep @ 22.4602 msec ( 44.5/sec): 500x500 stippled rectangle (8x8 stipple)
> 3000 trep @ 9.2680 msec ( 108.0/sec): 500x500 stippled rectangle (8x8 stipple)
4000 trep @ 6.8986 msec ( 145.0/sec): 500x500 stippled rectangle (8x8 stipple)
*3500 trep @ 7.0786 msec ( 141.0/sec): 500x500 stippled rectangle (8x8 stipple)

> 100000 trep @ 0.4043 msec ( 2470.0/sec): Fill 10x10 trapezoid
> 1000000 trep @ 0.0336 msec ( 29700.0/sec): Fill 10x10 trapezoid
2000000 trep @ 0.0152 msec ( 65700.0/sec): Fill 10x10 trapezoid
*4000000 trep @ 0.0064 msec (156000.0/sec): Fill 10x10 trapezoid

Hmm.

My suspicion was that these were GEM-related regressions (the overhead of
the dynamic buffer manager and relocations), along with various
optimizations for the common cases affecting the software-fallback
dominated benchmarks selected above. And whilst there may be some element
of that behind the regression you're observing, I don't think that is the
whole story, and Adam is right to suggest checking that the systems are
indeed configured identically (wrt debug and optimisation options).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: X11 performance regressions
  2011-05-09 21:43 ` Chris Wilson
@ 2011-05-11 14:46   ` Knut Petersen
  2011-05-11 17:52     ` Chris Wilson
  2011-05-11 19:49     ` Adam Jackson
  0 siblings, 2 replies; 14+ messages in thread
From: Knut Petersen @ 2011-05-11 14:46 UTC (permalink / raw)
  To: intel-gfx

Yes, I made some mistakes during my first measurements.

Below find better results. They are made on the same machine,
with the same kernel, at the same speed, with the same x11perf
program, absolutely nothing changed.

I used x11perfcomp -ro and sorted the output, worst results for
the current git code first.

I think the numbers below are quite interesting ...

-Knut

System
======

AOpen i915GMm-hfs
Pentium M 2.00 GHz (Dothan) running @ 2 GHz fixed frequency, no thermal throttling
2GB RAM

1: Xorg of openSuSE 11.2 (absolute numbers)
===========================================

X.Org X Server 1.6.5
Release Date: 2009-10-11
X Protocol Version 11, Revision 0
Build Operating System: openSUSE SUSE LINUX
Current Operating System: Linux linux-iffr 2.6.38.5-kape #10 PREEMPT Fri May 6 17:41:06 CEST 2011 i686
Build Date: 23 September 2010 03:43:55PM
Binaries, as distributed by openSuSE

2: Xorg, fresh from git 10 May 2011 (relative performance)
==========================================================

X.Org X Server 1.10.99.1
Release Date: unreleased
X Protocol Version 11, Revision 0
Build Operating System: Linux 2.6.39-rc4-drm-intel-staging+ i686
Current Operating System: Linux linux-iffr 2.6.38.5-kape #10 PREEMPT Fri May 6 17:41:06 CEST 2011 i686
Kernel command line: root=/dev/hda2 acpi_enforce_resources=lax drm.debug=0x0
Build Date: 10 May 2011 04:43:21PM
Compiled without any special options using build.sh

         1       2   Operation
----------   -----   ---------
  965000.0   0.016   10x10 wide rectangle outline
  164000.0   0.033   Fill 1x1 equivalent triangle
  152000.0   0.034   Fill 1x1 trapezoid
  175000.0   0.061   Fill 1x1 stippled trapezoid (161x145 stipple)
  174000.0   0.062   Fill 1x1 opaque stippled trapezoid (161x145 stipple)
  173000.0   0.062   Fill 1x1 opaque stippled trapezoid (17x15 stipple)
  173000.0   0.062   Fill 1x1 opaque stippled trapezoid (8x8 stipple)
  173000.0   0.062   Fill 1x1 stippled trapezoid (17x15 stipple)
  173000.0   0.062   Fill 1x1 stippled trapezoid (8x8 stipple)
  138000.0   0.073   Fill 1x1 tiled trapezoid (17x15 tile)
  136000.0   0.074   Fill 1x1 tiled trapezoid (161x145 tile)
  136000.0   0.074   Fill 1x1 tiled trapezoid (216x208 tile)
  137000.0   0.074   Fill 1x1 tiled trapezoid (4x4 tile)
    2670.0   0.088   100-pixel double-dashed ellipse
    4170.0   0.092   100-pixel dashed ellipse
   85300.0    0.11   Fill 10x10 opaque stippled trapezoid (161x145 stipple)
   85800.0    0.11   Fill 10x10 stippled trapezoid (161x145 stipple)
   76400.0    0.12   Fill 10x10 opaque stippled trapezoid (17x15 stipple)
   74800.0    0.12   Fill 10x10 stippled trapezoid (17x15 stipple)
   68800.0    0.13   Fill 10x10 opaque stippled trapezoid (8x8 stipple)
   67200.0    0.13   Fill 10x10 stippled trapezoid (8x8 stipple)
34800000.0    0.14   1-pixel solid circle
   42300.0    0.15   Fill 10x10 tiled trapezoid (161x145 tile)
   41900.0    0.15   Fill 10x10 tiled trapezoid (216x208 tile)
    4080.0    0.16   100-pixel wide double-dashed ellipse
   26800.0    0.16   500x500 rectangle outline
   38100.0    0.16   Fill 10x10 tiled trapezoid (17x15 tile)
   36700.0    0.16   Fill 10x10 tiled trapezoid (4x4 tile)
24700000.0    0.17   1-pixel line
22200000.0    0.17   1-pixel line segment
   27500.0    0.18   Fill 10x10 equivalent triangle
   28300.0    0.18   Fill 10x10 trapezoid
  190000.0    0.20   100x100 wide rectangle outline
    5910.0    0.23   Fill 300x300 trapezoid
  553000.0    0.24   Copy 10x10 from pixmap to pixmap
  292000.0    0.25   100-pixel line segment (3 kids)
   54600.0    0.25   10x10 rectangle outline
  281000.0    0.26   100-pixel line segment (2 kids)
 4670000.0    0.26   10-pixel horizontal line segment
  114000.0    0.27   Fill 1x1 aa trap
  198000.0    0.27   ShmPutImage 10x10 square
  265000.0    0.28   100-pixel line segment (1 kid)
 2980000.0    0.28   10-pixel dashed line
 2220000.0    0.28   10-pixel dashed segment
 2840000.0    0.28   10-pixel line
 2010000.0    0.28   10-pixel line segment
   21400.0    0.28   500-pixel circle
     763.0    0.28   Fill 100x100 tiled trapezoid (161x145 tile)
     632.0    0.28   Fill 100x100 tiled trapezoid (17x15 tile)
     572.0    0.28   Fill 100x100 tiled trapezoid (4x4 tile)
   15300.0    0.28   Fill 100x100 trapezoid
 3960000.0    0.29   100-pixel horizontal line segment
  299000.0    0.30   100-pixel dashed line
  273000.0    0.30   100-pixel dashed segment
  247000.0    0.30   100-pixel double-dashed segment
  274000.0    0.30   100-pixel line
  248000.0    0.30   100-pixel line segment
  820000.0    0.30   1-pixel circle
    5410.0    0.30   500-pixel filled ellipse
    2840.0    0.30   500-pixel solid circle
  272000.0    0.31   100-pixel double-dashed line
  130000.0    0.31   10-pixel partial ellipse
  154000.0    0.31   PutImage 10x10 square
 1090000.0    0.32   10x10 tiled rectangle (161x145 tile)
 1120000.0    0.32   10x10 tiled rectangle (216x208 tile)
   12400.0    0.32   Fill 100x100 equivalent triangle
 1220000.0    0.33   1x1 tiled rectangle (161x145 tile)
 1220000.0    0.33   1x1 tiled rectangle (17x15 tile)
 1220000.0    0.33   1x1 tiled rectangle (216x208 tile)
 1220000.0    0.33   1x1 tiled rectangle (4x4 tile)
    3540.0    0.33   500-pixel wide ellipse
     792.0    0.33   Fill 100x100 tiled trapezoid (216x208 tile)
   87200.0    0.33   Fill 2x1 aa trap
  552000.0    0.34   10x10 tiled rectangle (17x15 tile)
  263000.0    0.34   Fill 1x1 aa trap with 1 bit alpha
      88.4    0.34   Fill 300x300 tiled trapezoid (161x145 tile)
  125000.0    0.36   10-pixel ellipse
      71.5    0.38   Fill 300x300 tiled trapezoid (17x15 tile)
 1680000.0    0.39   100-pixel vertical line segment
   54200.0    0.39   100x100 rectangle outline
      65.0    0.39   Fill 300x300 tiled trapezoid (4x4 tile)
   33900.0    0.40   100-pixel circle
  147000.0    0.40   10x10 tiled rectangle (4x4 tile)
     103.0    0.41   500x500 tiled rectangle (4x4 tile)
   35300.0    0.42   100-pixel partial circle
    1780.0    0.42   100-pixel wide dashed ellipse
    3520.0    0.42   100x100 tiled rectangle (4x4 tile)
   56200.0    0.42   500-pixel line
    5200.0    0.42   500x500 wide rectangle outline
   11300.0    0.42   GetImage 10x10 square
      90.5    0.44   Fill 300x300 tiled trapezoid (216x208 tile)
   12900.0    0.45   100-pixel wide ellipse
   50800.0    0.45   500-pixel line segment
 1820000.0    0.46   10x10 rectangle
 1450000.0    0.46   1x1 opaque stippled rectangle (8x8 stipple)
    1570.0    0.46   ShmPutImage 500x500 square
   23800.0    0.47   100x100 tiled rectangle (17x15 tile)
    5730.0    0.47   Fill 100x100 opaque stippled trapezoid (161x145 stipple)
    5210.0    0.47   Fill 100x100 stippled trapezoid (161x145 stipple)
  122000.0    0.48   10-pixel partial circle
   78600.0    0.49   100x100 rectangle
 1860000.0    0.49   10-pixel vertical line segment
   54300.0    0.49   10x1 wide horizontal line segment
   54400.0    0.49   10x1 wide vertical line segment
 1420000.0    0.50   1x1 opaque stippled rectangle (17x15 stipple)
 1440000.0    0.50   1x1 stippled rectangle (17x15 stipple)
 1450000.0    0.50   1x1 stippled rectangle (8x8 stipple)
 1420000.0    0.51   1x1 stippled rectangle (161x145 stipple)
     691.0    0.51   500x500 tiled rectangle (17x15 tile)
    3330.0    0.52   100-pixel dashed circle
 1400000.0    0.52   1x1 opaque stippled rectangle (161x145 stipple)
 1830000.0    0.52   1x1 rectangle
 2330000.0    0.52   500-pixel horizontal line segment
    4020.0    0.52   Fill 100x100 opaque stippled trapezoid (17x15 stipple)
    2190.0    0.53   100-pixel double-dashed circle
 2300000.0    0.53   500-pixel vertical line segment
    2540.0    0.53   500-pixel wide circle
 1810000.0    0.53   Dot
   15300.0    0.54   100-pixel partial ellipse
   26100.0    0.54   10-pixel wide partial ellipse
     182.0    0.54   500x500 opaque stippled rectangle (17x15 stipple)
    3060.0    0.54   Fill 100x100 stippled trapezoid (17x15 stipple)
   15400.0    0.54   GetProperty
   15500.0    0.54   QueryPointer
    4150.0    0.56   100-pixel wide double-dashed circle
  105000.0    0.56   10-pixel circle
   10200.0    0.60   100-pixel ellipse
   10200.0    0.60   500x50 wide vertical line segment
     705.0    0.60   Fill 300x300 stippled trapezoid (161x145 stipple)
 1480000.0    0.60   Unmap window via parent (50 kids)
   10300.0    0.61   500x50 wide horizontal line segment
    2530.0    0.61   Fill 100x100 opaque stippled trapezoid (8x8 stipple)
     848.0    0.61   Fill 300x300 opaque stippled trapezoid (161x145 stipple)
   21700.0    0.61   ShmPutImage 100x100 square
    2240.0    0.62   Fill 100x100 stippled trapezoid (8x8 stipple)
     386.0    0.62   Fill 300x300 stippled trapezoid (17x15 stipple)
     551.0    0.63   Fill 300x300 opaque stippled trapezoid (17x15 stipple)
  130000.0    0.64   Fill 10x10 aa trap with 1 bit alpha
    4080.0    0.65   100x100 opaque stippled rectangle (17x15 stipple)
     296.0    0.65   500x500 stippled rectangle (161x145 stipple)
    2200.0    0.66   500-pixel ellipse
     341.0    0.67   500x500 opaque stippled rectangle (161x145 stipple)
    4610.0    0.67   500x50 wide line
   15200.0    0.68   Fill 1x1 aa trap with 4 bit alpha
     325.0    0.69   Fill 300x300 opaque stippled trapezoid (8x8 stipple)
 1650000.0    0.70   Unmap window via parent (200 kids)
    6750.0    0.71   100x100 opaque stippled rectangle (161x145 stipple)
   54800.0    0.71   10-pixel fill chord partial ellipse
    6290.0    0.73   100x100 stippled rectangle (161x145 stipple)
     175.0    0.74   500x500 stippled rectangle (17x15 stipple)
   12700.0    0.74   Fill 10x10 aa trap
     275.0    0.75   Fill 300x300 stippled trapezoid (8x8 stipple)
     109.0    0.76   500x500 opaque stippled rectangle (8x8 stipple)
 1130000.0    0.76   Circulate Unmapped window (200 kids)
   14500.0    0.78   Fill 10x10 aa trapezoid
   15300.0    0.78   Fill 1x1 aa trapezoid
   10200.0    0.78   Fill 2x10 aa trap
    9180.0    0.78   PutImage 100x100 square
   15100.0    0.79   100-pixel solid circle
   48000.0    0.80   10-pixel fill slice partial ellipse
   33400.0    0.80   10x1 wide line
    2350.0    0.80   500x500 rectangle
       0.5    0.80   PutImage XY 500x500 square
       0.5    0.80   ShmPutImage XY 500x500 square
   18400.0    0.81   100-pixel filled ellipse
    2590.0    0.82   100x100 opaque stippled rectangle (8x8 stipple)
    3900.0    0.82   100x100 stippled rectangle (17x15 stipple)
    6930.0    0.82   Fill 10x10 aa trap with 4 bit alpha
     927.0    0.83   500x500 tiled rectangle (216x208 tile)
  219000.0    0.86   10x10 opaque stippled rectangle (161x145 stipple)
  140000.0    0.86   Copy 10x10 from window to pixmap
   16300.0    0.87   100-pixel fill slice partial circle
   28300.0    0.87   100x100 tiled rectangle (216x208 tile)
   30900.0    0.87   10-pixel wide ellipse
   23400.0    0.87   10-pixel wide partial circle
     859.0    0.87   500x500 tiled rectangle (161x145 tile)
     462.0    0.87   PutImage 500x500 square
   17600.0    0.88   100-pixel fill chord partial circle
  145000.0    0.88   10x10 opaque stippled rectangle (8x8 stipple)
  143000.0    0.88   Copy 10x10 from pixmap to window
    6530.0    0.89   100-pixel wide partial circle
   28100.0    0.89   100x100 tiled rectangle (161x145 tile)
  138000.0    0.89   Composite 10x10 from pixmap to window
    1470.0    0.90   Fill 100x100 aa trap
    1350.0    0.90   Fill 100x100 aa trap with 4 bit alpha
    1930.0    0.90   GetImage XY 10x10 square
   14200.0    0.92   100x10 wide vertical line segment
    4460.0    0.92   Fill 100x100 aa trapezoid
   41700.0    0.93   10-pixel filled ellipse
     463.0    0.93   Fill 300x300 aa trap with 4 bit alpha
 1350000.0    0.93   Move window via parent (200 kids)
   14300.0    0.94   100x10 wide horizontal line segment
    4810.0    0.94   Fill 300x300 aa pre-added trapezoid
     476.0    0.94   Fill 300x300 aa trap
     110.0    0.95   500x500 stippled rectangle (8x8 stipple)
   16800.0    0.96   Fill 100x100 aa pre-added trapezoid
    1140.0    0.96   PutImage XY 10x10 square
 1570000.0    0.96   Resize unmapped window (4 kids)
   22100.0    0.97   100-pixel fill chord partial ellipse
  155000.0    0.97   Fill 10x10 aa pre-added trapezoid
    1040.0    0.97   Fill 2x100 aa trap
 1660000.0    0.97   Moved unmapped window (16 kids)
 1660000.0    0.97   Moved unmapped window (25 kids)
 1190000.0    0.97   Move window via parent (100 kids)
      11.8    0.97   PutImage XY 100x100 square
      11.4    0.97   ShmPutImage XY 100x100 square
 1670000.0    0.97   Unmap window via parent (100 kids)
  173000.0    0.98   10x10 opaque stippled rectangle (17x15 stipple)
  926000.0    0.98   Fill 1x1 aa pre-added trapezoid
     346.0    0.98   Fill 2x300 aa trap
   57600.0    0.98   Hide/expose window via popup (4 kids)
 1630000.0    0.98   Moved unmapped window (100 kids)
 1630000.0    0.98   Moved unmapped window (200 kids)
 1650000.0    0.98   Moved unmapped window (4 kids)
 1640000.0    0.98   Moved unmapped window (50 kids)
 1640000.0    0.98   Moved unmapped window (75 kids)
    1210.0    0.98   Scroll 500x500 pixels
     574.0    0.99   Copy 100x100 n-bit deep plane
     867.0    0.99   Copy 500x500 from pixmap to pixmap
      23.3    0.99   Copy 500x500 n-bit deep plane
      24.7    0.99   GetImage XY 100x100 square
   16600.0    0.99   Move window (200 kids)
 1560000.0    0.99   Resize unmapped window (16 kids)
 1530000.0    0.99   Resize unmapped window (200 kids)
 1560000.0    0.99   Resize unmapped window (25 kids)
 1550000.0    0.99   Resize unmapped window (50 kids)
    1050.0    0.99   ShmPutImage XY 10x10 square
  266000.0    1.00   Char in 30-char aa line (Charter 24)
  265000.0    1.00   Char in 30-char a line (Charter 24)
  508000.0    1.00   Char in 30-char image line (TR 24)
     869.0    1.00   Composite 500x500 from window to window
     870.0    1.00   Copy 500x500 from window to window
       1.0    1.00   GetImage XY 500x500 square
 1530000.0    1.00   Resize unmapped window (100 kids)
   20000.0    1.01   100-pixel fill slice partial ellipse
  231000.0    1.01   10x10 stippled rectangle (161x145 stipple)
   19900.0    1.01   Composite 100x100 from pixmap to window
   19600.0    1.01   Composite 100x100 from window to window
     851.0    1.01   Composite 500x500 from pixmap to window
   19800.0    1.01   Copy 100x100 from pixmap to pixmap
   19900.0    1.01   Copy 100x100 from pixmap to window
   20000.0    1.01   Copy 100x100 from window to pixmap
   19600.0    1.01   Copy 100x100 from window to window
     851.0    1.01   Copy 500x500 from pixmap to window
 1530000.0    1.01   Resize unmapped window (75 kids)
   10300.0    1.02   100-pixel wide circle
  108000.0    1.02   Char in 80-char rgb core line (Charter 10)
     849.0    1.02   Copy 500x500 from window to pixmap
  169000.0    1.03   10x10 stippled rectangle (17x15 stipple)
   20700.0    1.03   Move window (100 kids)
   26900.0    1.03   Scroll 100x100 pixels
  255000.0    1.04   Char16 in 23-char line (k24)
 1720000.0    1.04   Char in 80-char image line (TR 10)
   37200.0    1.04   Circulate window (4 kids)
   25200.0    1.04   Move window (25 kids)
   23200.0    1.04   Move window (50 kids)
   21900.0    1.04   Move window (75 kids)
    2540.0    1.05   100x100 stippled rectangle (8x8 stipple)
   41700.0    1.05   Copy 10x10 n-bit deep plane
   25800.0    1.05   Move window (16 kids)
  119000.0    1.06   Char in 80-char a core line (Charter 10)
 2010000.0    1.07   Circulate Unmapped window (100 kids)
 2250000.0    1.07   Circulate Unmapped window (75 kids)
   27600.0    1.07   Move window (4 kids)
  377000.0    1.07   Move window via parent (16 kids)
 1050000.0    1.07   Move window via parent (75 kids)
  626000.0    1.08   Char16 in 40-char line (k14)
  118000.0    1.08   Char in 80-char aa core line (Charter 10)
  534000.0    1.08   Move window via parent (25 kids)
  108000.0    1.08   Move window via parent (4 kids)
  471000.0    1.09   Char16 in 40-char image line (k14)
   23700.0    1.09   Resize window (200 kids)
 1030000.0    1.10   Char in 60-char image line (9x15)
 1400000.0    1.10   Char in 80-char image line (6x13)
 2610000.0    1.10   Circulate Unmapped window (50 kids)
     756.0    1.10   Fill 300x300 aa trapezoid
   30000.0    1.10   Resize window (100 kids)
  134000.0    1.11   10x10 stippled rectangle (8x8 stipple)
  833000.0    1.11   Char in 30-char line (TR 24)
   34800.0    1.11   Resize window (50 kids)
   32100.0    1.11   Resize window (75 kids)
  308000.0    1.11   Unmap window via parent (4 kids)
  176000.0    1.12   Char16 in 23-char image line (k24)
 1200000.0    1.12   Char in 70-char image line (8x13)
  634000.0    1.12   Char in 80-char rgb line (Courier 12)
  902000.0    1.12   Unmap window via parent (16 kids)
    5790.0    1.13   100-pixel wide partial ellipse
  314000.0    1.13   Char16 in 7/14/7 line (k14, k24)
 1570000.0    1.13   Char in 60-char line (9x15)
   39800.0    1.13   Resize window (16 kids)
   21100.0    1.14   Char in 30-char rgb core line (Charter 24)
 1940000.0    1.14   Char in 80-char line (6x13)
 3010000.0    1.14   Circulate Unmapped window (25 kids)
   37900.0    1.14   Resize window (25 kids)
   80700.0    1.15   Char in 80-char rgb core line (Courier 12)
   74000.0    1.15   Map window via parent (4 kids)
   42800.0    1.15   Resize window (4 kids)
 1140000.0    1.15   Unmap window via parent (25 kids)
 1610000.0    1.15   Unmap window via parent (75 kids)
   10700.0    1.16   100x10 wide line
 1750000.0    1.16   Char in 70-char line (8x13)
  822000.0    1.16   Move window via parent (50 kids)
   38100.0    1.17   10-pixel fill chord partial circle
   87400.0    1.18   Char in 80-char aa core line (Courier 12)
 2160000.0    1.18   Char in 80-char line (TR 10)
   19500.0    1.18   Circulate window (200 kids)
   22600.0    1.19   Char in 30-char aa core line (Charter 24)
   22600.0    1.19   Char in 30-char a core line (Charter 24)
  172000.0    1.19   Char in 30-char rgb line (Charter 24)
  185000.0    1.19   Destroy window via parent (4 kids)
   95200.0    1.19   Hide/expose window via popup (16 kids)
   35300.0    1.20   10-pixel fill slice partial circle
   21900.0    1.20   Circulate window (100 kids)
   25000.0    1.20   Circulate window (16 kids)
   87500.0    1.21   Char in 80-char a core line (Courier 12)
 3190000.0    1.21   Circulate Unmapped window (16 kids)
   24100.0    1.21   Circulate window (25 kids)
 8150000.0    1.21   X protocol NoOperation
  411000.0    1.22   Destroy window via parent (25 kids)
 1620000.0    1.23   Char in 20/40/20 line (6x13, TR 10)
   23100.0    1.23   Circulate window (50 kids)
   22500.0    1.23   Circulate window (75 kids)
 3430000.0    1.25   Circulate Unmapped window (4 kids)
  465000.0    1.25   Destroy window via parent (50 kids)
    3550.0    1.26   100x10 wide double-dashed line
   13500.0    1.27   Fill 100x100 64-gon (Convex)
     759.0    1.27   GetImage 100x100 square
  374000.0    1.28   Destroy window via parent (16 kids)
  774000.0    1.30   Char in 80-char a line (Courier 12)
  771000.0    1.32   Char in 80-char aa line (Courier 12)
  513000.0    1.32   Destroy window via parent (100 kids)
  108000.0    1.33   Hide/expose window via popup (50 kids)
  109000.0    1.33   Map window via parent (16 kids)
  761000.0    1.34   Char in 80-char rgb line (Charter 10)
   97400.0    1.34   Hide/expose window via popup (25 kids)
    1380.0    1.36   100-pixel wide dashed circle
  494000.0    1.38   Destroy window via parent (200 kids)
  490000.0    1.38   Destroy window via parent (75 kids)
  115000.0    1.38   Hide/expose window via popup (100 kids)
  112000.0    1.38   Hide/expose window via popup (75 kids)
    3030.0    1.39   100x10 wide dashed line
   12100.0    1.41   Fill 100x100 64-gon (Complex)
  117000.0    1.42   Hide/expose window via popup (200 kids)
  103000.0    1.43   Create and map subwindows (100 kids)
   80900.0    1.43   Create and map subwindows (4 kids)
  101000.0    1.47   Create and map subwindows (16 kids)
    9970.0    1.48   Fill 100x100 equivalent complex polygons
  120000.0    1.48   Map window via parent (50 kids)
  105000.0    1.49   Create and map subwindows (75 kids)
   29700.0    1.50   10-pixel solid circle
  139000.0    1.50   Change graphics context
  515000.0    1.50   Create unmapped window (200 kids)
  509000.0    1.50   Create unmapped window (50 kids)
  514000.0    1.50   Create unmapped window (75 kids)
  507000.0    1.51   Create unmapped window (16 kids)
  126000.0    1.51   Map window via parent (100 kids)
  110000.0    1.51   Map window via parent (25 kids)
   81700.0    1.52   Copy 10x10 from window to window
  102000.0    1.52   Create and map subwindows (25 kids)
  103000.0    1.52   Create and map subwindows (50 kids)
   81000.0    1.53   Composite 10x10 from window to window
  101000.0    1.53   Create and map subwindows (200 kids)
  126000.0    1.53   Map window via parent (200 kids)
   81300.0    1.53   Scroll 10x10 pixels
   27200.0    1.54   10-pixel wide circle
  122000.0    1.56   Map window via parent (75 kids)
   28300.0    1.57   Fill 100x100 aa trap with 1 bit alpha
  515000.0    1.59   Create unmapped window (100 kids)
  488000.0    1.66   Create unmapped window (25 kids)
  773000.0    1.75   Char in 80-char a line (Charter 10)
  766000.0    1.76   Char in 80-char aa line (Charter 10)
      29.4    1.89   GetImage 500x500 square
  413000.0    1.91   Create unmapped window (4 kids)
  107000.0    2.20   Copy 10x10 1-bit deep plane
     392.0    2.48   Copy 500x500 1-bit deep plane
    7040.0    2.59   Copy 100x100 1-bit deep plane
   25500.0    3.22   Fill 10x10 64-gon (Complex)
   25900.0    3.39   Fill 10x10 64-gon (Convex)
   26100.0    3.87   Fill 10x10 equivalent complex polygon
    5040.0    4.82   Fill 300x300 aa trap with 1 bit alpha

^ permalink raw reply	[flat|nested] 14+ messages in thread
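A table of this size is easier to digest when summarised programmatically. The sketch below assumes each row has the three whitespace-separated fields shown above (absolute rate for server 1, relative rate for server 2, operation name); the function names are illustrative, and only four rows copied from the table are used as sample input.

```python
def parse_row(line):
    """Split one comparison row into (rate_1, ratio_2, operation)."""
    rate, ratio, operation = line.split(None, 2)
    return float(rate), float(ratio), operation.strip()


def summarize(rows):
    """Count regressions and report the worst one.

    The implied absolute rate for server 2 is rate_1 * ratio_2. Because the
    ratio column is rounded, this is only an approximation.
    """
    parsed = [parse_row(row) for row in rows]
    worst = min(parsed, key=lambda entry: entry[1])
    return {
        "total": len(parsed),
        "slower": sum(1 for entry in parsed if entry[1] < 1.0),
        "worst_operation": worst[2],
        "worst_implied_rate": worst[0] * worst[1],
    }


# Four rows copied from the table above.
SAMPLE_ROWS = [
    "965000.0 0.016 10x10 wide rectangle outline",
    "1810000.0 0.53 Dot",
    "870.0 1.00 Copy 500x500 from window to window",
    "5040.0 4.82 Fill 300x300 aa trap with 1 bit alpha",
]

SUMMARY = summarize(SAMPLE_ROWS)

if __name__ == "__main__":
    print(SUMMARY)
```

For the worst row, the implied rate of roughly 15,000 operations per second for the git server is the number to compare against the 965,000 measured on 1.6.5.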
* Re: X11 performance regressions
  2011-05-11 14:46   ` Knut Petersen
@ 2011-05-11 17:52     ` Chris Wilson
  2011-05-12  7:19       ` Knut Petersen
  2011-05-11 19:49     ` Adam Jackson
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2011-05-11 17:52 UTC (permalink / raw)
  To: Knut Petersen, intel-gfx

On Wed, 11 May 2011 16:46:12 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
> Yes, I made some mistakes during my first measurements.
>
> Below find better results. They are made on the same machine,
> with the same kernel, at the same speed, with the same x11perf
> program, absolutely nothing changed.
>
> I used x11perfcomp -ro and sorted the output, worst results for
> the current git code first.
>
> I think the numbers below are quite interesting ...

> 1 2 Operation
> -------- ------ ---------
> 965000.0 0.016 10x10 wide rectangle outline

Something is still not quite right here. This should be mostly CPU bound,
and even my Atom gets 734k.

Can you check that (a) it is CPU bound and (b) the worst offenders
according to the system profiler of your choice (e.g. perf)?

Thanks for doing this investigation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: X11 performance regressions
  2011-05-11 17:52     ` Chris Wilson
@ 2011-05-12  7:19       ` Knut Petersen
  2011-05-12  7:38         ` Chris Wilson
  0 siblings, 1 reply; 14+ messages in thread
From: Knut Petersen @ 2011-05-12 7:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

>> 1 2 Operation
>> -------- ------ ---------
>> 965000.0 0.016 10x10 wide rectangle outline
> Something is still not quite right here. This should be mostly CPU bound,
> and even my Atom gets 734k.
>
> Can you check that (a) it is CPU bound and (b) the worst offenders
> according to the system profiler of your choice (e.g. perf)?
>

734k would be nice ;-)

With current git Xorg it's 10300 reps at 800 MHz and 16300 reps at 2000 MHz.
Increasing the CPU clock by a factor of 2.5 increases reps by a factor of 1.58.

cu,
knut

^ permalink raw reply	[flat|nested] 14+ messages in thread
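The two factors quoted above follow directly from the four numbers in the message. As a small worked check (the function name and the idea of expressing the result as a scaling exponent are illustrative additions, not something stated in the thread):

```python
import math


def clock_scaling(clock_lo_mhz, clock_hi_mhz, rate_lo, rate_hi):
    """Return (clock_ratio, rate_ratio, exponent).

    The exponent is log(rate_ratio) / log(clock_ratio): 1.0 would mean the
    rate scales linearly with the clock, 0.0 would mean it does not scale.
    """
    clock_ratio = clock_hi_mhz / clock_lo_mhz
    rate_ratio = rate_hi / rate_lo
    exponent = math.log(rate_ratio) / math.log(clock_ratio)
    return clock_ratio, rate_ratio, exponent


# Figures from the message: 10300 reps at 800 MHz, 16300 reps at 2000 MHz.
CLOCK_RATIO, RATE_RATIO, EXPONENT = clock_scaling(800, 2000, 10300, 16300)

if __name__ == "__main__":
    print(f"clock x{CLOCK_RATIO:.2f}, rate x{RATE_RATIO:.2f}, exponent {EXPONENT:.2f}")
```

An exponent near 0.5 means the rate grows clearly more slowly than the clock. That measurement alone does not say where the remaining time goes, which is presumably why a profile was requested next.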
* Re: X11 performance regressions
  2011-05-12  7:19       ` Knut Petersen
@ 2011-05-12  7:38         ` Chris Wilson
  2011-05-12  8:24           ` Knut Petersen
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2011-05-12 7:38 UTC (permalink / raw)
  To: Knut Petersen; +Cc: intel-gfx

On Thu, 12 May 2011 09:19:39 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
>
> >> 1 2 Operation
> >> -------- ------ ---------
> >> 965000.0 0.016 10x10 wide rectangle outline
> > Something is still not quite right here. This should be mostly CPU bound,
> > and even my Atom gets 734k.
> >
> > Can you check that (a) it is CPU bound and (b) the worst offenders
> > according to the system profiler of your choice (e.g. perf)?
> >
>
> 734k would be nice ;-)
>
> With current git Xorg it's 10300 reps at 800 MHz and 16300 reps at 2000 MHz.
> Increasing the CPU clock by a factor of 2.5 increases reps by a factor of 1.58.

Please do something like 'perf record -f -g -a x11perf -d :0 -worect10;
perf report | head -150' and paste the output.
-Chris

>
> cu,
> knut

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: X11 performance regressions 2011-05-12 7:38 ` Chris Wilson @ 2011-05-12 8:24 ` Knut Petersen 2011-05-12 8:55 ` Chris Wilson 0 siblings, 1 reply; 14+ messages in thread From: Knut Petersen @ 2011-05-12 8:24 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx [-- Attachment #1: Type: text/plain, Size: 168 bytes --] > Please do something like 'perf record -f -g -a x11perf -d :0 -worect10; > perf report | head -150' and paste the output. > -Chris > Attached find the perf log Knut [-- Attachment #2: perflog --] [-- Type: text/plain, Size: 10526 bytes --] # Events: 19K cycles # # Overhead Command Shared Object Symbol # ........ ............... ............................... ............................................................................................................................................................................................................................................................. # 32.09% Xorg libpixman-1.so.0.23.1 [.] pixman_op | --- pixman_op | |--99.80%-- pixman_region_union | | | |--99.95%-- damageRegionAppend | | damageDamageBox | | damagePolyRectangle | | ProcPolyRectangle | | Dispatch | | main | | __libc_start_main | --0.05%-- [...] --0.20%-- [...] 5.98% Xorg libc-2.11.3.so [.] __GI_memmove | --- __GI_memmove | |--93.46%-- pixman_region_union | damageRegionAppend | damageDamageBox | damagePolyRectangle | ProcPolyRectangle | Dispatch | main | __libc_start_main | |--5.14%-- Dispatch | main | __libc_start_main | |--1.22%-- WriteEventsToClient | DamageExtNotify | .L312 | damageRegionProcessPending | damagePolyRectangle | ProcPolyRectangle | Dispatch | main | __libc_start_main --0.18%-- [...] 
3.25% Xorg [kernel.kallsyms] [k] __lock_acquire | --- __lock_acquire | |--98.72%-- lock_acquire | | | |--48.51%-- _raw_spin_lock_irqsave | | | | | |--45.74%-- add_wait_queue | | | __pollwait | | | | | | | |--89.24%-- unix_poll | | | | sock_poll | | | | do_select | | | | core_sys_select | | | | sys_select | | | | sysenter_do_call | | | | 0xb76ed424 | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | |--4.40%-- n_tty_poll | | | | tty_poll | | | | do_select | | | | core_sys_select | | | | sys_select | | | | sysenter_do_call | | | | 0xb76ed424 | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | |--3.56%-- datagram_poll | | | | sock_poll | | | | do_select | | | | core_sys_select | | | | sys_select | | | | sysenter_do_call | | | | 0xb76ed424 | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | --2.81%-- drm_poll | | | do_select | | | core_sys_select | | | sys_select | | | sysenter_do_call | | | 0xb76ed424 | | | Dispatch | | | main | | | __libc_start_main | | | | | |--31.44%-- remove_wait_queue | | | poll_freewait | | | do_select | | | core_sys_select | | | sys_select | | | sysenter_do_call | | | 0xb76ed424 | | | Dispatch | | | main | | | __libc_start_main | | | | | |--6.96%-- skb_dequeue | | | unix_stream_recvmsg | | | sock_aio_read | | | do_sync_read | | | vfs_read | | | sys_read | | | sysenter_do_call | | | 0xb76ed424 | | | _XSERVTransRead | | | ReadRequestFromClient | | | Dispatch | | | main | | | __libc_start_main | | | | | |--6.55%-- __wake_up_sync_key | | | | | | | |--79.99%-- unix_write_space | | | | sock_wfree | | | | unix_destruct_scm | | | | skb_release_head_state | | | | __kfree_skb | | | | consume_skb | | | | unix_stream_recvmsg | | | | sock_aio_read | | | | do_sync_read | | | | vfs_read | | | | sys_read | | | | sysenter_do_call | | | | 0xb76ed424 | | | | _XSERVTransRead | | | | ReadRequestFromClient | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | --20.01%-- sock_def_readable [-- 
Attachment #3: Type: text/plain, Size: 159 bytes --] _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: X11 performance regressions 2011-05-12 8:24 ` Knut Petersen @ 2011-05-12 8:55 ` Chris Wilson 2011-05-12 9:34 ` Knut Petersen 2011-05-13 9:24 ` Knut Petersen 0 siblings, 2 replies; 14+ messages in thread
From: Chris Wilson @ 2011-05-12 8:55 UTC (permalink / raw)
To: Knut Petersen; +Cc: intel-gfx

On Thu, 12 May 2011 10:24:00 +0200, Knut Petersen <Knut_Petersen@t-online.de> wrote:
>
> > Please do something like 'perf record -f -g -a x11perf -d :0 -worect10;
> > perf report | head -150' and paste the output.
> > -Chris
> >
> Attached find the perf log

Oh, damage. A compositing WM? If you turn off compositing, do you see
similar performance levels to xorg-1.6?
-Chris

--
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: X11 performance regressions 2011-05-12 8:55 ` Chris Wilson @ 2011-05-12 9:34 ` Knut Petersen 2011-05-13 9:24 ` Knut Petersen 1 sibling, 0 replies; 14+ messages in thread
From: Knut Petersen @ 2011-05-12 9:34 UTC (permalink / raw)
To: intel-gfx

> Oh, damage. A compositing WM? If you turn off compositing, do you see
> similar performance levels to xorg-1.6?
> -Chris
>

That makes a difference ... 16,300 reps speed up to 1,280,000 reps, which is 78.5 times faster.

I think I will rerun the tests.

cu,
Knut

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: X11 performance regressions 2011-05-12 8:55 ` Chris Wilson 2011-05-12 9:34 ` Knut Petersen @ 2011-05-13 9:24 ` Knut Petersen 1 sibling, 0 replies; 14+ messages in thread From: Knut Petersen @ 2011-05-13 9:24 UTC (permalink / raw) To: Chris Wilson; +Cc: intel-gfx [-- Attachment #1: Type: text/plain, Size: 326 bytes --] > Oh, damage. A compositing WM? If you turn off compositing, do you see > similar performance levels to xorg-1.6? > -Chris > If "Composite" is disabled, the current X scores much better than the 1.6.5 server in most cases. But there are a few exceptions ... for the worst of those cases, I also attached a perf log. - Knut [-- Attachment #2: x11perfcomp --] [-- Type: text/plain, Size: 20937 bytes --] 1: x11perf-10605000-nocomposite 2: x11perf-11099001-nocomposite 1 2 Operation -------- ------ --------- 2630.0 0.12 100-pixel double-dashed ellipse 4180.0 0.14 100-pixel dashed ellipse 575000.0 0.23 Copy 10x10 from pixmap to pixmap 5850.0 0.34 500-pixel filled ellipse 2970.0 0.35 500-pixel solid circle 6250.0 0.35 Fill 300x300 trapezoid 149000.0 0.41 PutImage 10x10 square 3930.0 0.44 100-pixel wide double-dashed ellipse 189000.0 0.44 ShmPutImage 10x10 square 1570.0 0.46 ShmPutImage 500x500 square 9610.0 0.49 GetImage 10x10 square 21700.0 0.51 ShmPutImage 100x100 square 12600.0 0.63 QueryPointer 12600.0 0.65 GetProperty 220000.0 0.67 100x100 wide rectangle outline 83400.0 0.68 100x100 rectangle 477.0 0.69 PutImage 500x500 square 9100.0 0.71 PutImage 100x100 square 28700.0 0.73 500x500 rectangle outline 5570.0 0.75 500x500 wide rectangle outline 2140.0 0.79 100-pixel double-dashed circle 2550.0 0.81 500-pixel wide circle 1690000.0 0.82 100-pixel vertical line segment 3500.0 0.82 500-pixel wide ellipse 3430.0 0.85 100-pixel dashed circle 163000.0 0.85 Fill 1x1 equivalent triangle 152000.0 0.86 Fill 1x1 trapezoid 139000.0 0.88 Copy 10x10 from window to pixmap 137000.0 0.91 Composite 10x10 from pixmap to window 1930.0 0.91 GetImage XY 10x10 square 21300.0 
0.93 500-pixel circle 138000.0 0.93 Copy 10x10 from pixmap to window 1300000.0 0.93 Move window via parent (100 kids) 1370000.0 0.93 Move window via parent (200 kids) 130000.0 0.95 10-pixel partial ellipse 107000.0 0.95 Char in 80-char rgb core line (Charter 10) 831000.0 0.96 1-pixel circle 16700.0 0.96 Fill 100x100 aa pre-added trapezoid 1590.0 0.96 Fill 100x100 aa trap 1460.0 0.96 Fill 100x100 aa trap with 4 bit alpha 513.0 0.96 Fill 300x300 aa trap 499.0 0.96 Fill 300x300 aa trap with 4 bit alpha 74.9 0.96 Fill 300x300 tiled trapezoid (17x15 tile) 153000.0 0.97 Fill 10x10 aa pre-added trapezoid 4780.0 0.97 Fill 300x300 aa pre-added trapezoid 12.0 0.97 ShmPutImage XY 100x100 square 783.0 0.98 Fill 100x100 tiled trapezoid (161x145 tile) 927000.0 0.98 Fill 1x1 aa pre-added trapezoid 12.4 0.98 PutImage XY 100x100 square 1230.0 0.98 PutImage XY 10x10 square 1110.0 0.98 ShmPutImage XY 10x10 square 29000.0 0.99 100x100 tiled rectangle (161x145 tile) 23900.0 0.99 100x100 tiled rectangle (17x15 tile) 30100.0 0.99 100x100 tiled rectangle (216x208 tile) 34800000.0 0.99 1-pixel solid circle 885.0 0.99 500x500 tiled rectangle (161x145 tile) 691.0 0.99 500x500 tiled rectangle (17x15 tile) 960.0 0.99 500x500 tiled rectangle (216x208 tile) 274000.0 0.99 Char in 30-char aa line (Charter 24) 275000.0 0.99 Char in 30-char a line (Charter 24) 20400.0 0.99 Copy 100x100 from pixmap to pixmap 599.0 0.99 Copy 100x100 n-bit deep plane 870.0 0.99 Copy 500x500 from pixmap to pixmap 24.3 0.99 Copy 500x500 n-bit deep plane 120000.0 0.99 Fill 1x1 aa trap 1090.0 0.99 Fill 2x100 aa trap 10700.0 0.99 Fill 2x10 aa trap 91200.0 0.99 Fill 2x1 aa trap 322000.0 1.00 100-pixel dashed line 307000.0 1.00 100-pixel double-dashed line 275000.0 1.00 100-pixel double-dashed segment 307000.0 1.00 100-pixel line 277000.0 1.00 100-pixel line segment 309000.0 1.00 100-pixel line segment (2 kids) 24600000.0 1.00 1-pixel line 2430000.0 1.00 500-pixel horizontal line segment 56500.0 1.00 500-pixel line segment 
2400000.0 1.00 500-pixel vertical line segment 2440.0 1.00 500x500 rectangle 20400.0 1.00 Composite 100x100 from pixmap to window 20100.0 1.00 Composite 100x100 from window to window 866.0 1.00 Composite 500x500 from pixmap to window 875.0 1.00 Composite 500x500 from window to window 20500.0 1.00 Copy 100x100 from pixmap to window 20600.0 1.00 Copy 100x100 from window to pixmap 20100.0 1.00 Copy 100x100 from window to window 866.0 1.00 Copy 500x500 from pixmap to window 872.0 1.00 Copy 500x500 from window to pixmap 875.0 1.00 Copy 500x500 from window to window 661.0 1.00 Fill 100x100 tiled trapezoid (17x15 tile) 67.1 1.00 Fill 300x300 tiled trapezoid (4x4 tile) 25.9 1.00 GetImage XY 100x100 square 1.0 1.00 GetImage XY 500x500 square 0.5 1.00 PutImage XY 500x500 square 1240.0 1.00 Scroll 500x500 pixels 0.5 1.00 ShmPutImage XY 500x500 square 289000.0 1.01 100-pixel dashed segment 292000.0 1.01 100-pixel line segment (1 kid) 13400.0 1.01 Fill 10x10 aa trap 28500.0 1.01 Scroll 100x100 pixels 3230000.0 1.02 10-pixel line 193000.0 1.02 Char16 in 23-char image line (k24) 271000.0 1.02 Char16 in 23-char line (k24) 7270.0 1.02 Fill 10x10 aa trap with 4 bit alpha 2350000.0 1.03 10-pixel dashed segment 2200000.0 1.03 10-pixel line segment 61800.0 1.03 500-pixel line 497000.0 1.03 Char16 in 40-char image line (k14) 118000.0 1.03 Char in 80-char aa core line (Charter 10) 118000.0 1.03 Char in 80-char a core line (Charter 10) 353.0 1.03 Fill 2x300 aa trap 1130000.0 1.03 Move window via parent (75 kids) 320000.0 1.04 100-pixel line segment (3 kids) 649000.0 1.04 Char16 in 40-char line (k14) 507000.0 1.04 Char in 30-char image line (TR 24) 1480000.0 1.04 Char in 80-char image line (6x13) 568.0 1.04 Fill 100x100 tiled trapezoid (4x4 tile) 15400.0 1.04 Fill 1x1 aa trap with 4 bit alpha 1090000.0 1.05 Char in 60-char image line (9x15) 1290000.0 1.05 Char in 70-char image line (8x13) 1720000.0 1.05 Char in 80-char image line (TR 10) 58900.0 1.05 Hide/expose window via popup (4 kids) 
1810000.0 1.05 Moved unmapped window (100 kids) 1700000.0 1.05 Resize unmapped window (200 kids) 327000.0 1.06 Char16 in 7/14/7 line (k14, k24) 43200.0 1.06 Copy 10x10 n-bit deep plane 3160000.0 1.07 10-pixel dashed line 2050000.0 1.07 Char in 80-char line (6x13) 270000.0 1.07 Fill 1x1 aa trap with 1 bit alpha 1830000.0 1.07 Moved unmapped window (50 kids) 21600000.0 1.08 1-pixel line segment 21200.0 1.08 Char in 30-char rgb core line (Charter 24) 1850000.0 1.08 Moved unmapped window (16 kids) 1830000.0 1.08 Moved unmapped window (4 kids) 1820000.0 1.08 Moved unmapped window (75 kids) 839000.0 1.09 Char in 30-char line (TR 24) 1620000.0 1.09 Char in 60-char line (9x15) 1870000.0 1.09 Char in 70-char line (8x13) 2340000.0 1.09 Char in 80-char line (TR 10) 1810000.0 1.09 Moved unmapped window (200 kids) 25600.0 1.09 Move window (25 kids) 1720000.0 1.09 Resize unmapped window (16 kids) 1720000.0 1.09 Resize unmapped window (25 kids) 1730000.0 1.09 Resize unmapped window (4 kids) 1220000.0 1.10 1x1 tiled rectangle (161x145 tile) 1220000.0 1.10 1x1 tiled rectangle (17x15 tile) 1210000.0 1.10 1x1 tiled rectangle (4x4 tile) 725.0 1.10 Fill 300x300 aa trapezoid 26200.0 1.10 Move window (16 kids) 16600.0 1.10 Move window (200 kids) 1700000.0 1.10 Resize unmapped window (75 kids) 1970000.0 1.10 Unmap window via parent (200 kids) 1750000.0 1.10 Unmap window via parent (50 kids) 1120000.0 1.11 10x10 tiled rectangle (216x208 tile) 973000.0 1.11 Circulate Unmapped window (200 kids) 1780000.0 1.11 Moved unmapped window (25 kids) 20900.0 1.11 Move window (100 kids) 22100.0 1.11 Move window (75 kids) 80500.0 1.12 Char in 80-char rgb core line (Courier 12) 502000.0 1.12 Destroy window via parent (200 kids) 1650000.0 1.12 Resize unmapped window (100 kids) 125000.0 1.13 10-pixel ellipse 4780000.0 1.13 10-pixel horizontal line segment 1070000.0 1.13 10x10 tiled rectangle (161x145 tile) 1180000.0 1.13 1x1 tiled rectangle (216x208 tile) 23500.0 1.13 Move window (50 kids) 1670000.0 1.13 
Resize unmapped window (50 kids) 547000.0 1.14 10x10 tiled rectangle (17x15 tile) 37600.0 1.14 Circulate window (4 kids) 22600.0 1.15 Char in 30-char aa core line (Charter 24) 2270000.0 1.15 Circulate Unmapped window (75 kids) 37400.0 1.15 Fill 10x10 tiled trapezoid (4x4 tile) 28100.0 1.15 Move window (4 kids) 87600.0 1.16 Char in 80-char aa core line (Courier 12) 1970000.0 1.16 Circulate Unmapped window (100 kids) 3180000.0 1.16 Circulate Unmapped window (25 kids) 2710000.0 1.16 Circulate Unmapped window (50 kids) 42800.0 1.16 Fill 10x10 tiled trapezoid (216x208 tile) 1980000.0 1.16 Unmap window via parent (100 kids) 3490.0 1.17 100x100 tiled rectangle (4x4 tile) 144000.0 1.17 10x10 tiled rectangle (4x4 tile) 10200.0 1.17 500x50 wide vertical line segment 711000.0 1.17 Char in 80-char rgb line (Courier 12) 90.5 1.17 Fill 300x300 tiled trapezoid (161x145 tile) 557000.0 1.17 Move window via parent (25 kids) 110000.0 1.17 Move window via parent (4 kids) 904000.0 1.17 Move window via parent (50 kids) 102.0 1.18 500x500 tiled rectangle (4x4 tile) 21900.0 1.18 Char in 30-char a core line (Charter 24) 87400.0 1.18 Char in 80-char a core line (Courier 12) 4350.0 1.18 Fill 100x100 aa trapezoid 42200.0 1.18 Fill 10x10 tiled trapezoid (161x145 tile) 39500.0 1.18 Fill 10x10 tiled trapezoid (17x15 tile) 388000.0 1.18 Move window via parent (16 kids) 23400.0 1.18 Resize window (200 kids) 1910000.0 1.18 Unmap window via parent (75 kids) 33800.0 1.19 100-pixel circle 807.0 1.19 Fill 100x100 tiled trapezoid (216x208 tile) 15100.0 1.19 Fill 100x100 trapezoid 76000.0 1.19 Map window via parent (4 kids) 4140.0 1.20 100-pixel wide double-dashed circle 2200.0 1.20 500-pixel ellipse 3370000.0 1.20 Circulate Unmapped window (16 kids) 1690000.0 1.21 Char in 20/40/20 line (6x13, TR 10) 8160000.0 1.21 X protocol NoOperation 1750.0 1.22 100-pixel wide dashed ellipse 191000.0 1.22 Char in 30-char rgb line (Charter 24) 32400.0 1.22 Resize window (75 kids) 10200.0 1.23 100-pixel ellipse 15300.0 
1.23 100-pixel partial ellipse 4630.0 1.23 500x50 wide line 30100.0 1.23 Resize window (100 kids) 92.0 1.24 Fill 300x300 tiled trapezoid (216x208 tile) 777.0 1.25 GetImage 100x100 square 96800.0 1.25 Hide/expose window via popup (16 kids) 714000.0 1.26 Create unmapped window (200 kids) 34900.0 1.26 Resize window (50 kids) 40100.0 1.27 Resize window (16 kids) 1290000.0 1.27 Unmap window via parent (25 kids) 10000.0 1.29 500x50 wide horizontal line segment 38100.0 1.29 Resize window (25 kids) 4030000.0 1.30 100-pixel horizontal line segment 3530000.0 1.30 Circulate Unmapped window (4 kids) 25300.0 1.30 Circulate window (16 kids) 981000.0 1.30 Unmap window via parent (16 kids) 22200.0 1.31 Circulate window (100 kids) 19100.0 1.31 Circulate window (200 kids) 43700.0 1.31 Resize window (4 kids) 972000.0 1.32 10x10 wide rectangle outline 23200.0 1.32 Circulate window (50 kids) 22600.0 1.32 Circulate window (75 kids) 13500.0 1.32 Fill 100x100 64-gon (Convex) 24200.0 1.33 Circulate window (25 kids) 365000.0 1.33 Destroy window via parent (16 kids) 12200.0 1.34 Fill 100x100 equivalent triangle 775000.0 1.35 Char in 80-char aa line (Courier 12) 777000.0 1.35 Char in 80-char a line (Courier 12) 26100.0 1.36 10-pixel wide partial ellipse 1840000.0 1.36 10x10 rectangle 711000.0 1.36 Create unmapped window (100 kids) 698000.0 1.37 Create unmapped window (50 kids) 35300.0 1.38 100-pixel partial circle 112000.0 1.38 Map window via parent (16 kids) 689000.0 1.39 Create unmapped window (25 kids) 500000.0 1.39 Destroy window via parent (75 kids) 12700.0 1.40 Fill 100x100 64-gon (Complex) 98700.0 1.40 Hide/expose window via popup (25 kids) 109000.0 1.40 Hide/expose window via popup (50 kids) 114000.0 1.41 Hide/expose window via popup (75 kids) 15600.0 1.43 100-pixel solid circle 112000.0 1.44 Create and map subwindows (200 kids) 113000.0 1.44 Create and map subwindows (50 kids) 114000.0 1.44 Create and map subwindows (75 kids) 671000.0 1.44 Create unmapped window (75 kids) 123000.0 
1.45 10-pixel partial circle 112000.0 1.45 Create and map subwindows (25 kids) 182000.0 1.45 Destroy window via parent (4 kids) 113000.0 1.45 Hide/expose window via popup (100 kids) 118000.0 1.45 Hide/expose window via popup (200 kids) 322000.0 1.45 Unmap window via parent (4 kids) 114000.0 1.46 Create and map subwindows (100 kids) 109000.0 1.46 Create and map subwindows (16 kids) 476000.0 1.46 Destroy window via parent (50 kids) 655000.0 1.47 Create unmapped window (16 kids) 763000.0 1.48 Char in 80-char rgb line (Charter 10) 127000.0 1.48 Map window via parent (75 kids) 140000.0 1.51 Change graphics context 54600.0 1.52 10x1 wide vertical line segment 128000.0 1.52 Map window via parent (100 kids) 81500.0 1.53 Copy 10x10 from window to window 481000.0 1.53 Destroy window via parent (100 kids) 81600.0 1.53 Scroll 10x10 pixels 86200.0 1.54 Create and map subwindows (4 kids) 12500.0 1.55 100-pixel wide ellipse 80800.0 1.55 Composite 10x10 from window to window 121000.0 1.55 Map window via parent (50 kids) 394000.0 1.56 Destroy window via parent (25 kids) 138000.0 1.56 Fill 1x1 tiled trapezoid (17x15 tile) 111000.0 1.56 Map window via parent (25 kids) 137000.0 1.57 Fill 1x1 tiled trapezoid (4x4 tile) 130000.0 1.57 Map window via parent (200 kids) 551000.0 1.58 Create unmapped window (4 kids) 136000.0 1.58 Fill 1x1 tiled trapezoid (161x145 tile) 9850.0 1.60 Fill 100x100 equivalent complex polygons 53400.0 1.61 10x1 wide horizontal line segment 132000.0 1.63 Fill 1x1 tiled trapezoid (216x208 tile) 23500.0 1.69 10-pixel wide partial circle 105000.0 1.71 10-pixel circle 1470000.0 1.72 1x1 stippled rectangle (8x8 stipple) 1420000.0 1.73 1x1 opaque stippled rectangle (161x145 stipple) 53400.0 1.76 100x100 rectangle outline 1430000.0 1.77 1x1 stippled rectangle (161x145 stipple) 1430000.0 1.77 1x1 stippled rectangle (17x15 stipple) 1420000.0 1.78 1x1 opaque stippled rectangle (17x15 stipple) 773000.0 1.80 Char in 80-char a line (Charter 10) 768000.0 1.81 Char in 80-char aa 
line (Charter 10) 1400000.0 1.82 1x1 opaque stippled rectangle (8x8 stipple) 185.0 1.83 500x500 opaque stippled rectangle (17x15 stipple) 14000.0 1.86 Fill 10x10 aa trapezoid 174000.0 1.90 Fill 1x1 stippled trapezoid (17x15 stipple) 173000.0 1.92 Fill 1x1 opaque stippled trapezoid (8x8 stipple) 173000.0 1.93 Fill 1x1 opaque stippled trapezoid (161x145 stipple) 173000.0 1.94 Fill 1x1 opaque stippled trapezoid (17x15 stipple) 4140.0 1.95 100x100 opaque stippled rectangle (17x15 stipple) 134000.0 1.96 Fill 10x10 aa trap with 1 bit alpha 172000.0 1.96 Fill 1x1 stippled trapezoid (8x8 stipple) 1840000.0 1.97 10-pixel vertical line segment 173000.0 1.97 Fill 1x1 stippled trapezoid (161x145 stipple) 5780.0 1.99 100-pixel wide partial ellipse 1830000.0 1.99 1x1 rectangle 86400.0 1.99 Fill 10x10 stippled trapezoid (161x145 stipple) 86000.0 2.01 Fill 10x10 opaque stippled trapezoid (161x145 stipple) 14700.0 2.03 Fill 1x1 aa trapezoid 1840000.0 2.04 Dot 67700.0 2.04 Fill 10x10 stippled trapezoid (8x8 stipple) 77700.0 2.05 Fill 10x10 opaque stippled trapezoid (17x15 stipple) 74600.0 2.05 Fill 10x10 stippled trapezoid (17x15 stipple) 5250.0 2.11 Fill 100x100 stippled trapezoid (161x145 stipple) 28.1 2.11 GetImage 500x500 square 5760.0 2.12 Fill 100x100 opaque stippled trapezoid (161x145 stipple) 570.0 2.12 Fill 300x300 opaque stippled trapezoid (17x15 stipple) 69000.0 2.13 Fill 10x10 opaque stippled trapezoid (8x8 stipple) 18500.0 2.16 100-pixel filled ellipse 4040.0 2.18 Fill 100x100 opaque stippled trapezoid (17x15 stipple) 17600.0 2.19 100-pixel fill chord partial circle 708.0 2.19 Fill 300x300 stippled trapezoid (161x145 stipple) 2980.0 2.20 Fill 100x100 stippled trapezoid (17x15 stipple) 384.0 2.21 Fill 300x300 stippled trapezoid (17x15 stipple) 53800.0 2.23 10x10 rectangle outline 301.0 2.24 500x500 stippled rectangle (161x145 stipple) 869.0 2.24 Fill 300x300 opaque stippled trapezoid (161x145 stipple) 16300.0 2.30 100-pixel fill slice partial circle 7100.0 2.31 100x100 
opaque stippled rectangle (161x145 stipple) 114.0 2.31 500x500 opaque stippled rectangle (8x8 stipple) 6560.0 2.32 100x100 stippled rectangle (161x145 stipple) 345.0 2.33 500x500 opaque stippled rectangle (161x145 stipple) 106000.0 2.33 Copy 10x10 1-bit deep plane 2260.0 2.33 Fill 100x100 stippled trapezoid (8x8 stipple) 404.0 2.35 Copy 500x500 1-bit deep plane 2570.0 2.44 Fill 100x100 opaque stippled trapezoid (8x8 stipple) 1370.0 2.45 100-pixel wide dashed circle 328.0 2.47 Fill 300x300 opaque stippled trapezoid (8x8 stipple) 280.0 2.51 Fill 300x300 stippled trapezoid (8x8 stipple) 53800.0 2.57 10-pixel fill chord partial ellipse 33300.0 2.57 10x1 wide line 7010.0 2.61 Copy 100x100 1-bit deep plane 2540.0 2.63 100x100 opaque stippled rectangle (8x8 stipple) 172.0 2.63 500x500 stippled rectangle (17x15 stipple) 6520.0 2.64 100-pixel wide partial circle 48400.0 2.64 10-pixel fill slice partial ellipse 3910.0 2.79 100x100 stippled rectangle (17x15 stipple) 224000.0 2.89 10x10 opaque stippled rectangle (161x145 stipple) 30900.0 2.92 10-pixel wide ellipse 9940.0 2.98 100-pixel wide circle 113.0 3.03 500x500 stippled rectangle (8x8 stipple) 14200.0 3.06 100x10 wide vertical line segment 25600.0 3.11 Fill 10x10 64-gon (Complex) 14200.0 3.15 100x10 wide horizontal line segment 41600.0 3.22 10-pixel filled ellipse 2560.0 3.25 100x100 stippled rectangle (8x8 stipple) 176000.0 3.30 10x10 opaque stippled rectangle (17x15 stipple) 35100.0 3.36 10-pixel fill slice partial circle 38100.0 3.39 10-pixel fill chord partial circle 25400.0 3.44 Fill 10x10 64-gon (Convex) 143000.0 3.47 10x10 opaque stippled rectangle (8x8 stipple) 22100.0 3.49 100-pixel fill chord partial ellipse 231000.0 3.56 10x10 stippled rectangle (161x145 stipple) 20100.0 3.68 100-pixel fill slice partial ellipse 3550.0 3.80 100x10 wide double-dashed line 156000.0 3.83 10x10 stippled rectangle (17x15 stipple) 26100.0 3.87 Fill 10x10 equivalent complex polygon 10700.0 3.93 100x10 wide line 3040.0 4.01 100x10 wide 
dashed line 27200.0 4.15 Fill 10x10 equivalent triangle 28200.0 4.22 Fill 10x10 trapezoid 27100.0 4.24 10-pixel wide circle 29500.0 4.44 10-pixel solid circle 114000.0 4.49 10x10 stippled rectangle (8x8 stipple) 29300.0 4.78 Fill 100x100 aa trap with 1 bit alpha 5270.0 11.33 Fill 300x300 aa trap with 1 bit alpha [-- Attachment #3: perflog-ddellipse100 --] [-- Type: text/plain, Size: 10526 bytes --] # Events: 19K cycles # # Overhead Command Shared Object Symbol # ........ ............... ............................... ............................................................................................................................................................................................................................................................. # 32.09% Xorg libpixman-1.so.0.23.1 [.] pixman_op | --- pixman_op | |--99.80%-- pixman_region_union | | | |--99.95%-- damageRegionAppend | | damageDamageBox | | damagePolyRectangle | | ProcPolyRectangle | | Dispatch | | main | | __libc_start_main | --0.05%-- [...] --0.20%-- [...] 5.98% Xorg libc-2.11.3.so [.] __GI_memmove | --- __GI_memmove | |--93.46%-- pixman_region_union | damageRegionAppend | damageDamageBox | damagePolyRectangle | ProcPolyRectangle | Dispatch | main | __libc_start_main | |--5.14%-- Dispatch | main | __libc_start_main | |--1.22%-- WriteEventsToClient | DamageExtNotify | .L312 | damageRegionProcessPending | damagePolyRectangle | ProcPolyRectangle | Dispatch | main | __libc_start_main --0.18%-- [...] 
3.25% Xorg [kernel.kallsyms] [k] __lock_acquire | --- __lock_acquire | |--98.72%-- lock_acquire | | | |--48.51%-- _raw_spin_lock_irqsave | | | | | |--45.74%-- add_wait_queue | | | __pollwait | | | | | | | |--89.24%-- unix_poll | | | | sock_poll | | | | do_select | | | | core_sys_select | | | | sys_select | | | | sysenter_do_call | | | | 0xb76ed424 | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | |--4.40%-- n_tty_poll | | | | tty_poll | | | | do_select | | | | core_sys_select | | | | sys_select | | | | sysenter_do_call | | | | 0xb76ed424 | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | |--3.56%-- datagram_poll | | | | sock_poll | | | | do_select | | | | core_sys_select | | | | sys_select | | | | sysenter_do_call | | | | 0xb76ed424 | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | --2.81%-- drm_poll | | | do_select | | | core_sys_select | | | sys_select | | | sysenter_do_call | | | 0xb76ed424 | | | Dispatch | | | main | | | __libc_start_main | | | | | |--31.44%-- remove_wait_queue | | | poll_freewait | | | do_select | | | core_sys_select | | | sys_select | | | sysenter_do_call | | | 0xb76ed424 | | | Dispatch | | | main | | | __libc_start_main | | | | | |--6.96%-- skb_dequeue | | | unix_stream_recvmsg | | | sock_aio_read | | | do_sync_read | | | vfs_read | | | sys_read | | | sysenter_do_call | | | 0xb76ed424 | | | _XSERVTransRead | | | ReadRequestFromClient | | | Dispatch | | | main | | | __libc_start_main | | | | | |--6.55%-- __wake_up_sync_key | | | | | | | |--79.99%-- unix_write_space | | | | sock_wfree | | | | unix_destruct_scm | | | | skb_release_head_state | | | | __kfree_skb | | | | consume_skb | | | | unix_stream_recvmsg | | | | sock_aio_read | | | | do_sync_read | | | | vfs_read | | | | sys_read | | | | sysenter_do_call | | | | 0xb76ed424 | | | | _XSERVTransRead | | | | ReadRequestFromClient | | | | Dispatch | | | | main | | | | __libc_start_main | | | | | | | --20.01%-- sock_def_readable [-- 
Attachment #4: Type: text/plain, Size: 159 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: X11 performance regressions 2011-05-11 14:46 ` Knut Petersen 2011-05-11 17:52 ` Chris Wilson @ 2011-05-11 19:49 ` Adam Jackson 2011-05-11 21:22 ` Knut Petersen 1 sibling, 1 reply; 14+ messages in thread
From: Adam Jackson @ 2011-05-11 19:49 UTC (permalink / raw)
To: Knut Petersen; +Cc: intel-gfx

[-- Attachment #1.1: Type: text/plain, Size: 5319 bytes --]

On Wed, 2011-05-11 at 16:46 +0200, Knut Petersen wrote:
> Yes, I made some mistakes during my first measurements.
>
> Below find better results. They are made on the same machine,
> with the same kernel, at the same speed, with the same x11perf
> program, absolutely nothing changed.

You don't mention whether the 2d driver varies; I assume it does at least to the extent of rebuilding for new ABI. Or libdrm, although that's really a 1% kind of thing.

> I think the numbers below are quite interesting ...

I still wager they're more about the environment than about the driver proper; there are just too many weird things going on in your results. For example:

>   198000.0   0.27   ShmPutImage 10x10 square
>     1570.0   0.46   ShmPutImage 500x500 square
>    21700.0   0.61   ShmPutImage 100x100 square

This is essentially a memcpy benchmark. Something has to be very wrong for that much variation to happen, and my guess would be something like failing to inline memcpy or pick sufficiently macho optimized versions. I'd be interested to see what your CFLAGS from build.sh ended up being, relative to what opensuse gives for 'rpm --eval "%{optflags}"'.

One cool thing you can do from memcpy benchmarks like this is extrapolate a bandwidth number. Your fast numbers are (small tests to big) 75.5, 828, and 1497 MB/s. Normally one expects some growth in those numbers for bigger tests, but typically the jump from 10x10 to 100x100 is a bit larger than the jump from 100x100 to 500x500. So that hints that small-work tests are being choked somehow.
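The bandwidth figures quoted above can be reproduced with a little arithmetic. The following is a minimal sketch, not taken from the thread: it assumes 4 bytes per pixel (a 32-bit visual) and reads "MB" as 2**20 bytes, which are the assumptions under which the quoted rates come out at 75.5, 828, and 1497. The function and variable names are illustrative only.

```python
# Assumptions (not stated in the thread): 32 bits per pixel, MB = 2**20 bytes.
BYTES_PER_PIXEL = 4
MIB = 1024 * 1024


def bandwidth_mib_per_s(rate_per_s, width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes moved per second for a WxH image request, expressed in MiB/s."""
    return rate_per_s * width * height * bytes_per_pixel / MIB


# (square side in pixels, requests per second) from the quoted ShmPutImage lines.
shm_put_image = [
    (10, 198000.0),
    (100, 21700.0),
    (500, 1570.0),
]

results = {side: bandwidth_mib_per_s(rate, side, side) for side, rate in shm_put_image}

for side, bw in results.items():
    print(f"ShmPutImage {side}x{side}: {bw:.1f} MiB/s")

# Growth between successive sizes; a very large first jump suggests that the
# small requests are limited by per-request overhead rather than by copying.
print(f"10 -> 100:  x{results[100] / results[10]:.1f}")
print(f"100 -> 500: x{results[500] / results[100]:.1f}")
```

Under these assumptions the first jump is roughly an order of magnitude while the second is less than a factor of two, which is the asymmetry the paragraph above points at.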
Recall that x11perf does a 1-pixel GetImage periodically in order to guarantee that results actually hit the framebuffer instead of just being queued in the command stream, so round-trip performance with the X server does actually matter. More than that, small-work requests (which take less time) would be more strongly dominated by round-trip speed than large-work requests. Given that:

>    15400.0   0.54   GetProperty
>    15500.0   0.54   QueryPointer

is very telling. Those requests do essentially no work, but they are round-trips, and their throughput is thus bounded mostly by how long it takes the scheduler to ping-pong between x11perf and the server. A factor of ~2 drop would lead me to suspect something like one kernel scheduling the processes on different cores, and the other both on the same core; two processes splitting one CPU's time with maybe a little cache warmth between them would intuitively be about half as fast as two processes each with their own CPU.

Empirical evidence: On the Ironlake laptop on my desk (kernel 2.6.38.3-18.fc15), if I use taskset to bind the X server to CPU0, running "x11perf -prop -pointer" bound to CPU0 gives:

 300000 trep @ 0.0322 msec ( 31100.0/sec): QueryPointer
 300000 trep @ 0.0321 msec ( 31200.0/sec): GetProperty

x11perf bound to CPU3 gives:

 600000 trep @ 0.0193 msec ( 51900.0/sec): QueryPointer
 600000 trep @ 0.0192 msec ( 52200.0/sec): GetProperty

And running it unbound (letting the scheduler decide) gives:

 600000 trep @ 0.0198 msec ( 50600.0/sec): QueryPointer
 600000 trep @ 0.0208 msec ( 48000.0/sec): GetProperty

I'd be curious to see how you fare with experimenting with taskset.
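To put the taskset numbers side by side with the 0.54 ratio quoted earlier, the rates can be converted to per-request round-trip times and to a same-CPU versus separate-CPU ratio. This is only an illustrative sketch using the figures quoted above; the dictionary and function names are made up for the example.

```python
# Requests per second from the taskset experiment quoted above.
same_cpu = {"QueryPointer": 31100.0, "GetProperty": 31200.0}   # x11perf and X both on CPU0
other_cpu = {"QueryPointer": 51900.0, "GetProperty": 52200.0}  # X on CPU0, x11perf on CPU3


def round_trip_us(rate_per_s):
    """Average time per round-trip request in microseconds."""
    return 1e6 / rate_per_s


ratios = {}
for name in same_cpu:
    ratios[name] = same_cpu[name] / other_cpu[name]
    print(
        f"{name}: {round_trip_us(same_cpu[name]):.1f} us on one CPU, "
        f"{round_trip_us(other_cpu[name]):.1f} us on two CPUs, "
        f"ratio {ratios[name]:.2f}"
    )
```

The ratio comes out close to 0.6 for both requests, in the same neighbourhood as the 0.54 reported for GetProperty and QueryPointer, although that similarity alone does not establish that the cause is the same.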
One set of results that's a little confusing, and thus probably in the end most enlightening:

>   553000.0   0.24   Copy 10x10 from pixmap to pixmap
>   140000.0   0.86   Copy 10x10 from window to pixmap
>   143000.0   0.88   Copy 10x10 from pixmap to window
>      867.0   0.99   Copy 500x500 from pixmap to pixmap
>      870.0   1.00   Copy 500x500 from window to window
>    19800.0   1.01   Copy 100x100 from pixmap to pixmap
>    19900.0   1.01   Copy 100x100 from pixmap to window
>    20000.0   1.01   Copy 100x100 from window to pixmap
>    19600.0   1.01   Copy 100x100 from window to window
>      851.0   1.01   Copy 500x500 from pixmap to window
>      849.0   1.02   Copy 500x500 from window to pixmap
>    81700.0   1.52   Copy 10x10 from window to window

This _mostly_ makes sense. These are all just varying calls to XCopyArea, which does not have a reply. The medium and large ops are approximately identical before and after. The 0.8x results make sense in the context of scheduling funniness for small-work requests.

But the two outliers are perplexing. I would guess that copywinwin10 got faster due to some optimization surrounding buffer reuse or flush reduction (you're always working on the same buffer, so you can do less work), and that copypixpix10 is operating wholly in host memory for some reason and therefore hitting the same kind of memcpy issue as in your ShmPutImage results.

I'll also note that the paths where you're losing hardest are, in the majority, things that the driver makes no attempt to accelerate (anything with the word "tiled" or "stippled" involved, for example). I would tend to chalk that up to something like gcc -O0 before anything else since you're primarily measuring the efficiency of the software renderer.

I'm actually pretty pleased with the results you've shown, 10% or better speedup for basically all text ops, about half of window management ops, and almost all window exposure ops.
- ajax

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: X11 performance regressions 2011-05-11 19:49 ` Adam Jackson @ 2011-05-11 21:22 ` Knut Petersen 2011-05-12 13:42 ` Adam Jackson 0 siblings, 1 reply; 14+ messages in thread
From: Knut Petersen @ 2011-05-11 21:22 UTC (permalink / raw)
To: Adam Jackson; +Cc: intel-gfx

As I have only a few minutes now, a few comments:

1. The complete trees are compared; all modules/libraries are either old or new. No debug versions.

2. Speculating about cores is definitely wrong -- the Pentium M Dothan definitely is a single-core CPU.

3. There often is a "choked most" (1) -- "choked least" (10) -- "choked a bit more again" (100, 500) result:

  1450000.0   0.50   1x1 stippled rectangle (8x8 stipple)
   134000.0   1.11   10x10 stippled rectangle (8x8 stipple)
     2540.0   1.05   100x100 stippled rectangle (8x8 stipple)
      110.0   0.95   500x500 stippled rectangle (8x8 stipple)

   Heavy per-call impact of a factor A on those small requests, light impact of a factor B with growing numbers? A = compiler / library overhead?

   Yes, there is

   >    15400.0   0.54   GetProperty
   >    15500.0   0.54   QueryPointer

   but we also see

    8150000.0   1.21   X protocol NoOperation

4. No, it's not the kernel. I did

   a) boot
   b) x11perf on old X
   c) x11perf on new X
   d) reboot
   e) x11perf on new X
   f) x11perf on old X

   and saw only very marginal differences between those two runs.

5. Yes, I do agree with that:

   > I'm actually pretty pleased with the results you've shown, 10%
   > or better speedup for basically all text ops, about half of window
   > management ops, and almost all window exposure ops.

6. More later.

cu,
Knut

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: X11 performance regressions 2011-05-11 21:22 ` Knut Petersen @ 2011-05-12 13:42 ` Adam Jackson 0 siblings, 0 replies; 14+ messages in thread
From: Adam Jackson @ 2011-05-12 13:42 UTC (permalink / raw)
To: Knut Petersen; +Cc: intel-gfx

[-- Attachment #1.1: Type: text/plain, Size: 417 bytes --]

On Wed, 2011-05-11 at 23:22 +0200, Knut Petersen wrote:
> Yes, there is
> >    15400.0   0.54   GetProperty
> >    15500.0   0.54   QueryPointer
> but we also see
>
>  8150000.0   1.21   X protocol NoOperation

NoOp isn't a round trip; it does not generate a reply. That test measures how fast the X server can zip around its own main loop, not how fast it can interact with clients.

- ajax

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2011-05-13 9:24 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-05-08 18:22 X11 performance regressions Knut Petersen 2011-05-09 16:53 ` Adam Jackson 2011-05-09 21:43 ` Chris Wilson 2011-05-11 14:46 ` Knut Petersen 2011-05-11 17:52 ` Chris Wilson 2011-05-12 7:19 ` Knut Petersen 2011-05-12 7:38 ` Chris Wilson 2011-05-12 8:24 ` Knut Petersen 2011-05-12 8:55 ` Chris Wilson 2011-05-12 9:34 ` Knut Petersen 2011-05-13 9:24 ` Knut Petersen 2011-05-11 19:49 ` Adam Jackson 2011-05-11 21:22 ` Knut Petersen 2011-05-12 13:42 ` Adam Jackson