On Mon, Aug 15, 2011 at 10:34:35PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:
>
> > Benchmarking with 10000 iterations, average results:
> > size    XM        MM        speedup
> > 119     540.58    449.491   0.8314969419
>
> > 12273   2307.86   4042.88   1.751787902
> > 13924   2431.8    4224.48   1.737184756
> > 14335   2469.4    4218.82   1.708440514
> > 15018   2675.67   1904.07   0.711622886
> > 16374   2989.75   5296.26   1.771470902
> > 24564   4262.15   7696.86   1.805863077
> > 27852   4362.53   3347.72   0.7673805572
> > 28672   5122.8    7113.14   1.388524413
> > 30033   4874.62   8740.04   1.792967931
>
> The numbers for 15018 and 27852 are *way* odd for the MM case. I don't feel
> really good about this till we understand what happened for those two cases.

Yep.

> Also, anytime I see "10000 iterations", I ask myself if the benchmark
> rigging took proper note of hot/cold cache issues. That *may* explain
> the two oddball results we see above - but not knowing more about how
> it was benched, it's hard to say.

Yeah, the more scrutiny this gets the better. So I've cleaned up my
setup and have attached it.

xm_mem.c does the benchmarking and in bench_memcpy() there's the
sse_memcpy call which is the SSE memcpy implementation using inline asm.
It looks like gcc produces pretty crappy code here because if I replace
the sse_memcpy call with xm_memcpy() from xm_memcpy.S - this is the same
function but in pure asm - I get much better numbers, sometimes even
over 2x. It all depends on the alignment of the buffers though. Also,
those numbers don't include the context saving/restoring which the
kernel does for us.

7491    1509.89   2346.94   1.554378381
8170    2166.81   2857.78   1.318890326
12277   2659.03   4179.31   1.571744176
13907   2571.24   4125.7    1.604558427
14319   2638.74   5799.67   2.19789466    <----
14993   2752.42   4413.85   1.603625603
16371   3479.11   5562.65   1.59887055

So please take a look and let me know what you think.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632
WEEE Registernr: 129 19551
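
To make the hot/cold cache and buffer alignment questions from the thread concrete, here is a minimal sketch of a userspace timing harness. It is not the attached xm_mem.c: the names now_ns, evict_caches, align64 and bench_copy, the 64-byte alignment granule and the 64 MiB eviction size are all assumptions made for illustration, and the compiler barrier is GCC/Clang specific.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helpers; none of these names come from xm_mem.c. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Assumed to exceed the last-level cache; adjust for the machine under test. */
#define EVICT_BYTES (64u * 1024 * 1024)

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Touch one byte per 64-byte step of a large buffer so earlier data is pushed out. */
static void evict_caches(volatile unsigned char *junk)
{
    for (size_t i = 0; i < EVICT_BYTES; i += 64)
        junk[i]++;
}

/* Round a pointer up to the next 64-byte boundary. */
static unsigned char *align64(unsigned char *p)
{
    return (unsigned char *)(((uintptr_t)p + 63) & ~(uintptr_t)63);
}

/*
 * Average nanoseconds per call of fn() for `size` bytes.
 * src_off/dst_off are byte offsets from a 64-byte boundary, so alignment
 * is controlled explicitly. With cold != 0 the caches are evicted before
 * every timed copy; the eviction itself is not timed.
 */
static double bench_copy(copy_fn fn, size_t size, size_t src_off,
                         size_t dst_off, int iters, int cold)
{
    unsigned char *src_raw = malloc(size + src_off + 64);
    unsigned char *dst_raw = malloc(size + dst_off + 64);
    unsigned char *junk = cold ? calloc(EVICT_BYTES, 1) : NULL;
    assert(src_raw && dst_raw && (!cold || junk));

    unsigned char *src = align64(src_raw) + src_off;
    unsigned char *dst = align64(dst_raw) + dst_off;
    memset(src, 0xa5, size);
    memset(dst, 0, size);

    uint64_t total = 0;
    for (int i = 0; i < iters; i++) {
        if (cold)
            evict_caches(junk);
        uint64_t t0 = now_ns();
        fn(dst, src, size);
        /* Barrier so the compiler cannot drop or merge the repeated copies. */
        __asm__ volatile("" : : "r"(dst) : "memory");
        total += now_ns() - t0;
    }

    /* Sanity check: a fast but wrong copy routine is not a speedup. */
    assert(memcmp(dst, src, size) == 0);

    free(junk);
    free(dst_raw);
    free(src_raw);
    return (double)total / iters;
}
```

Something like bench_copy(memcpy, 14319, 0, 0, 10000, 0) would give a warm-cache average for one of the sizes listed above, while passing cold = 1 and nonzero offsets shows how much of a given result depends on cache state and buffer alignment. The timer overhead is part of every sample, so results for small sizes are inflated, and like the numbers in the mail this does not account for the context saving/restoring the kernel would do.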