* x86 memcpy performance
@ 2011-08-12 17:59 melwyn lobo
  2011-08-12 18:33 ` Andi Kleen
  2011-08-12 19:52 ` Ingo Molnar
  0 siblings, 2 replies; 40+ messages in thread
From: melwyn lobo @ 2011-08-12 17:59 UTC (permalink / raw)
  To: linux-kernel

Hi All,
Our video recorder application uses memcpy for every frame, about 2 KB of
data per frame, on an Intel® Atom™ Z5xx processor.
With the default 2.6.35 kernel we got 19.6 fps. But the kernel's memcpy
implementation seems suboptimal: when we replaced it with an optimized one
(using SSSE3; the exact patches are currently being finalized) we obtained
22 fps, a gain of 12.2%.
C0 residency also dropped from 75% to 67%, so there are power benefits too.
My questions:
1. Is the kernel memcpy profiled for optimal performance?
2. Does the default kernel configuration for i386 include the best
memcpy implementation (AMD 3DNow!, __builtin_memcpy, etc.)?

Any suggestions or prior experience with this are welcome.

Thanks,
M.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-12 17:59 x86 memcpy performance melwyn lobo
@ 2011-08-12 18:33 ` Andi Kleen
  2011-08-12 19:52 ` Ingo Molnar
  1 sibling, 0 replies; 40+ messages in thread
From: Andi Kleen @ 2011-08-12 18:33 UTC (permalink / raw)
  To: melwyn lobo; +Cc: linux-kernel

melwyn lobo <linux.melwyn@gmail.com> writes:

> Hi All,
> Our Video recorder application uses memcpy for every frame. About 2KB
> data every frame on Intel® Atom™ Z5xx processor.
> With default 2.6.35 kernel we got 19.6 fps. But it seems kernel
> implemented memcpy is suboptimal, because when we replaced
> with an optimized one (using ssse3, exact patches are currently being
> finalized) we obtained 22fps, a gain of 12.2 %.

SSE3 in the kernel memcpy would be incredibly expensive:
it would need a full FPU state save for every call, and preemption
disabled.
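
(For reference, a minimal sketch of the guard pattern such an in-kernel
SSE copy needs - the FPU helpers are the real in-kernel API, but the
function itself is only an illustration, not code from this thread:)

#include <linux/string.h>
#include <linux/hardirq.h>
#include <asm/i387.h>

void *sse_copy_sketch(void *to, const void *from, size_t len)
{
	if (in_interrupt())	/* no FPU use from IRQ context */
		return memcpy(to, from, len);

	kernel_fpu_begin();	/* saves FPU state, disables preemption */
	/* ... XMM register copy loop would go here ... */
	kernel_fpu_end();	/* restores FPU state, re-enables preemption */

	return to;
}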

I haven't seen your patches, but until you get all that
right (and add a lot more overhead to most copies) you
currently have a good chance of corrupting user FPU state.

> C0 residency also reduced from 75% to 67%. This means power benefits too.
> My questions:
> 1. Is kernel memcpy profiled for optimal performance.

It depends on the CPU.

There have been some improvements for Atom on newer kernels
I believe. 

But then the kernel memcpy is usually optimized for relatively
small copies (<= 4K), because very few kernel workloads do more.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-12 17:59 x86 memcpy performance melwyn lobo
  2011-08-12 18:33 ` Andi Kleen
@ 2011-08-12 19:52 ` Ingo Molnar
  2011-08-14  9:59   ` Borislav Petkov
  1 sibling, 1 reply; 40+ messages in thread
From: Ingo Molnar @ 2011-08-12 19:52 UTC (permalink / raw)
  To: melwyn lobo
  Cc: linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra


* melwyn lobo <linux.melwyn@gmail.com> wrote:

> Hi All,
> Our Video recorder application uses memcpy for every frame. About 2KB
> data every frame on Intel® Atom™ Z5xx processor.
> With default 2.6.35 kernel we got 19.6 fps. But it seems kernel
> implemented memcpy is suboptimal, because when we replaced
> with an optimized one (using ssse3, exact patches are currently being
> finalized) we obtained 22fps, a gain of 12.2 %.
> C0 residency also reduced from 75% to 67%. This means power benefits too.
> My questions:
> 1. Is kernel memcpy profiled for optimal performance.
> 2. Does the default kernel configuration for i386 include the best
> memcpy implementation (AMD 3DNOW, __builtin_memcpy .... etc)
> 
> Any suggestions, prior experience on this is welcome.

Sounds very interesting - it would be nice to see 'perf record' + 
'perf report' profiles done on that workload, before and after your 
patches.

The thing is, we obviously want to achieve that 12.2% fps gain, and 
while we probably do not want to switch the kernel's memcpy to 
SSE right now (the save/restore costs are significant), we could 
certainly try to optimize the specific codepath that your video 
recording path is hitting.

If it's some bulk memcpy in a key video driver then we could offer a 
bulk-optimized x86 memcpy variant which could be called from that 
driver - and that could use SSE3 as well.

So yes, if the speedup is real then I'm sure we can achieve it - but 
exact profiles and measurements would have to be shown.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-12 19:52 ` Ingo Molnar
@ 2011-08-14  9:59   ` Borislav Petkov
  2011-08-14 11:13     ` Denys Vlasenko
  2011-08-16  2:34     ` Valdis.Kletnieks
  0 siblings, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2011-08-14  9:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: melwyn lobo, linux-kernel, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Peter Zijlstra, borislav.petkov

[-- Attachment #1: Type: text/plain, Size: 12636 bytes --]

On Fri, Aug 12, 2011 at 09:52:20PM +0200, Ingo Molnar wrote:
> Sounds very interesting - it would be nice to see 'perf record' +
> 'perf report' profiles done on that workload, before and after your
> patches.

FWIW, I've been playing with SSE memcpy version for the kernel recently
too, here's what I have so far:

First of all, I did a trace of all the memcpy buffer sizes used while
building a kernel, see attached kernel_build.sizes.

On the one hand, there is a large number of small chunks copied (1.1M
of 1.2M calls total); on the other, a relatively small number of larger
copies (256 - 2048 bytes), about 100K in total, which nevertheless account
for the larger share of the data copied: 138MB of 175MB total. So, if the
copied buffer is big enough, the context save/restore cost might be
something we're willing to pay.

I first implemented the SSE memcpy in userspace to measure the
speedup vs. the memcpy_64 we have right now:

Benchmarking with 10000 iterations, average results:
size    XM              MM              speedup
119     540.58          449.491         0.8314969419
189     296.318         263.507         0.8892692985
206     297.949         271.399         0.9108923485
224     255.565         235.38          0.9210161798
221     299.383         276.628         0.9239941159
245     299.806         279.432         0.9320430545
369     314.774         316.89          1.006721324
425     327.536         330.475         1.00897153
439     330.847         334.532         1.01113687
458     333.159         340.124         1.020904708
503     334.44          352.166         1.053003229
767     375.612         429.949         1.144661625
870     358.888         312.572         0.8709465025
882     394.297         454.977         1.153893229
925     403.82          472.56          1.170222413
1009    407.147         490.171         1.203915735
1525    512.059         660.133         1.289174911
1737    556.85          725.552         1.302958536
1778    533.839         711.59          1.332965994
1864    558.06          745.317         1.335549882
2039    585.915         813.806         1.388949687
3068    766.462         1105.56         1.442422252
3471    883.983         1239.99         1.40272883
3570    895.822         1266.74         1.414057295
3748    906.832         1302.4          1.436212771
4086    957.649         1486.93         1.552686041
6130    1238.45         1996.42         1.612023046
6961    1413.11         2201.55         1.557939181
7162    1385.5          2216.49         1.59977178
7499    1440.87         2330.12         1.617158856
8182    1610.74         2720.45         1.688950194
12273   2307.86         4042.88         1.751787902
13924   2431.8          4224.48         1.737184756
14335   2469.4          4218.82         1.708440514
15018   2675.67         1904.07         0.711622886
16374   2989.75         5296.26         1.771470902
24564   4262.15         7696.86         1.805863077
27852   4362.53         3347.72         0.7673805572
28672   5122.8          7113.14         1.388524413
30033   4874.62         8740.04         1.792967931
32768   6014.78         7564.2          1.257603505
49142   14464.2         21114.2         1.459757233
55702   16055           23496.8         1.463523623
57339   16725.7         24553.8         1.46803388
60073   17451.5         24407.3         1.398579162


Each size is tested with randomly generated misalignment to exercise the implementation.

I've implemented the SSE memcpy similarly to arch/x86/lib/mmx_32.c and did
some kernel build traces:

with SSE memcpy
===============

Performance counter stats for '/root/boris/bin/build-kernel.sh' (10 runs):

    3301761.517649 task-clock                #   24.001 CPUs utilized            ( +-  1.48% )
           520,658 context-switches          #    0.000 M/sec                    ( +-  0.25% )
            63,845 CPU-migrations            #    0.000 M/sec                    ( +-  0.58% )
        26,070,835 page-faults               #    0.008 M/sec                    ( +-  0.00% )
 1,812,482,599,021 cycles                    #    0.549 GHz                      ( +-  0.85% ) [64.55%]
   551,783,051,492 stalled-cycles-frontend   #   30.44% frontend cycles idle     ( +-  0.98% ) [65.64%]
   444,996,901,060 stalled-cycles-backend    #   24.55% backend  cycles idle     ( +-  1.15% ) [67.16%]
 1,488,917,931,766 instructions              #    0.82  insns per cycle
                                             #    0.37  stalled cycles per insn  ( +-  0.91% ) [69.25%]
   340,575,978,517 branches                  #  103.150 M/sec                    ( +-  0.99% ) [68.29%]
    21,519,667,206 branch-misses             #    6.32% of all branches          ( +-  1.09% ) [65.11%]

     137.567155255 seconds time elapsed                                          ( +-  1.48% )


plain 3.0
=========

 Performance counter stats for '/root/boris/bin/build-kernel.sh' (10 runs):

    3504754.425527 task-clock                #   24.001 CPUs utilized            ( +-  1.31% )
           518,139 context-switches          #    0.000 M/sec                    ( +-  0.32% )
            61,790 CPU-migrations            #    0.000 M/sec                    ( +-  0.73% )
        26,056,947 page-faults               #    0.007 M/sec                    ( +-  0.00% )
 1,826,757,751,616 cycles                    #    0.521 GHz                      ( +-  0.66% ) [63.86%]
   557,800,617,954 stalled-cycles-frontend   #   30.54% frontend cycles idle     ( +-  0.79% ) [64.65%]
   443,950,768,357 stalled-cycles-backend    #   24.30% backend  cycles idle     ( +-  0.60% ) [67.07%]
 1,469,707,613,500 instructions              #    0.80  insns per cycle
                                             #    0.38  stalled cycles per insn  ( +-  0.68% ) [69.98%]
   335,560,565,070 branches                  #   95.744 M/sec                    ( +-  0.67% ) [69.09%]
    21,365,279,176 branch-misses             #    6.37% of all branches          ( +-  0.65% ) [65.36%]

     146.025263276 seconds time elapsed                                          ( +-  1.31% )


So, although a kernel build is probably not the proper workload for an
SSE memcpy routine, I'm seeing a 9-second build time improvement, i.e.
something around 6%. We're executing a few more instructions, but I'd say
the amount of data moved per instruction is higher due to the 16-byte
XMM moves.

Here's the SSE memcpy version I have so far. I haven't wired in the
proper CPU feature detection yet because we want to run more benchmarks,
like netperf and such, to see whether we get positive results there too.

The SYSTEM_RUNNING check takes care of early-boot situations where we
can't handle FPU exceptions but still use memcpy. There are aligned and
misaligned variants which should handle any buffers and sizes, although
I've set the SSE memcpy threshold at a minimum buffer size of 512 bytes
to amortize the context save/restore somewhat.

Comments are much appreciated! :-)

--
From 385519e844f3466f500774c2c37afe44691ef8d2 Mon Sep 17 00:00:00 2001
From: Borislav Petkov <borislav.petkov@amd.com>
Date: Thu, 11 Aug 2011 18:43:08 +0200
Subject: [PATCH] SSE3 memcpy in C

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
---
 arch/x86/include/asm/string_64.h |   14 ++++-
 arch/x86/lib/Makefile            |    2 +-
 arch/x86/lib/sse_memcpy_64.c     |  133 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 146 insertions(+), 3 deletions(-)
 create mode 100644 arch/x86/lib/sse_memcpy_64.c

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 19e2c46..7bd51bb 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -28,10 +28,20 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
 
 #define __HAVE_ARCH_MEMCPY 1
 #ifndef CONFIG_KMEMCHECK
+extern void *__memcpy(void *to, const void *from, size_t len);
+extern void *__sse_memcpy(void *to, const void *from, size_t len);
 #if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4
-extern void *memcpy(void *to, const void *from, size_t len);
+#define memcpy(dst, src, len)					\
+({								\
+	size_t __len = (len);					\
+	void *__ret;						\
+	if (__len >= 512)					\
+		__ret = __sse_memcpy((dst), (src), __len);	\
+	else							\
+		__ret = __memcpy((dst), (src), __len);		\
+	__ret;							\
+})
 #else
-extern void *__memcpy(void *to, const void *from, size_t len);
 #define memcpy(dst, src, len)					\
 ({								\
 	size_t __len = (len);					\
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index f2479f1..5f90709 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -36,7 +36,7 @@ ifneq ($(CONFIG_X86_CMPXCHG64),y)
 endif
         lib-$(CONFIG_X86_USE_3DNOW) += mmx_32.o
 else
-        obj-y += iomap_copy_64.o
+        obj-y += iomap_copy_64.o sse_memcpy_64.o
         lib-y += csum-partial_64.o csum-copy_64.o csum-wrappers_64.o
         lib-y += thunk_64.o clear_page_64.o copy_page_64.o
         lib-y += memmove_64.o memset_64.o
diff --git a/arch/x86/lib/sse_memcpy_64.c b/arch/x86/lib/sse_memcpy_64.c
new file mode 100644
index 0000000..b53fc31
--- /dev/null
+++ b/arch/x86/lib/sse_memcpy_64.c
@@ -0,0 +1,133 @@
+#include <linux/module.h>
+
+#include <asm/i387.h>
+#include <asm/string_64.h>
+
+void *__sse_memcpy(void *to, const void *from, size_t len)
+{
+	unsigned long src = (unsigned long)from;
+	unsigned long dst = (unsigned long)to;
+	void *p = to;
+	int i;
+
+	if (in_interrupt())
+		return __memcpy(to, from, len);
+
+	if (system_state != SYSTEM_RUNNING)
+		return __memcpy(to, from, len);
+
+	kernel_fpu_begin();
+
+	/* check alignment */
+	if ((src ^ dst) & 0xf)
+		goto unaligned;
+
+	if (src & 0xf) {
+		u8 chunk = 0x10 - (src & 0xf);
+
+		/* copy chunk up to the next 16-byte boundary */
+		__memcpy(to, from, chunk);
+		len -= chunk;
+		to += chunk;
+		from += chunk;
+	}
+
+	/*
+	 * copy in 256 Byte portions
+	 */
+	for (i = 0; i < (len & ~0xff); i += 256) {
+		asm volatile(
+		"movaps 0x0(%0),  %%xmm0\n\t"
+		"movaps 0x10(%0), %%xmm1\n\t"
+		"movaps 0x20(%0), %%xmm2\n\t"
+		"movaps 0x30(%0), %%xmm3\n\t"
+		"movaps 0x40(%0), %%xmm4\n\t"
+		"movaps 0x50(%0), %%xmm5\n\t"
+		"movaps 0x60(%0), %%xmm6\n\t"
+		"movaps 0x70(%0), %%xmm7\n\t"
+		"movaps 0x80(%0), %%xmm8\n\t"
+		"movaps 0x90(%0), %%xmm9\n\t"
+		"movaps 0xa0(%0), %%xmm10\n\t"
+		"movaps 0xb0(%0), %%xmm11\n\t"
+		"movaps 0xc0(%0), %%xmm12\n\t"
+		"movaps 0xd0(%0), %%xmm13\n\t"
+		"movaps 0xe0(%0), %%xmm14\n\t"
+		"movaps 0xf0(%0), %%xmm15\n\t"
+
+		"movaps %%xmm0,  0x0(%1)\n\t"
+		"movaps %%xmm1,  0x10(%1)\n\t"
+		"movaps %%xmm2,  0x20(%1)\n\t"
+		"movaps %%xmm3,  0x30(%1)\n\t"
+		"movaps %%xmm4,  0x40(%1)\n\t"
+		"movaps %%xmm5,  0x50(%1)\n\t"
+		"movaps %%xmm6,  0x60(%1)\n\t"
+		"movaps %%xmm7,  0x70(%1)\n\t"
+		"movaps %%xmm8,  0x80(%1)\n\t"
+		"movaps %%xmm9,  0x90(%1)\n\t"
+		"movaps %%xmm10, 0xa0(%1)\n\t"
+		"movaps %%xmm11, 0xb0(%1)\n\t"
+		"movaps %%xmm12, 0xc0(%1)\n\t"
+		"movaps %%xmm13, 0xd0(%1)\n\t"
+		"movaps %%xmm14, 0xe0(%1)\n\t"
+		"movaps %%xmm15, 0xf0(%1)\n\t"
+		: : "r" (from), "r" (to) : "memory");
+
+		from += 256;
+		to += 256;
+	}
+
+	goto trailer;
+
+unaligned:
+	/*
+	 * copy in 256 Byte portions unaligned
+	 */
+	for (i = 0; i < (len & ~0xff); i += 256) {
+		asm volatile(
+		"movups 0x0(%0),  %%xmm0\n\t"
+		"movups 0x10(%0), %%xmm1\n\t"
+		"movups 0x20(%0), %%xmm2\n\t"
+		"movups 0x30(%0), %%xmm3\n\t"
+		"movups 0x40(%0), %%xmm4\n\t"
+		"movups 0x50(%0), %%xmm5\n\t"
+		"movups 0x60(%0), %%xmm6\n\t"
+		"movups 0x70(%0), %%xmm7\n\t"
+		"movups 0x80(%0), %%xmm8\n\t"
+		"movups 0x90(%0), %%xmm9\n\t"
+		"movups 0xa0(%0), %%xmm10\n\t"
+		"movups 0xb0(%0), %%xmm11\n\t"
+		"movups 0xc0(%0), %%xmm12\n\t"
+		"movups 0xd0(%0), %%xmm13\n\t"
+		"movups 0xe0(%0), %%xmm14\n\t"
+		"movups 0xf0(%0), %%xmm15\n\t"
+
+		"movups %%xmm0,  0x0(%1)\n\t"
+		"movups %%xmm1,  0x10(%1)\n\t"
+		"movups %%xmm2,  0x20(%1)\n\t"
+		"movups %%xmm3,  0x30(%1)\n\t"
+		"movups %%xmm4,  0x40(%1)\n\t"
+		"movups %%xmm5,  0x50(%1)\n\t"
+		"movups %%xmm6,  0x60(%1)\n\t"
+		"movups %%xmm7,  0x70(%1)\n\t"
+		"movups %%xmm8,  0x80(%1)\n\t"
+		"movups %%xmm9,  0x90(%1)\n\t"
+		"movups %%xmm10, 0xa0(%1)\n\t"
+		"movups %%xmm11, 0xb0(%1)\n\t"
+		"movups %%xmm12, 0xc0(%1)\n\t"
+		"movups %%xmm13, 0xd0(%1)\n\t"
+		"movups %%xmm14, 0xe0(%1)\n\t"
+		"movups %%xmm15, 0xf0(%1)\n\t"
+		: : "r" (from), "r" (to) : "memory");
+
+		from += 256;
+		to += 256;
+	}
+
+trailer:
+	__memcpy(to, from, len & 0xff);
+
+	kernel_fpu_end();
+
+	return p;
+}
+EXPORT_SYMBOL_GPL(__sse_memcpy);
-- 
1.7.6.134.gcf13f6


-- 
Regards/Gruss,
    Boris.

[-- Attachment #2: kernel_build.sizes --]
[-- Type: text/plain, Size: 925 bytes --]

Bytes	Count
=====	=====
0	5447
1	3850
2	16255
3	11113
4	68870
5	4256
6	30433
7	19188
8	50490
9	5999
10	78275
11	5628
12	6870
13	7371
14	4742
15	4911
16	143835
17	14096
18	1573
19	13603
20	424321
21	741
22	584
23	450
24	472
25	685
26	367
27	365
28	333
29	301
30	300
31	269
32	489
33	272
34	266
35	220
36	239
37	209
38	249
39	235
40	207
41	181
42	150
43	98
44	194
45	66
46	62
47	52
48	67226
49	138
50	171
51	26
52	20
53	12
54	15
55	4
56	13
57	8
58	6
59	6
60	115
61	10
62	5
63	12
64	67353
65	6
66	2363
67	9
68	11
69	6
70	5
71	6
72	10
73	4
74	9
75	8
76	4
77	6
78	3
79	4
80	3
81	4
82	4
83	4
84	4
85	8
86	6
87	2
88	3
89	2
90	2
91	1
92	9
93	1
94	2
96	2
97	2
98	3
100	2
102	1
104	1
105	1
106	1
107	2
109	1
110	1
111	1
112	1
113	2
115	2
117	1
118	1
119	1
120	14
127	1
128	1
130	1
131	2
134	2
137	1
144	100092
149	1
151	1
153	1
158	1
185	1
217	4
224	3
225	3
227	3
244	1
254	5
255	13
256	21708
512	21746
848	12907
1920	36536
2048	21708

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-14  9:59   ` Borislav Petkov
@ 2011-08-14 11:13     ` Denys Vlasenko
  2011-08-14 12:40       ` Borislav Petkov
  2011-08-16  2:34     ` Valdis.Kletnieks
  1 sibling, 1 reply; 40+ messages in thread
From: Denys Vlasenko @ 2011-08-14 11:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, melwyn lobo, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Peter Zijlstra, borislav.petkov

On Sunday 14 August 2011 11:59, Borislav Petkov wrote:
> Here's the SSE memcpy version I got so far, I haven't wired in the
> proper CPU feature detection yet because we want to run more benchmarks
> like netperf and stuff to see whether we see any positive results there.
> 
> The SYSTEM_RUNNING check is to take care of early boot situations where
> we can't handle FPU exceptions but we use memcpy. There's an aligned and
> misaligned variant which should handle any buffers and sizes although
> I've set the SSE memcpy threshold at 512 Bytes buffersize the least to
> cover context save/restore somewhat.
> 
> Comments are much appreciated! :-)
> 
> --- a/arch/x86/include/asm/string_64.h
> +++ b/arch/x86/include/asm/string_64.h
> @@ -28,10 +28,20 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
>  
>  #define __HAVE_ARCH_MEMCPY 1
>  #ifndef CONFIG_KMEMCHECK
> +extern void *__memcpy(void *to, const void *from, size_t len);
> +extern void *__sse_memcpy(void *to, const void *from, size_t len);
>  #if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4
> -extern void *memcpy(void *to, const void *from, size_t len);
> +#define memcpy(dst, src, len)					\
> +({								\
> +	size_t __len = (len);					\
> +	void *__ret;						\
> +	if (__len >= 512)					\
> +		__ret = __sse_memcpy((dst), (src), __len);	\
> +	else							\
> +		__ret = __memcpy((dst), (src), __len);		\
> +	__ret;							\
> +})

Please, no. Do not inline every memcpy invocation.
This is pure bloat (considering how many memcpy calls there are)
and it doesn't even gain anything in speed, since there will be
a function call either way.
Put the __len >= 512 check inside your memcpy instead.

You may do the check inline if you know that __len is constant:
if (__builtin_constant_p(__len) && __len >= 512) ...
because in that case gcc will evaluate it at compile time.
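
A sketch of the call site with that check folded in (illustrative only;
__sse_memcpy/__memcpy are the names from the patch above, and the
non-constant case is assumed to do its own size check out of line):

#define memcpy(dst, src, len)					\
({								\
	size_t __len = (len);					\
	void *__ret;						\
	if (__builtin_constant_p(__len) && __len >= 512)	\
		__ret = __sse_memcpy((dst), (src), __len);	\
	else							\
		__ret = __memcpy((dst), (src), __len);		\
	__ret;							\
})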

-- 
vda

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-14 11:13     ` Denys Vlasenko
@ 2011-08-14 12:40       ` Borislav Petkov
  2011-08-15 13:27         ` melwyn lobo
  2011-08-15 13:44         ` Denys Vlasenko
  0 siblings, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2011-08-14 12:40 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, melwyn lobo, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Peter Zijlstra, borislav.petkov

On Sun, Aug 14, 2011 at 01:13:56PM +0200, Denys Vlasenko wrote:
> On Sunday 14 August 2011 11:59, Borislav Petkov wrote:
> > Here's the SSE memcpy version I got so far, I haven't wired in the
> > proper CPU feature detection yet because we want to run more benchmarks
> > like netperf and stuff to see whether we see any positive results there.
> > 
> > The SYSTEM_RUNNING check is to take care of early boot situations where
> > we can't handle FPU exceptions but we use memcpy. There's an aligned and
> > misaligned variant which should handle any buffers and sizes although
> > I've set the SSE memcpy threshold at 512 Bytes buffersize the least to
> > cover context save/restore somewhat.
> > 
> > Comments are much appreciated! :-)
> > 
> > --- a/arch/x86/include/asm/string_64.h
> > +++ b/arch/x86/include/asm/string_64.h
> > @@ -28,10 +28,20 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
> >  
> >  #define __HAVE_ARCH_MEMCPY 1
> >  #ifndef CONFIG_KMEMCHECK
> > +extern void *__memcpy(void *to, const void *from, size_t len);
> > +extern void *__sse_memcpy(void *to, const void *from, size_t len);
> >  #if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4
> > -extern void *memcpy(void *to, const void *from, size_t len);
> > +#define memcpy(dst, src, len)					\
> > +({								\
> > +	size_t __len = (len);					\
> > +	void *__ret;						\
> > +	if (__len >= 512)					\
> > +		__ret = __sse_memcpy((dst), (src), __len);	\
> > +	else							\
> > +		__ret = __memcpy((dst), (src), __len);		\
> > +	__ret;							\
> > +})
> 
> Please, no. Do not inline every memcpy invocation.
> This is pure bloat (considering how many memcpy calls there are)
> and it doesn't even gain anything in speed, since there will be
> a function call either way.
> Put the __len >= 512 check inside your memcpy instead.

In the __len < 512 case, this would actually cause two function calls:
one to __sse_memcpy and then one to __memcpy.

> You may do the check if you know that __len is constant:
> if (__builtin_constant_p(__len) && __len >= 512) ...
> because in this case gcc will evaluate it at compile-time.

That could justify the bloat at least partially.

Actually, I had a version which put the sse_memcpy code into memcpy_64.S,
which would save us both the function call and the bloat. I might
return to that one if it turns out that SSE memcpy makes sense for the
kernel.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-14 12:40       ` Borislav Petkov
@ 2011-08-15 13:27         ` melwyn lobo
  2011-08-15 13:44         ` Denys Vlasenko
  1 sibling, 0 replies; 40+ messages in thread
From: melwyn lobo @ 2011-08-15 13:27 UTC (permalink / raw)
  To: Borislav Petkov, Denys Vlasenko, Ingo Molnar, melwyn lobo,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra, borislav.petkov

Hi,
I was on vacation for the last two days. Thanks for the good insights
into the issue.
Ingo, unfortunately the data we have is on a soon-to-be-released
platform and is strictly confidential at this stage.

Boris, thanks for the patch. Looking at it:
+void *__sse_memcpy(void *to, const void *from, size_t len)
+{
+       unsigned long src = (unsigned long)from;
+       unsigned long dst = (unsigned long)to;
+       void *p = to;
+       int i;
+
+       if (in_interrupt())
+               return __memcpy(to, from, len)
So what is the reason we cannot use sse_memcpy in interrupt context?
(FPU registers not saved?)
My question is still not answered. There are 3 versions of memcpy in the kernel:

***********************************arch/x86/include/asm/string_32.h******************************
#ifndef CONFIG_KMEMCHECK

#if (__GNUC__ >= 4)
#define memcpy(t, f, n) __builtin_memcpy(t, f, n)
#else
#define memcpy(t, f, n)                         \
        (__builtin_constant_p((n))              \
         ? __constant_memcpy((t), (f), (n))     \
         : __memcpy((t), (f), (n)))
#endif
#else
/*
 * kmemcheck becomes very happy if we use the REP instructions unconditionally,
 * because it means that we know both memory operands in advance.
 */
#define memcpy(t, f, n) __memcpy((t), (f), (n))
#endif

****************************************************************************************.
I will ignore CONFIG_X86_USE_3DNOW (including mmx_memcpy()), as it is
valid only for AMD and not for the Atom Z5xx series.
That leaves __memcpy, __constant_memcpy and __builtin_memcpy.
I have a hunch we were using __builtin_memcpy by default, because my
GCC version is >= 4 and CONFIG_KMEMCHECK is not defined.
Can someone confirm which of these three is used with i386_defconfig?
And, again with i386_defconfig, which workloads provide the best results
with the default implementation?

thanks,
M.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-14 12:40       ` Borislav Petkov
  2011-08-15 13:27         ` melwyn lobo
@ 2011-08-15 13:44         ` Denys Vlasenko
  1 sibling, 0 replies; 40+ messages in thread
From: Denys Vlasenko @ 2011-08-15 13:44 UTC (permalink / raw)
  To: Borislav Petkov, Denys Vlasenko, Ingo Molnar, melwyn lobo,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra, borislav.petkov

On Sun, Aug 14, 2011 at 2:40 PM, Borislav Petkov <bp@alien8.de> wrote:
>> > +   if (__len >= 512)                                       \
>> > +           __ret = __sse_memcpy((dst), (src), __len);      \
>> > +   else                                                    \
>> > +           __ret = __memcpy((dst), (src), __len);          \
>> > +   __ret;                                                  \
>> > +})
>>
>> Please, no. Do not inline every memcpy invocation.
>> This is pure bloat (comsidering how many memcpy calls there are)
>> and it doesn't even win anything in speed, since there will be
>> a fucntion call either way.
>> Put the __len >= 512 check inside your memcpy instead.
>
> In the __len < 512 case, this would actually cause two function calls,
> actually: once the __sse_memcpy and then the __memcpy one.

You didn't notice the "else".

>> You may do the check if you know that __len is constant:
>> if (__builtin_constant_p(__len) && __len >= 512) ...
>> because in this case gcc will evaluate it at compile-time.
>
> That could justify the bloat at least partially.

There will be no bloat in this case.
-- 
vda

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-14  9:59   ` Borislav Petkov
  2011-08-14 11:13     ` Denys Vlasenko
@ 2011-08-16  2:34     ` Valdis.Kletnieks
  2011-08-16 12:16       ` Borislav Petkov
  1 sibling, 1 reply; 40+ messages in thread
From: Valdis.Kletnieks @ 2011-08-16  2:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, melwyn lobo, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Peter Zijlstra, borislav.petkov

[-- Attachment #1: Type: text/plain, Size: 1109 bytes --]

On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:

> Benchmarking with 10000 iterations, average results:
> size    XM              MM              speedup
> 119     540.58          449.491         0.8314969419

> 12273   2307.86         4042.88         1.751787902
> 13924   2431.8          4224.48         1.737184756
> 14335   2469.4          4218.82         1.708440514
> 15018   2675.67         1904.07         0.711622886
> 16374   2989.75         5296.26         1.771470902
> 24564   4262.15         7696.86         1.805863077
> 27852   4362.53         3347.72         0.7673805572
> 28672   5122.8          7113.14         1.388524413
> 30033   4874.62         8740.04         1.792967931

The numbers for 15018 and 27852 are *way* odd for the MM case. I don't feel
really good about this until we understand what happened in those two cases.

Also, anytime I see "10000 iterations", I ask myself whether the benchmark
setup took proper note of hot/cold cache issues. That *may* explain the two
oddball results we see above - but not knowing more about how it was
benched, it's hard to say.
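
(For what it's worth, one way a rig can control for that is to explicitly
flush both buffers between iterations so every run measures the cold-cache
case - a hedged userspace sketch, not code from the benchmark under
discussion:)

#include <stddef.h>

/* Flush a buffer from all cache levels; assumes x86 and 64-byte lines. */
static void flush_buf(const void *buf, size_t len)
{
	const char *p = buf;
	size_t i;

	for (i = 0; i < len; i += 64)
		asm volatile("clflush (%0)" :: "r" (p + i) : "memory");
	asm volatile("mfence" ::: "memory");
}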


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-16  2:34     ` Valdis.Kletnieks
@ 2011-08-16 12:16       ` Borislav Petkov
  2011-09-01 15:15         ` Maarten Lankhorst
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2011-08-16 12:16 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Borislav Petkov, Ingo Molnar, melwyn lobo, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2448 bytes --]

On Mon, Aug 15, 2011 at 10:34:35PM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:
> 
> > Benchmarking with 10000 iterations, average results:
> > size    XM              MM              speedup
> > 119     540.58          449.491         0.8314969419
> 
> > 12273   2307.86         4042.88         1.751787902
> > 13924   2431.8          4224.48         1.737184756
> > 14335   2469.4          4218.82         1.708440514
> > 15018   2675.67         1904.07         0.711622886
> > 16374   2989.75         5296.26         1.771470902
> > 24564   4262.15         7696.86         1.805863077
> > 27852   4362.53         3347.72         0.7673805572
> > 28672   5122.8          7113.14         1.388524413
> > 30033   4874.62         8740.04         1.792967931
> 
> The numbers for 15018 and 27852 are *way* odd for the MM case. I don't feel
> really good about this till we understand what happened for those two cases.

Yep.

> Also, anytime I see "10000 iterations", I ask myself if the benchmark
> rigging took proper note of hot/cold cache issues. That *may* explain
> the two oddball results we see above - but not knowing more about how
> it was benched, it's hard to say.

Yeah, the more scrutiny this gets the better. So I've cleaned up my
setup and have attached it.

xm_mem.c does the benchmarking and in bench_memcpy() there's the
sse_memcpy call which is the SSE memcpy implementation using inline asm.
It looks like gcc produces pretty crappy code here because if I replace
the sse_memcpy call with xm_memcpy() from xm_memcpy.S - this is the
same function but in pure asm - I get much better numbers, sometimes
even over 2x. It all depends on the alignment of the buffers though.
Also, those numbers don't include the context saving/restoring which the
kernel does for us.

7491    1509.89         2346.94         1.554378381
8170    2166.81         2857.78         1.318890326
12277   2659.03         4179.31         1.571744176
13907   2571.24         4125.7          1.604558427
14319   2638.74         5799.67         2.19789466	<----
14993   2752.42         4413.85         1.603625603
16371   3479.11         5562.65         1.59887055

So please take a look and let me know what you think.

Thanks.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

[-- Attachment #2: sse_memcpy.tar.bz2 --]
[-- Type: application/octet-stream, Size: 3508 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-16 12:16       ` Borislav Petkov
@ 2011-09-01 15:15         ` Maarten Lankhorst
  2011-09-01 16:18           ` Linus Torvalds
  2011-12-05 12:54           ` melwyn lobo
  0 siblings, 2 replies; 40+ messages in thread
From: Maarten Lankhorst @ 2011-09-01 15:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Valdis.Kletnieks, Borislav Petkov, Ingo Molnar, melwyn lobo,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 3418 bytes --]

Hey,

2011/8/16 Borislav Petkov <bp@amd64.org>:
> On Mon, Aug 15, 2011 at 10:34:35PM -0400, Valdis.Kletnieks@vt.edu wrote:
>> On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:
>>
>> > Benchmarking with 10000 iterations, average results:
>> > size    XM              MM              speedup
>> > 119     540.58          449.491         0.8314969419
>>
>> > 12273   2307.86         4042.88         1.751787902
>> > 13924   2431.8          4224.48         1.737184756
>> > 14335   2469.4          4218.82         1.708440514
>> > 15018   2675.67         1904.07         0.711622886
>> > 16374   2989.75         5296.26         1.771470902
>> > 24564   4262.15         7696.86         1.805863077
>> > 27852   4362.53         3347.72         0.7673805572
>> > 28672   5122.8          7113.14         1.388524413
>> > 30033   4874.62         8740.04         1.792967931
>>
>> The numbers for 15018 and 27852 are *way* odd for the MM case. I don't feel
>> really good about this till we understand what happened for those two cases.
>
> Yep.
>
>> Also, anytime I see "10000 iterations", I ask myself if the benchmark
>> rigging took proper note of hot/cold cache issues. That *may* explain
>> the two oddball results we see above - but not knowing more about how
>> it was benched, it's hard to say.
>
> Yeah, the more scrutiny this gets the better. So I've cleaned up my
> setup and have attached it.
>
> xm_mem.c does the benchmarking and in bench_memcpy() there's the
> sse_memcpy call which is the SSE memcpy implementation using inline asm.
> It looks like gcc produces pretty crappy code here because if I replace
> the sse_memcpy call with xm_memcpy() from xm_memcpy.S - this is the
> same function but in pure asm - I get much better numbers, sometimes
> even over 2x. It all depends on the alignment of the buffers though.
> Also, those numbers don't include the context saving/restoring which the
> kernel does for us.
>
> 7491    1509.89         2346.94         1.554378381
> 8170    2166.81         2857.78         1.318890326
> 12277   2659.03         4179.31         1.571744176
> 13907   2571.24         4125.7          1.604558427
> 14319   2638.74         5799.67         2.19789466      <----
> 14993   2752.42         4413.85         1.603625603
> 16371   3479.11         5562.65         1.59887055

This work intrigued me; in some cases the kernel memcpy was a lot faster than the SSE memcpy,
and I finally figured out why. I also extended the test to an optimized AVX memcpy,
but I think the kernel memcpy will always win in the aligned case.

Those numbers you posted don't seem right. It depends a lot on the alignment;
for example, if both buffers are aligned to 64 bytes relative to each other,
the kernel memcpy beats the AVX memcpy on my machine.

I replaced the malloc calls with memalign(65536, size + 256) so I could play
around with the alignments a little. This explains why, for some sizes, the
kernel memcpy was faster than the SSE memcpy in the test results you had.
When (src & 63) == (dst & 63), the kernel memcpy seems to always win; otherwise
the AVX memcpy might.
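
(Roughly, the sweep looks like this - a sketch using the same memalign()
sizing, not the attached testcase itself; the timing of the memcpy call is
left out:)

#include <malloc.h>
#include <stdlib.h>
#include <string.h>

/* Walk source/destination offsets within a 64-byte line over buffers that
 * are themselves heavily aligned, so only the chosen misalignment varies. */
static void sweep_alignments(size_t size)
{
	char *src = memalign(65536, size + 256);
	char *dst = memalign(65536, size + 256);
	int soff, doff;

	for (soff = 0; soff < 64; soff += 4)
		for (doff = 0; doff < 64; doff += 4)
			memcpy(dst + doff, src + soff, size);	/* time this call */

	free(src);
	free(dst);
}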

If you want to speed up memcpy, I think your best bet is to find out why it's
so much slower when src and dst aren't 64-byte aligned relative to each other.

Cheers,
Maarten

---
Attached: my modified version of the sse memcpy you posted.

I changed it a bit and used AVX, but some of the other changes might
benefit your SSE memcpy too.

[-- Attachment #2: ym_memcpy.txt --]
[-- Type: text/plain, Size: 2668 bytes --]

/*
 * ym_memcpy - AVX version of memcpy
 *
 * Input:
 *  rdi destination
 *  rsi source
 *  rdx count
 *
 * Output:
 * rax original destination
 */
.globl ym_memcpy
.type ym_memcpy, @function

ym_memcpy:
	mov %rdi, %rax

	/* Target align */
	movzbq %dil, %rcx
	negb %cl
	andb $0x1f, %cl
	subq %rcx, %rdx
	rep movsb

	movq %rdx, %rcx
	andq $0x1ff, %rdx
	shrq $9, %rcx
	jz .trailer

	movb %sil, %r8b
	andb $0x1f, %r8b
	test %r8b, %r8b
	jz .repeat_a

	.align 32
.repeat_ua:
	vmovups 0x0(%rsi), %ymm0
	vmovups 0x20(%rsi), %ymm1
	vmovups 0x40(%rsi), %ymm2
	vmovups 0x60(%rsi), %ymm3
	vmovups 0x80(%rsi), %ymm4
	vmovups 0xa0(%rsi), %ymm5
	vmovups 0xc0(%rsi), %ymm6
	vmovups 0xe0(%rsi), %ymm7
	vmovups 0x100(%rsi), %ymm8
	vmovups 0x120(%rsi), %ymm9
	vmovups 0x140(%rsi), %ymm10
	vmovups 0x160(%rsi), %ymm11
	vmovups 0x180(%rsi), %ymm12
	vmovups 0x1a0(%rsi), %ymm13
	vmovups 0x1c0(%rsi), %ymm14
	vmovups 0x1e0(%rsi), %ymm15

	vmovaps %ymm0, 0x0(%rdi)
	vmovaps %ymm1, 0x20(%rdi)
	vmovaps %ymm2, 0x40(%rdi)
	vmovaps %ymm3, 0x60(%rdi)
	vmovaps %ymm4, 0x80(%rdi)
	vmovaps %ymm5, 0xa0(%rdi)
	vmovaps %ymm6, 0xc0(%rdi)
	vmovaps %ymm7, 0xe0(%rdi)
	vmovaps %ymm8, 0x100(%rdi)
	vmovaps %ymm9, 0x120(%rdi)
	vmovaps %ymm10, 0x140(%rdi)
	vmovaps %ymm11, 0x160(%rdi)
	vmovaps %ymm12, 0x180(%rdi)
	vmovaps %ymm13, 0x1a0(%rdi)
	vmovaps %ymm14, 0x1c0(%rdi)
	vmovaps %ymm15, 0x1e0(%rdi)

	/* advance pointers */
	addq $0x200, %rsi
	addq $0x200, %rdi
	subq $1, %rcx
	jnz .repeat_ua
	jz .trailer

	.align 32
.repeat_a:
	prefetchnta 0x80(%rsi)
	prefetchnta 0x100(%rsi)
	prefetchnta 0x180(%rsi)
	vmovaps 0x0(%rsi), %ymm0
	vmovaps 0x20(%rsi), %ymm1
	vmovaps 0x40(%rsi), %ymm2
	vmovaps 0x60(%rsi), %ymm3
	vmovaps 0x80(%rsi), %ymm4
	vmovaps 0xa0(%rsi), %ymm5
	vmovaps 0xc0(%rsi), %ymm6
	vmovaps 0xe0(%rsi), %ymm7
	vmovaps 0x100(%rsi), %ymm8
	vmovaps 0x120(%rsi), %ymm9
	vmovaps 0x140(%rsi), %ymm10
	vmovaps 0x160(%rsi), %ymm11
	vmovaps 0x180(%rsi), %ymm12
	vmovaps 0x1a0(%rsi), %ymm13
	vmovaps 0x1c0(%rsi), %ymm14
	vmovaps 0x1e0(%rsi), %ymm15

	vmovaps %ymm0, 0x0(%rdi)
	vmovaps %ymm1, 0x20(%rdi)
	vmovaps %ymm2, 0x40(%rdi)
	vmovaps %ymm3, 0x60(%rdi)
	vmovaps %ymm4, 0x80(%rdi)
	vmovaps %ymm5, 0xa0(%rdi)
	vmovaps %ymm6, 0xc0(%rdi)
	vmovaps %ymm7, 0xe0(%rdi)
	vmovaps %ymm8, 0x100(%rdi)
	vmovaps %ymm9, 0x120(%rdi)
	vmovaps %ymm10, 0x140(%rdi)
	vmovaps %ymm11, 0x160(%rdi)
	vmovaps %ymm12, 0x180(%rdi)
	vmovaps %ymm13, 0x1a0(%rdi)
	vmovaps %ymm14, 0x1c0(%rdi)
	vmovaps %ymm15, 0x1e0(%rdi)

	/* advance pointers */
	addq $0x200, %rsi
	addq $0x200, %rdi
	subq $1, %rcx
	jnz .repeat_a

	.align 32
.trailer:
	movq %rdx, %rcx
	shrq $3, %rcx
	rep; movsq
	movq %rdx, %rcx
	andq $0x7, %rcx
	rep; movsb
	retq

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-01 15:15         ` Maarten Lankhorst
@ 2011-09-01 16:18           ` Linus Torvalds
  2011-09-08  8:35             ` Borislav Petkov
  2011-12-05 12:54           ` melwyn lobo
  1 sibling, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2011-09-01 16:18 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Borislav Petkov, Valdis.Kletnieks, Borislav Petkov, Ingo Molnar,
	melwyn lobo, linux-kernel, H. Peter Anvin, Thomas Gleixner,
	Peter Zijlstra

On Thu, Sep 1, 2011 at 8:15 AM, Maarten Lankhorst
<m.b.lankhorst@gmail.com> wrote:
>
> This work intrigued me, in some cases kernel memcpy was a lot faster than sse memcpy,
> and I finally figured out why. I also extended the test to an optimized avx memcpy,
> but I think the kernel memcpy will always win in the aligned case.

"rep movs" is generally optimized in microcode on most modern Intel
CPU's for some easyish cases, and it will outperform just about
anything.

Atom is a notable exception, but if you expect performance on any
general loads from Atom, you need to get your head examined. Atom is a
disaster for anything but tuned loops.

The "easyish cases" depend on microarchitecture. They are improving,
so long-term "rep movs" is the best way regardless, but for most
current ones it's something like "source aligned to 8 bytes *and*
source and destination are equal "mod 64"".

And that's true in a lot of common situations. It's true for the page
copy, for example, and it's often true for big user "read()/write()"
calls (but "often" may not be "often enough" - high-performance
userland should strive to align read/write buffers to 64 bytes, for
example).
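
(As an illustration only - not from this mail - a 64-byte-aligned
allocation for such a userland I/O buffer could look like:)

#include <stdlib.h>

/* Allocate a read()/write() buffer aligned to a 64-byte cache line so the
 * copy is more likely to hit the fast "rep movs" case described above. */
static void *alloc_io_buffer(size_t size)
{
	void *buf;

	if (posix_memalign(&buf, 64, size))
		return NULL;
	return buf;
}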

Many other cases of "memcpy()" are the fairly small, constant-sized
ones, where the optimal strategy tends to be "move words by hand".
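
(For example - an illustrative struct, not from this mail - a constant-sized
copy like this is expanded by the compiler into a few register moves, with
no call at all:)

#include <string.h>

struct sample {
	unsigned long w[4];
};

static inline void copy_sample(struct sample *d, const struct sample *s)
{
	memcpy(d, s, sizeof(*d));	/* typically four 8-byte moves */
}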

                      Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-01 16:18           ` Linus Torvalds
@ 2011-09-08  8:35             ` Borislav Petkov
  2011-09-08 10:58               ` Maarten Lankhorst
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2011-09-08  8:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maarten Lankhorst, Borislav Petkov, Valdis.Kletnieks,
	Ingo Molnar, melwyn lobo, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Peter Zijlstra

On Thu, Sep 01, 2011 at 09:18:32AM -0700, Linus Torvalds wrote:
> On Thu, Sep 1, 2011 at 8:15 AM, Maarten Lankhorst
> <m.b.lankhorst@gmail.com> wrote:
> >
> > This work intrigued me, in some cases kernel memcpy was a lot faster than sse memcpy,
> > and I finally figured out why. I also extended the test to an optimized avx memcpy,
> > but I think the kernel memcpy will always win in the aligned case.
> 
> "rep movs" is generally optimized in microcode on most modern Intel
> CPU's for some easyish cases, and it will outperform just about
> anything.
> 
> Atom is a notable exception, but if you expect performance on any
> general loads from Atom, you need to get your head examined. Atom is a
> disaster for anything but tuned loops.
> 
> The "easyish cases" depend on microarchitecture. They are improving,
> so long-term "rep movs" is the best way regardless, but for most
> current ones it's something like "source aligned to 8 bytes *and*
> source and destination are equal "mod 64"".
> 
> And that's true in a lot of common situations. It's true for the page
> copy, for example, and it's often true for big user "read()/write()"
> calls (but "often" may not be "often enough" - high-performance
> userland should strive to align read/write buffers to 64 bytes, for
> example).
> 
> Many other cases of "memcpy()" are the fairly small, constant-sized
> ones, where the optimal strategy tends to be "move words by hand".

Yeah,

this probably makes enabling SSE memcpy in the kernel a task
with diminishing returns. There are also the additional costs of
saving/restoring FPU context in the kernel, which eat into any SSE
speedup.

And then there's the additional I$ pressure because "rep movs" is
much smaller than all those mov[au]ps stanzas. Btw, mov[au]ps are the
smallest (two-byte) instructions I could use - in the AVX case they can
get up to 4 Bytes of length with the VEX prefix and the additional SIB,
size override, etc. fields.

Oh, and then there's copy_*_user, which also does fault handling;
replacing that with an SSE version of memcpy could get quite hairy quite
fast.

Anyway, I'll try to benchmark an asm version of SSE memcpy in the kernel
when I get the time, to see whether it still makes sense at all.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-08  8:35             ` Borislav Petkov
@ 2011-09-08 10:58               ` Maarten Lankhorst
  2011-09-09  8:14                 ` Borislav Petkov
  0 siblings, 1 reply; 40+ messages in thread
From: Maarten Lankhorst @ 2011-09-08 10:58 UTC (permalink / raw)
  To: Borislav Petkov, Linus Torvalds, Borislav Petkov,
	Valdis.Kletnieks, Ingo Molnar, melwyn lobo, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 3330 bytes --]

On 09/08/2011 10:35 AM, Borislav Petkov wrote:
> On Thu, Sep 01, 2011 at 09:18:32AM -0700, Linus Torvalds wrote:
>> On Thu, Sep 1, 2011 at 8:15 AM, Maarten Lankhorst
>> <m.b.lankhorst@gmail.com> wrote:
>>> This work intrigued me, in some cases kernel memcpy was a lot faster than sse memcpy,
>>> and I finally figured out why. I also extended the test to an optimized avx memcpy,
>>> but I think the kernel memcpy will always win in the aligned case.
>> "rep movs" is generally optimized in microcode on most modern Intel
>> CPU's for some easyish cases, and it will outperform just about
>> anything.
>>
>> Atom is a notable exception, but if you expect performance on any
>> general loads from Atom, you need to get your head examined. Atom is a
>> disaster for anything but tuned loops.
>>
>> The "easyish cases" depend on microarchitecture. They are improving,
>> so long-term "rep movs" is the best way regardless, but for most
>> current ones it's something like "source aligned to 8 bytes *and*
>> source and destination are equal "mod 64"".
>>
>> And that's true in a lot of common situations. It's true for the page
>> copy, for example, and it's often true for big user "read()/write()"
>> calls (but "often" may not be "often enough" - high-performance
>> userland should strive to align read/write buffers to 64 bytes, for
>> example).
>>
>> Many other cases of "memcpy()" are the fairly small, constant-sized
>> ones, where the optimal strategy tends to be "move words by hand".
> Yeah,
>
> this probably makes enabling SSE memcpy in the kernel a task
> with diminishing returns. There are also the additional costs of
> saving/restoring FPU context in the kernel which eat off from any SSE
> speedup.
>
> And then there's the additional I$ pressure because "rep movs" is
> much smaller than all those mov[au]ps stanzas. Btw, mov[au]ps are the
> smallest (two-byte) instructions I could use - in the AVX case they can
> get up to 4 Bytes of length with the VEX prefix and the additional SIB,
> size override, etc. fields.
>
> Oh, and then there's copy_*_user which also does fault handling and
> replacing that with a SSE version of memcpy could get quite hairy quite
> fast.
>
> Anyway, I'll try to benchmark an asm version of SSE memcpy in the kernel
> when I get the time to see whether it still makes sense, at all.
>
I have changed your SSE memcpy test to use fixed source/destination
offsets instead of random misalignment; from that you can see that you
don't really get a speedup at all. It seems to be more a case of 'kernel
memcpy is significantly slower with some alignments' than 'AVX memcpy is
just that much faster'.

For example, a 3754-byte copy with source misalignment 4 and target
misalignment 20 takes 1185 units with the AVX memcpy, but 1480 units
with the kernel memcpy.

The modified testcase is attached. I did some optimizations in the AVX
memcpy, but I fear I may be missing something: when I tried to put it in
the kernel, it complained about SATA errors I had never seen before, so I
immediately went for the power button to prevent more damage. Fortunately
it only corrupted some kernel object files, and btrfs threw checksum
errors. :)

All in all I think testing in userspace is safer; you might want to run it on an
idle CPU with schedtool, with a high FIFO priority, and set the cpufreq governor
to performance.

~Maarten

[-- Attachment #2: memcpy.tar.gz --]
[-- Type: application/x-gzip, Size: 4352 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-08 10:58               ` Maarten Lankhorst
@ 2011-09-09  8:14                 ` Borislav Petkov
  2011-09-09 10:12                   ` Maarten Lankhorst
  2011-09-09 14:39                   ` Linus Torvalds
  0 siblings, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2011-09-09  8:14 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Linus Torvalds, Borislav Petkov, Valdis.Kletnieks, Ingo Molnar,
	melwyn lobo, linux-kernel, H. Peter Anvin, Thomas Gleixner,
	Peter Zijlstra

On Thu, Sep 08, 2011 at 12:58:13PM +0200, Maarten Lankhorst wrote:
> I have changed your sse memcpy to test various alignments with
> source/destination offsets instead of random, from that you can
> see that you don't really get a speedup at all. It seems to be more
> a case of 'kernel memcpy is significantly slower with some alignments',
> than 'avx memcpy is just that much faster'.
> 
> For example 3754 with src misalignment 4 and target misalignment 20
> takes 1185 units on avx memcpy, but 1480 units with kernel memcpy

Right, so the idea is to check whether, with the bigger buffer sizes
(and misaligned ones, although that should not often be the case in
the kernel), the SSE version would outperform a "rep movs" whose ucode
optimizations don't kick in.

With your version modified back to SSE memcpy (don't have an AVX box
right now) I get on an AMD F10h:

...
16384(12/40)    4756.24         7867.74         1.654192552
16384(40/12)    5067.81         6068.71         1.197500008
16384(12/44)    4341.3          8474.96         1.952172387
16384(44/12)    4277.13         7107.64         1.661777347
16384(12/48)    4989.16         7964.54         1.596369011
16384(48/12)    4644.94         6499.5          1.399264281
...

which look like pretty nice numbers to me. I can't say whether there
ever is a 16K buffer we copy in the kernel, but if there were... <16K
buffers also show up to a 1.5x speedup, so I'd say it's a uarch thing.
As I said, it would be best to put it in the kernel and run a bunch of
benchmarks...

> The modified testcase is attached, I did some optimizations in avx
> memcpy, but I fear I may be missing something, when I tried to put it
> in the kernel, it complained about sata errors I never had before,
> so I immediately went for the power button to prevent more errors,
> fortunately it only corrupted some kernel object files, and btrfs
> threw checksum errors. :)

Well, your version should do something similar to what _mmx_memcpy does:
save FPU state and not execute in IRQ context.

> All in all I think testing in userspace is safer, you might want to
> run it on an idle cpu with schedtool, with a high fifo priority, and
> set cpufreq governor to performance.

No, you need a generic system with default settings - otherwise it is
blatant benchmark lying :-)

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-09  8:14                 ` Borislav Petkov
@ 2011-09-09 10:12                   ` Maarten Lankhorst
  2011-09-09 11:23                     ` Maarten Lankhorst
  2011-09-09 14:39                   ` Linus Torvalds
  1 sibling, 1 reply; 40+ messages in thread
From: Maarten Lankhorst @ 2011-09-09 10:12 UTC (permalink / raw)
  To: Borislav Petkov, Linus Torvalds, Borislav Petkov,
	Valdis.Kletnieks, Ingo Molnar, melwyn lobo, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Peter Zijlstra

Hey,

On 09/09/2011 10:14 AM, Borislav Petkov wrote:
> On Thu, Sep 08, 2011 at 12:58:13PM +0200, Maarten Lankhorst wrote:
>> I have changed your sse memcpy to test various alignments with
>> source/destination offsets instead of random, from that you can
>> see that you don't really get a speedup at all. It seems to be more
>> a case of 'kernel memcpy is significantly slower with some alignments',
>> than 'avx memcpy is just that much faster'.
>>
>> For example 3754 with src misalignment 4 and target misalignment 20
>> takes 1185 units on avx memcpy, but 1480 units with kernel memcpy
> Right, so the idea is to check whether with the bigger buffer sizes
> (and misaligned, although this should not be that often the case in
> the kernel) the SSE version would outperform a "rep movs" with ucode
> optimizations not kicking in.
>
> With your version modified back to SSE memcpy (don't have an AVX box
> right now) I get on an AMD F10h:
>
> ...
> 16384(12/40)    4756.24         7867.74         1.654192552
> 16384(40/12)    5067.81         6068.71         1.197500008
> 16384(12/44)    4341.3          8474.96         1.952172387
> 16384(44/12)    4277.13         7107.64         1.661777347
> 16384(12/48)    4989.16         7964.54         1.596369011
> 16384(48/12)    4644.94         6499.5          1.399264281
> ...
>
> which looks like pretty nice numbers to me. I can't say whether there
> ever is 16K buffer we copy in the kernel but if there were... But <16K
> buffers also show up to 1.5x speedup. So I'd say it's a uarch thing.
> As I said, best it would be to put it in the kernel and run a bunch of
> benchmarks...
I think for bigger memcpys it might make sense to demand stricter
alignment. What are your numbers for (0/0)? In my case the kernel
memcpy seems to always be faster there. In fact, src&63 == dst&63
generally seems to be faster with the kernel memcpy.

Patching my tree to WARN_ON_ONCE when this condition isn't true, I get the following warnings:

WARNING: at arch/x86/kernel/head64.c:49 x86_64_start_reservations+0x3b/0x18d()
WARNING: at arch/x86/kernel/head64.c:52 x86_64_start_reservations+0xcb/0x18d()
WARNING: at arch/x86/kernel/e820.c:1077 setup_memory_map+0x3b/0x72()
WARNING: at kernel/fork.c:938 copy_process+0x148f/0x1550()
WARNING: at arch/x86/vdso/vdso32-setup.c:306 sysenter_setup+0xd4/0x301()
WARNING: at mm/util.c:72 kmemdup+0x75/0x80()
WARNING: at fs/btrfs/disk-io.c:1742 open_ctree+0x1ab5/0x1bb0()
WARNING: at fs/btrfs/disk-io.c:1744 open_ctree+0x1b35/0x1bb0()
WARNING: at fs/btrfs/extent_io.c:3634 write_extent_buffer+0x209/0x240()
WARNING: at fs/exec.c:1002 flush_old_exec+0x6c3/0x750()
WARNING: at fs/btrfs/extent_io.c:3496 read_extent_buffer+0x1b1/0x1e0()
WARNING: at kernel/module.c:2585 load_module+0x1933/0x1c30()
WARNING: at fs/btrfs/extent_io.c:3748 memcpy_extent_buffer+0x2aa/0x2f0()
WARNING: at fs/btrfs/disk-io.c:2276 write_dev_supers+0x34e/0x360()
WARNING: at lib/swiotlb.c:367 swiotlb_bounce+0xc6/0xe0()
WARNING: at fs/btrfs/transaction.c:1387 btrfs_commit_transaction+0x867/0x8a0()
WARNING: at drivers/tty/serial/serial_core.c:527 uart_write+0x14a/0x160()
WARNING: at mm/memory.c:3830 __access_remote_vm+0x251/0x270()

The most persistent one appears to be btrfs' *_extent_buffer helpers;
they trigger the most warnings on my system. Apart from that, there's
not much to gain on my system, since the alignment is already close to
optimal.

My ext4 /home doesn't throw warnings, so I'd gain the most by figuring
out whether I could improve btrfs/extent_io.c in some way.
The patch that triggers those warnings is below; change it to WARN_ON
if you want to see which one happens most often for you.

I was pleasantly surprised though.

>> The modified testcase is attached, I did some optimizations in avx
>> memcpy, but I fear I may be missing something, when I tried to put it
>> in the kernel, it complained about sata errors I never had before,
>> so I immediately went for the power button to prevent more errors,
>> fortunately it only corrupted some kernel object files, and btrfs
>> threw checksum errors. :)
> Well, your version should do something similar to what _mmx_memcpy does:
> save FPU state and not execute in IRQ context.
>
>> All in all I think testing in userspace is safer, you might want to
>> run it on an idle cpu with schedtool, with a high fifo priority, and
>> set cpufreq governor to performance.
> No, you need a generic system with default settings - otherwise it is
> blatant benchmark lying :-)

diff --git a/arch/x86/include/asm/string_64.h b/arch/x86/include/asm/string_64.h
index 19e2c46..77180bb 100644
--- a/arch/x86/include/asm/string_64.h
+++ b/arch/x86/include/asm/string_64.h
@@ -30,6 +30,14 @@ static __always_inline void *__inline_memcpy(void *to, const void *from, size_t
 #ifndef CONFIG_KMEMCHECK
 #if (__GNUC__ == 4 && __GNUC_MINOR__ >= 3) || __GNUC__ > 4
 extern void *memcpy(void *to, const void *from, size_t len);
+#define memcpy(dst, src, len)					\
+({								\
+	size_t __len = (len);					\
+	const void *__src = (src);				\
+	void *__dst = (dst);					\
+	WARN_ON_ONCE(__len > 1024 && (((long)__src & 63) != ((long)__dst & 63))); \
+	memcpy(__dst, __src, __len);				\
+})
 #else
 extern void *__memcpy(void *to, const void *from, size_t len);
 #define memcpy(dst, src, len)					\



^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-09 10:12                   ` Maarten Lankhorst
@ 2011-09-09 11:23                     ` Maarten Lankhorst
  2011-09-09 13:42                       ` Borislav Petkov
  0 siblings, 1 reply; 40+ messages in thread
From: Maarten Lankhorst @ 2011-09-09 11:23 UTC (permalink / raw)
  To: Borislav Petkov, Linus Torvalds, Borislav Petkov,
	Valdis.Kletnieks, Ingo Molnar, melwyn lobo, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Peter Zijlstra

Hey just a followup on btrfs,

On 09/09/2011 12:12 PM, Maarten Lankhorst wrote:
> Hey,
>
> On 09/09/2011 10:14 AM, Borislav Petkov wrote:
>> On Thu, Sep 08, 2011 at 12:58:13PM +0200, Maarten Lankhorst wrote:
>>> I have changed your sse memcpy to test various alignments with
>>> source/destination offsets instead of random, from that you can
>>> see that you don't really get a speedup at all. It seems to be more
>>> a case of 'kernel memcpy is significantly slower with some alignments',
>>> than 'avx memcpy is just that much faster'.
>>>
>>> For example 3754 with src misalignment 4 and target misalignment 20
>>> takes 1185 units on avx memcpy, but 1480 units with kernel memcpy
>> Right, so the idea is to check whether with the bigger buffer sizes
>> (and misaligned, although this should not be that often the case in
>> the kernel) the SSE version would outperform a "rep movs" with ucode
>> optimizations not kicking in.
>>
>> With your version modified back to SSE memcpy (don't have an AVX box
>> right now) I get on an AMD F10h:
>>
>> ...
>> 16384(12/40)    4756.24         7867.74         1.654192552
>> 16384(40/12)    5067.81         6068.71         1.197500008
>> 16384(12/44)    4341.3          8474.96         1.952172387
>> 16384(44/12)    4277.13         7107.64         1.661777347
>> 16384(12/48)    4989.16         7964.54         1.596369011
>> 16384(48/12)    4644.94         6499.5          1.399264281
>> ...
>>
>> which looks like pretty nice numbers to me. I can't say whether there
>> ever is 16K buffer we copy in the kernel but if there were... But <16K
>> buffers also show up to 1.5x speedup. So I'd say it's a uarch thing.
>> As I said, best it would be to put it in the kernel and run a bunch of
>> benchmarks...
> I think for bigger memcpy's it might make sense to demand stricter
> alignment. What are your numbers for (0/0) ? In my case it seems
> that kernel memcpy is always faster for that. In fact, it seems
> src&63 == dst&63 is generally faster with kernel memcpy.
>
> Patching my tree to WARN_ON_ONCE for when this condition isn't true, I get the following warnings:
>
> WARNING: at arch/x86/kernel/head64.c:49 x86_64_start_reservations+0x3b/0x18d()
> WARNING: at arch/x86/kernel/head64.c:52 x86_64_start_reservations+0xcb/0x18d()
> WARNING: at arch/x86/kernel/e820.c:1077 setup_memory_map+0x3b/0x72()
> WARNING: at kernel/fork.c:938 copy_process+0x148f/0x1550()
> WARNING: at arch/x86/vdso/vdso32-setup.c:306 sysenter_setup+0xd4/0x301()
> WARNING: at mm/util.c:72 kmemdup+0x75/0x80()
> WARNING: at fs/btrfs/disk-io.c:1742 open_ctree+0x1ab5/0x1bb0()
> WARNING: at fs/btrfs/disk-io.c:1744 open_ctree+0x1b35/0x1bb0()
> WARNING: at fs/btrfs/extent_io.c:3634 write_extent_buffer+0x209/0x240()
> WARNING: at fs/exec.c:1002 flush_old_exec+0x6c3/0x750()
> WARNING: at fs/btrfs/extent_io.c:3496 read_extent_buffer+0x1b1/0x1e0()
> WARNING: at kernel/module.c:2585 load_module+0x1933/0x1c30()
> WARNING: at fs/btrfs/extent_io.c:3748 memcpy_extent_buffer+0x2aa/0x2f0()
> WARNING: at fs/btrfs/disk-io.c:2276 write_dev_supers+0x34e/0x360()
> WARNING: at lib/swiotlb.c:367 swiotlb_bounce+0xc6/0xe0()
> WARNING: at fs/btrfs/transaction.c:1387 btrfs_commit_transaction+0x867/0x8a0()
> WARNING: at drivers/tty/serial/serial_core.c:527 uart_write+0x14a/0x160()
> WARNING: at mm/memory.c:3830 __access_remote_vm+0x251/0x270()
>
> The most persistent ones appear to be btrfs' *_extent_buffer helpers;
> they get the most warnings on my system. Apart from that, on my
> system there's not much to gain, since the alignment is already
> close to optimal.
>
> My ext4 /home doesn't throw warnings, so I'd gain the most
> by figuring out if I could improve btrfs/extent_io.c in some way.
> The patch for triggering those warnings is below; change it to WARN_ON
> if you want to see which one happens the most for you.
>
> I was pleasantly surprised though.
The btrfs one that happens far more often than all the others is read_extent_buffer,
but most of those copies are page-aligned on the destination. This means that for me,
avx memcpy might be 10% slower or 10% faster, depending on the specific source
alignment, so avx memcpy wouldn't help much.

This specific one happened far more than any of the other memcpy usages, and
when I ignore the check for page-aligned destinations, most of the warnings are gone.
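
Something like this is what I mean by ignoring the page-aligned case,
i.e. only warn when the destination is not page aligned (an untested
tweak of the condition in the patch from my previous mail):

	WARN_ON_ONCE(__len > 1024 &&
		     ((long)__dst & (PAGE_SIZE - 1)) != 0 &&
		     (((long)__src & 63) != ((long)__dst & 63)));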

In short: I don't think I can get a speedup by using avx memcpy in-kernel.

YMMV; if it does speed up for you, I'd love to see concrete numbers. And not only worst
case, but for the common aligned cases too. Or some concrete numbers showing that
misalignment happens a lot for you.

~Maarten

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-09 11:23                     ` Maarten Lankhorst
@ 2011-09-09 13:42                       ` Borislav Petkov
  0 siblings, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2011-09-09 13:42 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Linus Torvalds, Valdis.Kletnieks, Ingo Molnar, melwyn lobo,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Peter Zijlstra

[-- Attachment #1: Type: text/plain, Size: 2343 bytes --]

On Fri, Sep 09, 2011 at 01:23:09PM +0200, Maarten Lankhorst wrote:
> This specific one happened far more than any of the other memcpy usages, and
> when I ignore the check for page-aligned destinations, most of the warnings are gone.
> 
> In short: I don't think I can get a speedup by using avx memcpy in-kernel.
> 
> YMMV; if it does speed up for you, I'd love to see concrete numbers. And not only worst
> case, but for the common aligned cases too. Or some concrete numbers showing that
> misalignment happens a lot for you.

Actually,

assuming alignment matters, I'd need to redo the trace_printk run I did
initially on buffer sizes:

http://marc.info/?l=linux-kernel&m=131331602309340 (kernel_build.sizes attached)

to get a more sensible grasp on the alignment of kernel buffers along
with their sizes and to see whether we're doing a lot of unaligned large
buffer copies in the kernel. I seriously doubt that, though; we should
be doing everything pagewise anyway, so...

Concerning numbers, I ran your version again and sorted the output by
speedup. The highest scores are:

30037(12/44)	5566.4		12797.2		2.299011642
28672(12/44)	5512.97		12588.7		2.283467991
30037(28/60)	5610.34		12732.7		2.269502799
27852(12/44)	5398.36		12242.4		2.267803859
30037(4/36)	5585.02		12598.6		2.25578257
28672(28/60)	5499.11		12317.5		2.239914033
27852(28/60)	5349.78		11918.9		2.227919527
27852(20/52)	5335.92		11750.7		2.202186795
24576(12/44)	4991.37		10987.2		2.201247446

and this is pretty cool. Here are the (0/0) cases:

8192(0/0)       2627.82         3038.43         1.156255766
12288(0/0)      3116.62         3675.98         1.179475031
13926(0/0)      3330.04         4077.08         1.224334839
14336(0/0)      3377.95         4067.24         1.204055286
15018(0/0)      3465.3          4215.3          1.216430725
16384(0/0)      3623.33         4442.38         1.226050715
24576(0/0)      4629.53         6021.81         1.300737559
27852(0/0)      5026.69         6619.26         1.316823133
28672(0/0)      5157.73         6831.39         1.324495749
30037(0/0)      5322.01         6978.36         1.3112261

It is not 2x anymore but still.

Anyway, looking at the buffer sizes, they're rather ridiculous and even
if we get them in some workload, they won't repeat n times per second to
be relevant. So we'll see...

Thanks.

-- 
Regards/Gruss,
Boris.

[-- Attachment #2: kernel_build.sizes --]
[-- Type: text/plain, Size: 925 bytes --]

Bytes	Count
=====	=====
0	5447
1	3850
2	16255
3	11113
4	68870
5	4256
6	30433
7	19188
8	50490
9	5999
10	78275
11	5628
12	6870
13	7371
14	4742
15	4911
16	143835
17	14096
18	1573
19	13603
20	424321
21	741
22	584
23	450
24	472
25	685
26	367
27	365
28	333
29	301
30	300
31	269
32	489
33	272
34	266
35	220
36	239
37	209
38	249
39	235
40	207
41	181
42	150
43	98
44	194
45	66
46	62
47	52
48	67226
49	138
50	171
51	26
52	20
53	12
54	15
55	4
56	13
57	8
58	6
59	6
60	115
61	10
62	5
63	12
64	67353
65	6
66	2363
67	9
68	11
69	6
70	5
71	6
72	10
73	4
74	9
75	8
76	4
77	6
78	3
79	4
80	3
81	4
82	4
83	4
84	4
85	8
86	6
87	2
88	3
89	2
90	2
91	1
92	9
93	1
94	2
96	2
97	2
98	3
100	2
102	1
104	1
105	1
106	1
107	2
109	1
110	1
111	1
112	1
113	2
115	2
117	1
118	1
119	1
120	14
127	1
128	1
130	1
131	2
134	2
137	1
144	100092
149	1
151	1
153	1
158	1
185	1
217	4
224	3
225	3
227	3
244	1
254	5
255	13
256	21708
512	21746
848	12907
1920	36536
2048	21708

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-09  8:14                 ` Borislav Petkov
  2011-09-09 10:12                   ` Maarten Lankhorst
@ 2011-09-09 14:39                   ` Linus Torvalds
  2011-09-09 15:35                     ` Borislav Petkov
  1 sibling, 1 reply; 40+ messages in thread
From: Linus Torvalds @ 2011-09-09 14:39 UTC (permalink / raw)
  To: Borislav Petkov, Maarten Lankhorst, Linus Torvalds,
	Borislav Petkov, Valdis.Kletnieks, Ingo Molnar, melwyn lobo,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Peter Zijlstra

On Fri, Sep 9, 2011 at 1:14 AM, Borislav Petkov <bp@alien8.de> wrote:
>
> which looks like pretty nice numbers to me. I can't say whether there
> ever is 16K buffer we copy in the kernel but if there were...

Kernel memcpy's are basically almost always smaller than a page size,
because that tends to be the fundamental allocation size.

Yes, there are exceptions that copy into big vmalloc'ed buffers, but
they don't tend to matter. Things like module loading etc.

                     Linus

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-09 14:39                   ` Linus Torvalds
@ 2011-09-09 15:35                     ` Borislav Petkov
  2011-12-05 12:20                       ` melwyn lobo
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2011-09-09 15:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maarten Lankhorst, Borislav Petkov, Valdis.Kletnieks,
	Ingo Molnar, melwyn lobo, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Peter Zijlstra

On Fri, Sep 09, 2011 at 07:39:18AM -0700, Linus Torvalds wrote:
> Kernel memcpy's are basically almost always smaller than a page size,
> because that tends to be the fundamental allocation size.

Yeah, this is what my trace of a kernel build showed too:

Bytes   Count
=====   =====

...

224     3
225     3
227     3
244     1
254     5
255     13
256     21708
512     21746
848     12907
1920    36536
2048    21708

OTOH, I keep thinking that copy_*_user might be doing bigger sizes, for
example when shuffling network buffers to/from userspace. Converting
those to SSE memcpy might not be as easy as memcpy itself, though.

> Yes, there are exceptions that copy into big vmalloc'ed buffers, but
> they don't tend to matter. Things like module loading etc.

Too small a number of repetitions to matter, yes.

-- 
Regards/Gruss,
Boris.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-09 15:35                     ` Borislav Petkov
@ 2011-12-05 12:20                       ` melwyn lobo
  0 siblings, 0 replies; 40+ messages in thread
From: melwyn lobo @ 2011-12-05 12:20 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Linus Torvalds, Maarten Lankhorst, Borislav Petkov,
	Valdis.Kletnieks, Ingo Molnar, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Peter Zijlstra

The driver has a loop of memcpy calls whose source and destination
addresses are based on a runtime-computed value, which confuses the
compiler about the alignment.
So instead of generating a neat 32-bit memcpy, gcc generates "rep movsb".
Example code snippet:
src = (char *)kmap(bo->pages[idx]);
src += offset;
memcpy(des, src, len);
By replacing memcpy with ssse3 only for copies larger than 1K bytes (for
my driver the typical length is 2k of metadata from SRAM to DDR) I think
the overhead of the FPU save and restore can be forgiven.
Will SSSE3 work for unaligned pointers as well? If it doesn't, I have
been lucky for the past 6 months :)
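
Roughly what I have in mind in the driver, as a sketch (the SSSE3
routine and its name are placeholders, and whether the begin/end
overhead is acceptable is exactly what I would need to measure):

#include <asm/i387.h>	/* kernel_fpu_begin/end, irq_fpu_usable */

static void copy_metadata(void *dst, const void *src, size_t len)
{
	/* only take the SSSE3 path for large copies, so the FPU
	 * save/restore cost is amortized over the copy itself */
	if (len > 1024 && irq_fpu_usable()) {
		kernel_fpu_begin();
		ssse3_memcpy(dst, src, len);	/* placeholder routine */
		kernel_fpu_end();
	} else {
		memcpy(dst, src, len);
	}
}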


On Fri, Sep 9, 2011 at 9:05 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Fri, Sep 09, 2011 at 07:39:18AM -0700, Linus Torvalds wrote:
>> Kernel memcpy's are basically almost always smaller than a page size,
>> because that tends to be the fundamental allocation size.
>
> Yeah, this is what my trace of a kernel build showed too:
>
> Bytes   Count
> =====   =====
>
> ...
>
> 224     3
> 225     3
> 227     3
> 244     1
> 254     5
> 255     13
> 256     21708
> 512     21746
> 848     12907
> 1920    36536
> 2048    21708
>
> OTOH, I keep thinking that copy_*_user might be doing bigger sizes, for
> example when shuffling network buffers to/from userspace. Converting
> those to SSE memcpy might not be as easy as memcpy itself, though.
>
>> Yes, there are exceptions that copy into big vmalloc'ed buffers, but
>> they don't tend to matter. Things like module loading etc.
>
> Too small a number of repetitions to matter, yes.
>
> --
> Regards/Gruss,
> Boris.
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-09-01 15:15         ` Maarten Lankhorst
  2011-09-01 16:18           ` Linus Torvalds
@ 2011-12-05 12:54           ` melwyn lobo
  2011-12-05 14:36             ` Alan Cox
  1 sibling, 1 reply; 40+ messages in thread
From: melwyn lobo @ 2011-12-05 12:54 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: Borislav Petkov, Valdis.Kletnieks, Borislav Petkov, Ingo Molnar,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra

Will AVX work on Intel ATOM? I guess not. Then is now not the
time for having architecture-dependent definitions for basic
CPU-intensive tasks?


On Thu, Sep 1, 2011 at 8:45 PM, Maarten Lankhorst
<m.b.lankhorst@gmail.com> wrote:
> Hey,
>
> 2011/8/16 Borislav Petkov <bp@amd64.org>:
>> On Mon, Aug 15, 2011 at 10:34:35PM -0400, Valdis.Kletnieks@vt.edu wrote:
>>> On Sun, 14 Aug 2011 11:59:10 +0200, Borislav Petkov said:
>>>
>>> > Benchmarking with 10000 iterations, average results:
>>> > size    XM              MM              speedup
>>> > 119     540.58          449.491         0.8314969419
>>>
>>> > 12273   2307.86         4042.88         1.751787902
>>> > 13924   2431.8          4224.48         1.737184756
>>> > 14335   2469.4          4218.82         1.708440514
>>> > 15018 2675.67         1904.07         0.711622886
>>> > 16374   2989.75         5296.26         1.771470902
>>> > 24564   4262.15         7696.86         1.805863077
>>> > 27852   4362.53         3347.72         0.7673805572
>>> > 28672   5122.8          7113.14         1.388524413
>>> > 30033   4874.62         8740.04         1.792967931
>>>
>>> The numbers for 15018 and 27852 are *way* odd for the MM case. I don't feel
>>> really good about this till we understand what happened for those two cases.
>>
>> Yep.
>>
>>> Also, anytime I see "10000 iterations", I ask myself if the benchmark
>>> rigging took proper note of hot/cold cache issues. That *may* explain
>>> the two oddball results we see above - but not knowing more about how
>>> it was benched, it's hard to say.
>>
>> Yeah, the more scrutiny this gets the better. So I've cleaned up my
>> setup and have attached it.
>>
>> xm_mem.c does the benchmarking and in bench_memcpy() there's the
>> sse_memcpy call which is the SSE memcpy implementation using inline asm.
>> It looks like gcc produces pretty crappy code here because if I replace
>> the sse_memcpy call with xm_memcpy() from xm_memcpy.S - this is the
>> same function but in pure asm - I get much better numbers, sometimes
>> even over 2x. It all depends on the alignment of the buffers though.
>> Also, those numbers don't include the context saving/restoring which the
>> kernel does for us.
>>
>> 7491    1509.89         2346.94         1.554378381
>> 8170    2166.81         2857.78         1.318890326
>> 12277   2659.03         4179.31         1.571744176
>> 13907   2571.24         4125.7          1.604558427
>> 14319   2638.74         5799.67         2.19789466      <----
>> 14993   2752.42         4413.85         1.603625603
>> 16371   3479.11         5562.65         1.59887055
>
> This work intrigued me, in some cases kernel memcpy was a lot faster than sse memcpy,
> and I finally figured out why. I also extended the test to an optimized avx memcpy,
> but I think the kernel memcpy will always win in the aligned case.
>
> Those numbers you posted aren't right it seems. It depends a lot on the alignment,
> for example if both are aligned to 64 relative to each other,
> kernel memcpy will win from avx memcpy on my machine.
>
> I replaced the malloc calls with memalign(65536, size + 256) so I could toy
> around with the alignments a little. This explains why for some sizes, kernel
> memcpy was faster than sse memcpy in the test results you had.
> When (src & 63 == dst & 63), it seems that kernel memcpy always wins, otherwise
> avx memcpy might.
>
> If you want to speed up memcpy, I think your best bet is to find out why it's
> so much slower when src and dst aren't 64-byte aligned compared to each other.
>
> Cheers,
> Maarten
>
> ---
> Attached: my modified version of the sse memcpy you posted.
>
> I changed it a bit, and used avx, but some of the other changes might
> be better for your sse memcpy too.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-12-05 12:54           ` melwyn lobo
@ 2011-12-05 14:36             ` Alan Cox
  0 siblings, 0 replies; 40+ messages in thread
From: Alan Cox @ 2011-12-05 14:36 UTC (permalink / raw)
  To: melwyn lobo
  Cc: Maarten Lankhorst, Borislav Petkov, Valdis.Kletnieks,
	Borislav Petkov, Ingo Molnar, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Peter Zijlstra

> Will AVX work on Intel ATOM? I guess not. Then is now not the
> time for having architecture-dependent definitions for basic
> CPU-intensive tasks?

It's pretty much a necessity if you want to fine tune some of this.

> > If you want to speed up memcpy, I think your best bet is to find out why it's
> > so much slower when src and dst aren't 64-byte aligned compared to each other.

rep mov on most x86 processors is an extremely optimised path. The 64
byte alignment behaviour is to be expected given the processor cache line
size.

Alan

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-16  7:19 ` melwyn lobo
@ 2011-08-16  7:43   ` Borislav Petkov
  0 siblings, 0 replies; 40+ messages in thread
From: Borislav Petkov @ 2011-08-16  7:43 UTC (permalink / raw)
  To: melwyn lobo
  Cc: Denys Vlasenko, Ingo Molnar, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Peter Zijlstra, borislav.petkov

On Tue, Aug 16, 2011 at 12:49:28PM +0530, melwyn lobo wrote:
> We would rather use the 32 bit patch. Have you already got a 32 bit
> patch?

Nope, only 64-bit for now, sorry.

> How can I use sse3 for 32 bit?

Well, OTTOMH, you have only 8 xmm regs in 32-bit instead of 16, which
should halve the performance of the 64-bit version in a perfect world.
However, we don't know how the performance of a 32-bit SSE memcpy
version behaves vs the gcc builtin one - that would require benchmarking
too.

But other than that, I don't see a problem with having a 32-bit version.

> I don't think you have submitted the 64 bit patch to mainline.
> Is there still work ongoing on this?

Yeah, we are currently benchmarking it to see whether it actually makes
sense to even have SSE memcpy in the kernel.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 14:55 Borislav Petkov
  2011-08-15 14:59 ` Andy Lutomirski
@ 2011-08-16  7:19 ` melwyn lobo
  2011-08-16  7:43   ` Borislav Petkov
  1 sibling, 1 reply; 40+ messages in thread
From: melwyn lobo @ 2011-08-16  7:19 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Denys Vlasenko, Ingo Molnar, linux-kernel, H. Peter Anvin,
	Thomas Gleixner, Linus Torvalds, Peter Zijlstra, borislav.petkov

> Yes, on 32-bit you're using the compiler-supplied version
> __builtin_memcpy when CONFIG_KMEMCHECK=n and your gcc is of version 4
> and above. Reportedly, using __builtin_memcpy generates better code.
>
> Btw, my version of SSE memcpy is 64-bit only.
>
> --
> Regards/Gruss,
> Boris.
>
>

We would rather use the 32 bit patch. Have you already got a 32 bit
patch? How can I use sse3 for 32 bit?
I don't think you have submitted the 64 bit patch to mainline.
Is there still work ongoing on this?

Regards,
Melwyn

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 20:05               ` Borislav Petkov
@ 2011-08-15 20:08                 ` Andrew Lutomirski
  0 siblings, 0 replies; 40+ messages in thread
From: Andrew Lutomirski @ 2011-08-15 20:08 UTC (permalink / raw)
  To: Borislav Petkov, Andrew Lutomirski, melwyn lobo, Denys Vlasenko,
	Ingo Molnar, linux-kernel, H. Peter Anvin, Thomas Gleixner,
	Linus Torvalds, Peter Zijlstra, borislav.petkov

On Mon, Aug 15, 2011 at 4:05 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Aug 15, 2011 at 03:11:40PM -0400, Andrew Lutomirski wrote:
>> > Well, copy_from_user... does a bunch of rep; movsq - if the SSE version
>> > shows reasonable speedup there, we might need to make those work too.
>>
>> I'm a little surprised that SSE beats fast string operations, but I
>> guess benchmarking always wins.
>
> If by fast string operations you mean X86_FEATURE_ERMS, then that's
> Intel-only and that actually would need to be benchmarked separately.
> Currently, I see speedup for large(r) buffers only vs rep; movsq. But I
> dunno about rep; movsb's enhanced rep string tricks Intel does.

I meant X86_FEATURE_REP_GOOD.  (That may also be Intel-only, but it
sounds like rep;movsq might move whole cachelines on cpus at least a
few generations back.)  I don't know if any ERMS cpus exist yet.
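
If we ever want to pick a copy variant per CPU, I'd imagine something
in the direction of the sketch below, though the real mechanism would
presumably be alternatives patching; memcpy_erms here is purely
hypothetical:

/* illustrative only: select a copy routine once at boot */
static void *(*memcpy_variant)(void *, const void *, size_t) = __memcpy;

static void __init select_memcpy(void)
{
	if (boot_cpu_has(X86_FEATURE_ERMS))
		memcpy_variant = memcpy_erms;	/* hypothetical rep movsb copy */
}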

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 19:11             ` Andrew Lutomirski
@ 2011-08-15 20:05               ` Borislav Petkov
  2011-08-15 20:08                 ` Andrew Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2011-08-15 20:05 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: melwyn lobo, Denys Vlasenko, Ingo Molnar, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On Mon, Aug 15, 2011 at 03:11:40PM -0400, Andrew Lutomirski wrote:
> > Well, copy_from_user... does a bunch of rep; movsq - if the SSE version
> > shows reasonable speedup there, we might need to make those work too.
> 
> I'm a little surprised that SSE beats fast string operations, but I
> guess benchmarking always wins.

If by fast string operations you mean X86_FEATURE_ERMS, then that's
Intel-only and that actually would need to be benchmarked separately.
Currently, I see speedup for large(r) buffers only vs rep; movsq. But I
dunno about rep; movsb's enhanced rep string tricks Intel does.

> Yes.  But we don't nest that much, and the save/restore isn't all that
> expensive.  And we don't have to save/restore unless kernel entries
> nest and both entries try to use kernel_fpu_begin at the same time.

Yep.

> This whole project may take awhile.  The code in there is a
> poorly-documented mess, even after Hans' cleanups.  (It's a lot worse
> without them, though.)

Oh yeah, this code could use lotsa scrubbing :)

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 18:49           ` Borislav Petkov
@ 2011-08-15 19:11             ` Andrew Lutomirski
  2011-08-15 20:05               ` Borislav Petkov
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Lutomirski @ 2011-08-15 19:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: melwyn lobo, Denys Vlasenko, Ingo Molnar, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On Mon, Aug 15, 2011 at 2:49 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, 15 August, 2011 7:04 pm, Andrew Lutomirski wrote:
>>> Or, if we want to use SSE stuff in the kernel, we might think of
>>> allocating its own FPU context(s) and handle those...
>>
>> I'm thinking of having a stack of FPU states to parallel irq stacks
>> and IST stacks.
>
> ... I'm guessing with the same nesting as hardirqs? Making FPU
> instructions usable in irq contexts too.
>
>> It gets a little hairy when code inside kernel_fpu_begin traps for a
>> non-irq non-IST reason, though.
>
> How does that happen? You're in the kernel with preemption disabled and
> TS cleared, what would cause the #NM? I think that if you need to switch
> context, you simply "push" the current FPU context, allocate a new one
> and clts as part of the FPU context switching, no?

Not #NM, but page faults can happen too (even just accessing vmalloc space).

>
>> Fortunately, those are rare and all of the EX_TABLE users could mark
>> xmm regs as clobbered (except for copy_from_user...).
>
> Well, copy_from_user... does a bunch of rep; movsq - if the SSE version
> shows reasonable speedup there, we might need to make those work too.

I'm a little surprised that SSE beats fast string operations, but I
guess benchmarking always wins.

>
>> Keeping kernel_fpu_begin non-preemptable makes it less bad because the
>> extra FPU state can be per-cpu and not per-task.
>
> Yep.
>
>> This is extra fun on 32 bit, which IIRC doesn't have IST stacks.
>>
>> The major speedup will come from saving state in kernel_fpu_begin but
>> not restoring it until the code in entry_??.S restores registers.
>
> But you'd need to save each kernel FPU state when nesting, no?
>

Yes.  But we don't nest that much, and the save/restore isn't all that
expensive.  And we don't have to save/restore unless kernel entries
nest and both entries try to use kernel_fpu_begin at the same time.

This whole project may take awhile.  The code in there is a
poorly-documented mess, even after Hans' cleanups.  (It's a lot worse
without them, though.)

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 18:35             ` Andrew Lutomirski
@ 2011-08-15 18:52               ` H. Peter Anvin
  0 siblings, 0 replies; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-15 18:52 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On 08/15/2011 11:35 AM, Andrew Lutomirski wrote:
> 
> Are there any architecture-neutral users of this thing?

Look at the RAID-6 code, for example.  It makes the various
architecture-specific codes look more similar.

	-hpa

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 17:04         ` Andrew Lutomirski
@ 2011-08-15 18:49           ` Borislav Petkov
  2011-08-15 19:11             ` Andrew Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2011-08-15 18:49 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra, borislav.petkov

On Mon, 15 August, 2011 7:04 pm, Andrew Lutomirski wrote:
>> Well, I had a SSE memcpy which saved/restored the XMM regs on the stack.
>> This would obviate the need to muck with contexts but that could get
>> expensive wrt stack operations. The advantage is that I'm not dealing
>> with the whole FPU state but only with 16 XMM regs. I should probably
>> dust off that version again and retest.
>
> I bet it won't be a significant win.  On Sandy Bridge, clts/stts takes
> 80 ns and a full state save+restore is only ~60 ns.
> Without infrastructure changes, I don't think you can avoid the clts
> and stts.

Yeah, probably.

> You might be able to get away with turning off IRQs, reading CR0 to
> check TS, pushing XMM regs, and being very certain that you don't
> accidentally generate any VEX-coded instructions.

That's ok - I'm using movaps/movups. But, the problem is that I still
need to save FPU state if the task I'm interrupting has been using FPU
instructions. So, I can't get away without saving the context in which
case I don't need to save the XMM regs anyway.

>> Or, if we want to use SSE stuff in the kernel, we might think of
>> allocating its own FPU context(s) and handle those...
>
> I'm thinking of having a stack of FPU states to parallel irq stacks
> and IST stacks.

... I'm guessing with the same nesting as hardirqs? Making FPU
instructions usable in irq contexts too.

> It gets a little hairy when code inside kernel_fpu_begin traps for a
> non-irq non-IST reason, though.

How does that happen? You're in the kernel with preemption disabled and
TS cleared, what would cause the #NM? I think that if you need to switch
context, you simply "push" the current FPU context, allocate a new one
and clts as part of the FPU context switching, no?

> Fortunately, those are rare and all of the EX_TABLE users could mark
> xmm regs as clobbered (except for copy_from_user...).

Well, copy_from_user... does a bunch of rep; movsq - if the SSE version
shows reasonable speedup there, we might need to make those work too.

> Keeping kernel_fpu_begin non-preemptable makes it less bad because the
> extra FPU state can be per-cpu and not per-task.

Yep.

> This is extra fun on 32 bit, which IIRC doesn't have IST stacks.
>
> The major speedup will come from saving state in kernel_fpu_begin but
> not restoring it until the code in entry_??.S restores registers.

But you'd need to save each kernel FPU state when nesting, no?

>>> (*) kernel_fpu_begin is a bad name. It's only safe to use integer
>>> instructions inside a kernel_fpu_begin section because MXCSR (and the
>>> 387 equivalent) could contain garbage.
>>
>> Well, do we want to use floating point instructions in the kernel?
>
> The only use I could find is in staging.

Exactly my point - I think we should do it only when it's really worth
the trouble.

-- 
Regards/Gruss,
Boris.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 18:26           ` H. Peter Anvin
@ 2011-08-15 18:35             ` Andrew Lutomirski
  2011-08-15 18:52               ` H. Peter Anvin
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Lutomirski @ 2011-08-15 18:35 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On Mon, Aug 15, 2011 at 2:26 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 08/15/2011 09:58 AM, Andrew Lutomirski wrote:
>> On Mon, Aug 15, 2011 at 12:12 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>>> On 08/15/2011 08:36 AM, Andrew Lutomirski wrote:
>>>>
>>>> (*)  kernel_fpu_begin is a bad name.  It's only safe to use integer
>>>> instructions inside a kernel_fpu_begin section because MXCSR (and the
>>>> 387 equivalent) could contain garbage.
>>>>
>>>
>>> Uh... no, it just means you have to initialize the settings.  It's a
>>> perfectly good name, it's called kernel_fpu_begin, not kernel_fp_begin.
>>
>> I prefer get_xstate / put_xstate, but this could rapidly devolve into
>> bikeshedding. :)
>>
>
> a) Quite.
>
> b) xstate is not architecture-neutral.

Are there any architecture-neutral users of this thing?  If I were
writing generic code, I would expect:

kernel_fpu_begin();
foo *= 1.5;
kernel_fpu_end();

to work, but I would not expect:

kernel_fpu_begin();
use_xmm_registers();
kernel_fpu_end();

to make any sense.

Since the former does not actually work, I would hope that there is no
non-x86-specific user.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 16:58         ` Andrew Lutomirski
@ 2011-08-15 18:26           ` H. Peter Anvin
  2011-08-15 18:35             ` Andrew Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-15 18:26 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On 08/15/2011 09:58 AM, Andrew Lutomirski wrote:
> On Mon, Aug 15, 2011 at 12:12 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 08/15/2011 08:36 AM, Andrew Lutomirski wrote:
>>>
>>> (*)  kernel_fpu_begin is a bad name.  It's only safe to use integer
>>> instructions inside a kernel_fpu_begin section because MXCSR (and the
>>> 387 equivalent) could contain garbage.
>>>
>>
>> Uh... no, it just means you have to initialize the settings.  It's a
>> perfectly good name, it's called kernel_fpu_begin, not kernel_fp_begin.
> 
> I prefer get_xstate / put_xstate, but this could rapidly devolve into
> bikeshedding. :)
> 

a) Quite.

b) xstate is not architecture-neutral.

	-hpa


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 16:12       ` Borislav Petkov
@ 2011-08-15 17:04         ` Andrew Lutomirski
  2011-08-15 18:49           ` Borislav Petkov
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Lutomirski @ 2011-08-15 17:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: melwyn lobo, Denys Vlasenko, Ingo Molnar, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On Mon, Aug 15, 2011 at 12:12 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, 15 August, 2011 5:36 pm, Andrew Lutomirski wrote:
>>> But still, irq_fpu_usable() still checks !in_interrupt() which means
>>> that we don't want to run SSE instructions in IRQ context. OTOH, we
>>> still are fine when running with CR0.TS. So what happens when we get an
>>> #NM as a result of executing an FPU instruction in an IRQ handler? We
>>> will have to do init_fpu() on the current task if the last hasn't used
>>> math yet and do the slab allocation of the FPU context area (I'm looking
>>> at math_state_restore, btw).
>>
>> IIRC kernel_fpu_begin does clts, so #NM won't happen.  But if we're in
>> an interrupt and TS=1, when we know that we're not in a
>> kernel_fpu_begin section, so it's safe to start one (and do clts).
>
> Doh, yes, I see it now. This way we save the math state of the current
> process if needed and "disable" #NM exceptions until kernel_fpu_end() by
> clearing CR0.TS, sure. Thanks.
>
>> IMO this code is not very good, and I plan to fix it sooner or later.
>
> Yep. Also, AFAIR, Hans did some FPU cleanup as part of his xsave rework.
> You could probably reuse some bits from there. The patchset should be in
> tip/x86/xsave.
>
>> I want kernel_fpu_begin (or its equivalent*) to be very fast and
>> usable from any context whatsoever.  Mucking with TS is slower than a
>> complete save and restore of YMM state.
>
> Well, I had a SSE memcpy which saved/restored the XMM regs on the stack.
> This would obviate the need to muck with contexts but that could get
> expensive wrt stack operations. The advantage is that I'm not dealing
> with the whole FPU state but only with 16 XMM regs. I should probably
> dust off that version again and retest.

I bet it won't be a significant win.  On Sandy Bridge, clts/stts takes
80 ns and a full state save+restore is only ~60 ns.  Without
infrastructure changes, I don't think you can avoid the clts and stts.

You might be able to get away with turning off IRQs, reading CR0 to
check TS, pushing XMM regs, and being very certain that you don't
accidentally generate any VEX-coded instructions.

>
> Or, if we want to use SSE stuff in the kernel, we might think of
> allocating its own FPU context(s) and handle those...

I'm thinking of having a stack of FPU states to parallel irq stacks
and IST stacks.  It gets a little hairy when code inside
kernel_fpu_begin traps for a non-irq non-IST reason, though.
Fortunately, those are rare and all of the EX_TABLE users could mark
xmm regs as clobbered (except for copy_from_user...).  Keeping
kernel_fpu_begin non-preemptable makes it less bad because the extra
FPU state can be per-cpu and not per-task.

This is extra fun on 32 bit, which IIRC doesn't have IST stacks.

The major speedup will come from saving state in kernel_fpu_begin but
not restoring it until the code in entry_??.S restores registers.

>
>> (*) kernel_fpu_begin is a bad name. It's only safe to use integer
>> instructions inside a kernel_fpu_begin section because MXCSR (and the
>> 387 equivalent) could contain garbage.
>
> Well, do we want to use floating point instructions in the kernel?

The only use I could find is in staging.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 16:12       ` H. Peter Anvin
@ 2011-08-15 16:58         ` Andrew Lutomirski
  2011-08-15 18:26           ` H. Peter Anvin
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Lutomirski @ 2011-08-15 16:58 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On Mon, Aug 15, 2011 at 12:12 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 08/15/2011 08:36 AM, Andrew Lutomirski wrote:
>>
>> (*)  kernel_fpu_begin is a bad name.  It's only safe to use integer
>> instructions inside a kernel_fpu_begin section because MXCSR (and the
>> 387 equivalent) could contain garbage.
>>
>
> Uh... no, it just means you have to initialize the settings.  It's a
> perfectly good name, it's called kernel_fpu_begin, not kernel_fp_begin.

I prefer get_xstate / put_xstate, but this could rapidly devolve into
bikeshedding. :)

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 15:36     ` Andrew Lutomirski
  2011-08-15 16:12       ` Borislav Petkov
@ 2011-08-15 16:12       ` H. Peter Anvin
  2011-08-15 16:58         ` Andrew Lutomirski
  1 sibling, 1 reply; 40+ messages in thread
From: H. Peter Anvin @ 2011-08-15 16:12 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On 08/15/2011 08:36 AM, Andrew Lutomirski wrote:
> 
> (*)  kernel_fpu_begin is a bad name.  It's only safe to use integer
> instructions inside a kernel_fpu_begin section because MXCSR (and the
> 387 equivalent) could contain garbage.
> 

Uh... no, it just means you have to initialize the settings.  It's a
perfectly good name, it's called kernel_fpu_begin, not kernel_fp_begin.

	-hpa



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 15:36     ` Andrew Lutomirski
@ 2011-08-15 16:12       ` Borislav Petkov
  2011-08-15 17:04         ` Andrew Lutomirski
  2011-08-15 16:12       ` H. Peter Anvin
  1 sibling, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2011-08-15 16:12 UTC (permalink / raw)
  To: Andrew Lutomirski
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra, borislav.petkov

On Mon, 15 August, 2011 5:36 pm, Andrew Lutomirski wrote:
>> But still, irq_fpu_usable() still checks !in_interrupt() which means
>> that we don't want to run SSE instructions in IRQ context. OTOH, we
>> still are fine when running with CR0.TS. So what happens when we get an
>> #NM as a result of executing an FPU instruction in an IRQ handler? We
>> will have to do init_fpu() on the current task if the last hasn't used
>> math yet and do the slab allocation of the FPU context area (I'm looking
>> at math_state_restore, btw).
>
> IIRC kernel_fpu_begin does clts, so #NM won't happen.  But if we're in
> an interrupt and TS=1, then we know that we're not in a
> kernel_fpu_begin section, so it's safe to start one (and do clts).

Doh, yes, I see it now. This way we save the math state of the current
process if needed and "disable" #NM exceptions until kernel_fpu_end() by
clearing CR0.TS, sure. Thanks.

> IMO this code is not very good, and I plan to fix it sooner or later.

Yep. Also, AFAIR, Hans did some FPU cleanup as part of his xsave rework.
You could probably reuse some bits from there. The patchset should be in
tip/x86/xsave.

> I want kernel_fpu_begin (or its equivalent*) to be very fast and
> usable from any context whatsoever.  Mucking with TS is slower than a
> complete save and restore of YMM state.

Well, I had a SSE memcpy which saved/restored the XMM regs on the stack.
This would obviate the need to muck with contexts but that could get
expensive wrt stack operations. The advantage is that I'm not dealing
with the whole FPU state but only with 16 XMM regs. I should probably
dust off that version again and retest.

Or, if we want to use SSE stuff in the kernel, we might think of
allocating its own FPU context(s) and handle those...

> (*) kernel_fpu_begin is a bad name. It's only safe to use integer
> instructions inside a kernel_fpu_begin section because MXCSR (and the
> 387 equivalent) could contain garbage.

Well, do we want to use floating point instructions in the kernel?

Thanks.

-- 
Regards/Gruss,
Boris.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 15:29   ` Borislav Petkov
@ 2011-08-15 15:36     ` Andrew Lutomirski
  2011-08-15 16:12       ` Borislav Petkov
  2011-08-15 16:12       ` H. Peter Anvin
  0 siblings, 2 replies; 40+ messages in thread
From: Andrew Lutomirski @ 2011-08-15 15:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: melwyn lobo, Denys Vlasenko, Ingo Molnar, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On Mon, Aug 15, 2011 at 11:29 AM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, 15 August, 2011 4:59 pm, Andy Lutomirski wrote:
>>>> So what is the reason we cannot use sse_memcpy in interrupt context.
>>>> (fpu registers not saved ? )
>>>
>>> Because, AFAICT, when we handle an #NM exception while running
>>> sse_memcpy in an IRQ handler, we might need to allocate FPU save state
>>> area, which in turn, can sleep. Then, we might get another IRQ while
>>> sleeping and we should be deadlocked.
>>>
>>> But let me stress on the "AFAICT" above, someone who actually knows the
>>> FPU code should correct me if I'm missing something.
>>
>> I don't think you ever get #NM as a result of kernel_fpu_begin, but you
>> can certainly have problems when kernel_fpu_begin nests by accident.
>> There's irq_fpu_usable() for this.
>>
>> (irq_fpu_usable() reads cr0 sometimes and I suspect it can be slow.)
>
> Oh I didn't know about irq_fpu_usable(), thanks.
>
> But still, irq_fpu_usable() still checks !in_interrupt() which means
> that we don't want to run SSE instructions in IRQ context. OTOH, we
> still are fine when running with CR0.TS. So what happens when we get an
> #NM as a result of executing an FPU instruction in an IRQ handler? We
> will have to do init_fpu() on the current task if the last hasn't used
> math yet and do the slab allocation of the FPU context area (I'm looking
> at math_state_restore, btw).

IIRC kernel_fpu_begin does clts, so #NM won't happen.  But if we're in
an interrupt and TS=1, then we know that we're not in a
kernel_fpu_begin section, so it's safe to start one (and do clts).
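
I.e. roughly this usage pattern (just a sketch of the call site, not
of the implementation):

	if (irq_fpu_usable()) {
		kernel_fpu_begin();	/* does clts, so no #NM inside */
		/* ... SSE copy here ... */
		kernel_fpu_end();
	} else {
		memcpy(dst, src, len);	/* plain fallback */
	}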

IMO this code is not very good, and I plan to fix it sooner or later.
I want kernel_fpu_begin (or its equivalent*) to be very fast and
usable from any context whatsoever.  Mucking with TS is slower than a
complete save and restore of YMM state.

(*)  kernel_fpu_begin is a bad name.  It's only safe to use integer
instructions inside a kernel_fpu_begin section because MXCSR (and the
387 equivalent) could contain garbage.

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 14:59 ` Andy Lutomirski
@ 2011-08-15 15:29   ` Borislav Petkov
  2011-08-15 15:36     ` Andrew Lutomirski
  0 siblings, 1 reply; 40+ messages in thread
From: Borislav Petkov @ 2011-08-15 15:29 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Borislav Petkov, melwyn lobo, Denys Vlasenko, Ingo Molnar,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra, borislav.petkov

On Mon, 15 August, 2011 4:59 pm, Andy Lutomirski wrote:
>>> So what is the reason we cannot use sse_memcpy in interrupt context.
>>> (fpu registers not saved ? )
>>
>> Because, AFAICT, when we handle an #NM exception while running
>> sse_memcpy in an IRQ handler, we might need to allocate FPU save state
>> area, which in turn, can sleep. Then, we might get another IRQ while
>> sleeping and we should be deadlocked.
>>
>> But let me stress on the "AFAICT" above, someone who actually knows the
>> FPU code should correct me if I'm missing something.
>
> I don't think you ever get #NM as a result of kernel_fpu_begin, but you
> can certainly have problems when kernel_fpu_begin nests by accident.
> There's irq_fpu_usable() for this.
>
> (irq_fpu_usable() reads cr0 sometimes and I suspect it can be slow.)

Oh I didn't know about irq_fpu_usable(), thanks.

But still, irq_fpu_usable() still checks !in_interrupt() which means
that we don't want to run SSE instructions in IRQ context. OTOH, we
still are fine when running with CR0.TS. So what happens when we get an
#NM as a result of executing an FPU instruction in an IRQ handler? We
will have to do init_fpu() on the current task if the last hasn't used
math yet and do the slab allocation of the FPU context area (I'm looking
at math_state_restore, btw).

Thanks.

-- 
Regards/Gruss,
Boris.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
  2011-08-15 14:55 Borislav Petkov
@ 2011-08-15 14:59 ` Andy Lutomirski
  2011-08-15 15:29   ` Borislav Petkov
  2011-08-16  7:19 ` melwyn lobo
  1 sibling, 1 reply; 40+ messages in thread
From: Andy Lutomirski @ 2011-08-15 14:59 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: melwyn lobo, Denys Vlasenko, Ingo Molnar, linux-kernel,
	H. Peter Anvin, Thomas Gleixner, Linus Torvalds, Peter Zijlstra,
	borislav.petkov

On 08/15/2011 10:55 AM, Borislav Petkov wrote:
> On Mon, 15 August, 2011 3:27 pm, melwyn lobo wrote:
>> Hi,
>> Was on a vacation for last two days. Thanks for the good insights into
>> the issue.
>> Ingo, unfortunately the data we have is on a soon to be released
>> platform and strictly confidential at this stage.
>>
>> Boris, thanks for the patch. On seeing your patch:
>> +void *__sse_memcpy(void *to, const void *from, size_t len)
>> +{
>> +       unsigned long src = (unsigned long)from;
>> +       unsigned long dst = (unsigned long)to;
>> +       void *p = to;
>> +       int i;
>> +
>> +       if (in_interrupt())
>> +               return __memcpy(to, from, len)
>> So what is the reason we cannot use sse_memcpy in interrupt context.
>> (fpu registers not saved ? )
>
> Because, AFAICT, when we handle an #NM exception while running
> sse_memcpy in an IRQ handler, we might need to allocate FPU save state
> area, which in turn, can sleep. Then, we might get another IRQ while
> sleeping and we should be deadlocked.
>
> But let me stress on the "AFAICT" above, someone who actually knows the
> FPU code should correct me if I'm missing something.

I don't think you ever get #NM as a result of kernel_fpu_begin, but you 
can certainly have problems when kernel_fpu_begin nests by accident. 
There's irq_fpu_usable() for this.

(irq_fpu_usable() reads cr0 sometimes and I suspect it can be slow.)

--Andy

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: x86 memcpy performance
@ 2011-08-15 14:55 Borislav Petkov
  2011-08-15 14:59 ` Andy Lutomirski
  2011-08-16  7:19 ` melwyn lobo
  0 siblings, 2 replies; 40+ messages in thread
From: Borislav Petkov @ 2011-08-15 14:55 UTC (permalink / raw)
  To: melwyn lobo
  Cc: Borislav Petkov, Denys Vlasenko, Ingo Molnar, melwyn lobo,
	linux-kernel, H. Peter Anvin, Thomas Gleixner, Linus Torvalds,
	Peter Zijlstra, borislav.petkov

On Mon, 15 August, 2011 3:27 pm, melwyn lobo wrote:
> Hi,
> Was on a vacation for last two days. Thanks for the good insights into
> the issue.
> Ingo, unfortunately the data we have is on a soon to be released
> platform and strictly confidential at this stage.
>
> Boris, thanks for the patch. On seeing your patch:
> +void *__sse_memcpy(void *to, const void *from, size_t len)
> +{
> +       unsigned long src = (unsigned long)from;
> +       unsigned long dst = (unsigned long)to;
> +       void *p = to;
> +       int i;
> +
> +       if (in_interrupt())
> +               return __memcpy(to, from, len)
> So what is the reason we cannot use sse_memcpy in interrupt context.
> (fpu registers not saved ? )

Because, AFAICT, when we handle an #NM exception while running
sse_memcpy in an IRQ handler, we might need to allocate FPU save state
area, which in turn, can sleep. Then, we might get another IRQ while
sleeping and we should be deadlocked.

But let me stress on the "AFAICT" above, someone who actually knows the
FPU code should correct me if I'm missing something.

> My question is still not answered. There are 3 versions of memcpy in
> kernel:
>
> ***********************************arch/x86/include/asm/string_32.h******************************
> 179 #ifndef CONFIG_KMEMCHECK
> 180
> 181 #if (__GNUC__ >= 4)
> 182 #define memcpy(t, f, n) __builtin_memcpy(t, f, n)
> 183 #else
> 184 #define memcpy(t, f, n)                         \
> 185         (__builtin_constant_p((n))              \
> 186          ? __constant_memcpy((t), (f), (n))     \
> 187          : __memcpy((t), (f), (n)))
> 188 #endif
> 189 #else
> 190 /*
> 191  * kmemcheck becomes very happy if we use the REP instructions
> unconditionally,
> 192  * because it means that we know both memory operands in advance.
> 193  */
> 194 #define memcpy(t, f, n) __memcpy((t), (f), (n))
> 195 #endif
> 196
> 197
> ****************************************************************************************.
> I will ignore CONFIG_X86_USE_3DNOW (including mmx_memcpy() ) as this
> is valid only for AMD and not for Atom Z5xx series.
> This means __memcpy, __constant_memcpy, __builtin_memcpy .
> I have a hunch by default we were using  __builtin_memcpy.
> This is because I see my GCC version >=4 and CONFIG_KMEMCHECK
> not defined. Can someone confirm of these 3 which is used, with
> i386_defconfig. Again with i386_defconfig which workloads provide the
> best results with the default implementation.

Yes, on 32-bit you're using the compiler-supplied version
__builtin_memcpy when CONFIG_KMEMCHECK=n and your gcc is of version 4
and above. Reportedly, using __builtin_memcpy generates better code.

Btw, my version of SSE memcpy is 64-bit only.

-- 
Regards/Gruss,
Boris.


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2011-12-05 14:35 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-12 17:59 x86 memcpy performance melwyn lobo
2011-08-12 18:33 ` Andi Kleen
2011-08-12 19:52 ` Ingo Molnar
2011-08-14  9:59   ` Borislav Petkov
2011-08-14 11:13     ` Denys Vlasenko
2011-08-14 12:40       ` Borislav Petkov
2011-08-15 13:27         ` melwyn lobo
2011-08-15 13:44         ` Denys Vlasenko
2011-08-16  2:34     ` Valdis.Kletnieks
2011-08-16 12:16       ` Borislav Petkov
2011-09-01 15:15         ` Maarten Lankhorst
2011-09-01 16:18           ` Linus Torvalds
2011-09-08  8:35             ` Borislav Petkov
2011-09-08 10:58               ` Maarten Lankhorst
2011-09-09  8:14                 ` Borislav Petkov
2011-09-09 10:12                   ` Maarten Lankhorst
2011-09-09 11:23                     ` Maarten Lankhorst
2011-09-09 13:42                       ` Borislav Petkov
2011-09-09 14:39                   ` Linus Torvalds
2011-09-09 15:35                     ` Borislav Petkov
2011-12-05 12:20                       ` melwyn lobo
2011-12-05 12:54           ` melwyn lobo
2011-12-05 14:36             ` Alan Cox
2011-08-15 14:55 Borislav Petkov
2011-08-15 14:59 ` Andy Lutomirski
2011-08-15 15:29   ` Borislav Petkov
2011-08-15 15:36     ` Andrew Lutomirski
2011-08-15 16:12       ` Borislav Petkov
2011-08-15 17:04         ` Andrew Lutomirski
2011-08-15 18:49           ` Borislav Petkov
2011-08-15 19:11             ` Andrew Lutomirski
2011-08-15 20:05               ` Borislav Petkov
2011-08-15 20:08                 ` Andrew Lutomirski
2011-08-15 16:12       ` H. Peter Anvin
2011-08-15 16:58         ` Andrew Lutomirski
2011-08-15 18:26           ` H. Peter Anvin
2011-08-15 18:35             ` Andrew Lutomirski
2011-08-15 18:52               ` H. Peter Anvin
2011-08-16  7:19 ` melwyn lobo
2011-08-16  7:43   ` Borislav Petkov
