linux-kernel.vger.kernel.org archive mirror
* checksumming with mmx, comment in arch/i386/lib/mmx.c
@ 2003-02-11 18:37 nick black
From: nick black @ 2003-02-11 18:37 UTC (permalink / raw)
  To: linux-kernel

i want to speed up my product's checksum verification code, and was
pondering the use of mmx (ip_fast_csum as implemented by cwik and
gulbrandsen from asm-i386/checksum.h is fast enough for my needs, but i
don't want to violate the gpl 8) ).

i'm refreshing myself on mmx currently, but noticed the following
comment from arch/i386/lib/mmx.c's _mmx_memcpy:

"Checksums are not a win with MMX on any CPU tested so far for any MMX
solution figured."

firstly, to what domain of checksums does this comment apply?  secondly,
why is it true?  it seems the PADDW family of instructions could work
well here; is the slowdown a result of the kernel's need to muck with
fpu state (from what i can tell, mmx uses the fp registers)?
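
For reference, a minimal scalar sketch of the kind of checksum in question,
assuming the 16-bit ones'-complement Internet checksum (RFC 1071 style) that
ip_fast_csum computes for IP headers. The function name csum16 is
hypothetical and this is not the kernel's implementation. Note that PADDW
adds packed 16-bit words with wraparound and does not report carries, so an
MMX version would presumably need to widen the words or recover the carries
some other way before the final fold.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * csum16 (hypothetical name): 16-bit ones'-complement checksum over a
 * byte buffer, in the style of RFC 1071. Portable reference code only.
 */
static uint16_t csum16(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint64_t sum = 0;   /* wide accumulator so carries are not lost */

    /* Add the data as big-endian 16-bit words. */
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }

    /* An odd trailing byte is treated as if padded with a zero byte. */
    if (len)
        sum += (uint32_t)p[0] << 8;

    /* Fold the carries back into the low 16 bits (end-around carry). */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    /* The checksum is the ones' complement of the folded sum. */
    return (uint16_t)~sum;
}
```

Running csum16 over a header whose checksum field is zero gives the value to
store in that field; running it over a header with a correct checksum already
in place gives 0.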

thanks so much for any help!

-- 
nick black <dank@reflexsecurity.com>


* Re: checksumming with mmx, comment in arch/i386/lib/mmx.c
@ 2003-02-11 19:06 Valdis.Kletnieks
From: Valdis.Kletnieks @ 2003-02-11 19:06 UTC (permalink / raw)
  To: nick black; +Cc: linux-kernel


On Tue, 11 Feb 2003 13:37:07 EST, dank@suburbanjihad.net (nick black)  said:

> firstly, to what domain of checksums does this comment apply?  secondly,
> why is it true?  it seems the PADDW family of instructions could work
> well here; is the slowdown a result of the kernel's need to muck with
> fpu state (from what i can tell, mmx uses the fp registers)?

(Note - second-hand info from somebody else who looked at MMX/SSE to optimize
an inner loop.  Double-check with CPU documentation).

There's a big "urp" sound as the processor switches from FP to MMX mode and
back, which apparently takes a large number of cycles.  You can to some extent
amortize this if you're switching once for a LONG loop (the analysis I saw was
with a million or so pixels on a screen image) - if you're switching in and out
for a 1500 byte packet (or even worse, a 100-byte packet) the impact may be
more noticeable. You may wish to examine the SSE/SSE2 opcodes, which apparently
don't take this performance hit.
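
The amortization argument above can be put as a rough cost model. The sketch
below is only an illustration: the function names are hypothetical, and the
cycle counts passed in are placeholders to be replaced with measurements or
figures from CPU documentation, not real data for any processor.

```c
#include <stddef.h>

/*
 * Hypothetical cost model: a fixed mode-switch cost (in cycles) is paid
 * once per buffer, so its per-byte share shrinks as the buffer grows.
 */
static double switch_overhead_per_byte(double switch_cycles, size_t len)
{
    return switch_cycles / (double)len;
}

/*
 * Smallest buffer length (in bytes) at which the SIMD loop, including
 * the fixed switch cost, is no slower than the scalar loop:
 *
 *     switch_cycles + len * simd_cpb <= len * scalar_cpb
 *
 * Returns -1.0 if the SIMD loop is not faster per byte, in which case
 * the switch cost can never be recovered.
 */
static double break_even_bytes(double switch_cycles,
                               double scalar_cpb, double simd_cpb)
{
    if (scalar_cpb <= simd_cpb)
        return -1.0;
    return switch_cycles / (scalar_cpb - simd_cpb);
}
```

With the same assumed switch cost, the per-byte overhead for a 100-byte
packet is 15 times that for a 1500-byte packet, and roughly 10,000 times
that for a million-byte image, which is the shape of the argument made above.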

-- 
				Valdis Kletnieks
				Computer Systems Senior Engineer
				Virginia Tech



