linux-kernel.vger.kernel.org archive mirror
* [rc4-amd64] RC4 optimized for AMD64
From: Marc Bevand @ 2004-11-01  6:57 UTC
  To: linux-kernel

I have just published a small paper about optimizing RC4 for
AMD64 (x86-64). A working implementation is also provided:

  http://epita.fr/~bevand_m/papers/rc4-amd64.html

Kernel people may be interested given the fact that Linux
already implements RC4.

-- 
Marc Bevand                          http://www.epita.fr/~bevand_m
Computer Science School EPITA - System, Network and Security Dept.



* Re: [rc4-amd64] RC4 optimized for AMD64
From: James Morris @ 2004-11-01  7:36 UTC
  To: Marc Bevand; +Cc: linux-kernel, Andi Kleen, David S. Miller

On Mon, 1 Nov 2004, Marc Bevand wrote:

> I have just published a small paper about optimizing RC4 for
> AMD64 (x86-64). A working implementation is also provided:
> 
>   http://epita.fr/~bevand_m/papers/rc4-amd64.html
> 
> Kernel people may be interested given the fact that Linux
> already implements RC4.

Only problem is that the setkey code is released under a GPL-incompatible
license.  Although it's probably not difficult to make the kernel's
existing C setkey code work with the new asm code.


- James
-- 
James Morris
<jmorris@redhat.com>





* Re: [rc4-amd64] RC4 optimized for AMD64
From: Marc Bevand @ 2004-11-01 12:24 UTC
  To: linux-kernel

On 2004-11-01, James Morris <jmorris@redhat.com> wrote:
| 
|  Only problem is that the setkey code is released under a GPL-incompatible
|  license.  Although it's probably not difficult to make the kernel's
|  existing C setkey code work with the new asm code.

Yes, it would be very easy to do. This patch (completely untested)
is probably all that is necessary to make Linux arc4_set_key() work
with rc4-amd64:

--- 8< -----------------------------------------------------------------
--- crypto/arc4.c.orig  2004-11-01 13:16:41.739375512 +0100
+++ crypto/arc4.c       2004-11-01 13:18:16.799924112 +0100
@@ -20,8 +20,8 @@
 #define ARC4_BLOCK_SIZE                1
 
 struct arc4_ctx {
-       u8 S[256];
-       u8 x, y;
+       u64 x, y;
+       u64 S[256];
 };
 
 static int arc4_set_key(void *ctx_arg, const u8 *in_key, unsigned int key_len, u32 *flags)
--- 8< -----------------------------------------------------------------

-- 
Marc Bevand                          http://www.epita.fr/~bevand_m
Computer Science School EPITA - System, Network and Security Dept.
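
[Editor's sketch] To make the layout change in the patch above concrete, here is a minimal user-space sketch of a key schedule and output loop running over the widened context. Everything named here (`arc4_ctx64`, `arc4_set_key64`, `arc4_crypt64`) is hypothetical and appears neither in the kernel nor in rc4-amd64; only the field order (x and y first, then 256 64-bit slots) is taken from the patch. The algorithm is textbook RC4, so the starting values of x and y are an assumption and may not match what the kernel's arc4_set_key() or the asm routine expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context mirroring the patched layout: indices first,
 * then one 64-bit slot per S-box entry (only the low 8 bits are used). */
struct arc4_ctx64 {
	uint64_t x, y;
	uint64_t S[256];
};

/* Textbook RC4 key schedule (KSA); not the kernel's arc4_set_key(). */
static void arc4_set_key64(struct arc4_ctx64 *ctx, const uint8_t *key,
			   size_t key_len)
{
	size_t i;
	uint8_t j = 0;

	assert(key_len > 0);
	ctx->x = 0;	/* assumption: textbook start values */
	ctx->y = 0;
	for (i = 0; i < 256; i++)
		ctx->S[i] = i;
	for (i = 0; i < 256; i++) {
		uint64_t tmp = ctx->S[i];

		j = (uint8_t)(j + tmp + key[i % key_len]);
		ctx->S[i] = ctx->S[j];
		ctx->S[j] = tmp;
	}
}

/* Textbook RC4 output loop (PRGA): XOR the keystream into the data. */
static void arc4_crypt64(struct arc4_ctx64 *ctx, uint8_t *out,
			 const uint8_t *in, size_t len)
{
	uint8_t x = (uint8_t)ctx->x, y = (uint8_t)ctx->y;
	size_t n;

	for (n = 0; n < len; n++) {
		uint64_t a, b;

		x = (uint8_t)(x + 1);
		a = ctx->S[x];
		y = (uint8_t)(y + a);
		b = ctx->S[y];
		ctx->S[x] = b;	/* swap S[x] and S[y] */
		ctx->S[y] = a;
		out[n] = in[n] ^ (uint8_t)ctx->S[(uint8_t)(a + b)];
	}
	ctx->x = x;
	ctx->y = y;
}
```

With the widely published test vector (key "Key", plaintext "Plaintext") this produces the ciphertext BB F3 16 E8 D9 40 AF 0A D3. The S-box grows from 256 bytes to 2048 bytes in this layout, which is the price of letting the asm load entries with full-width moves.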



* Re: [rc4-amd64] RC4 optimized for AMD64
From: dean gaudet @ 2004-11-01 20:44 UTC
  To: Marc Bevand; +Cc: linux-kernel

On Mon, 1 Nov 2004, Marc Bevand wrote:

> I have just published a small paper about optimizing RC4 for
> AMD64 (x86-64). A working implementation is also provided:
> 
>   http://epita.fr/~bevand_m/papers/rc4-amd64.html
> 
> Kernel people may be interested given the fact that Linux
> already implements RC4.

you've made a non-portable flags assumption:

>       dec     %r11b
>       ror     $8,             %r8             # (ror does not change ZF)
>       jnz 1b

the contents of ZF are undefined after a rotation... most importantly they 
differ between p4 (ZF is set according to result) and k8 (ZF unchanged).

do you really measure a perf improvement from this assumption?  note that 
p4 would prefer "sub $1, %r11b" here instead of dec... but the difference 
is likely minimal.

-dean


* Re: [rc4-amd64] RC4 optimized for AMD64
From: dean gaudet @ 2004-11-01 20:51 UTC
  To: Marc Bevand; +Cc: linux-kernel



On Mon, 1 Nov 2004, dean gaudet wrote:

> On Mon, 1 Nov 2004, Marc Bevand wrote:
> 
> > I have just published a small paper about optimizing RC4 for
> > AMD64 (x86-64). A working implementation is also provided:
> > 
> >   http://epita.fr/~bevand_m/papers/rc4-amd64.html
> > 
> > Kernel people may be interested given the fact that Linux
> > already implements RC4.
> 
> you've made a non-portable flags assumption:
> 
> >       dec     %r11b
> >       ror     $8,             %r8             # (ror does not change ZF)
> >       jnz 1b
> 
> the contents of ZF are undefined after a rotation... most importantly 
> they differ between p4 (ZF is set according to result) and k8 (ZF 
> unchanged).

ack... it's too early on a monday morning -- i misread the documentation.  
this ZF assumption is actually defined and portable... still kind of ugly.  
how much benefit do you see?

-dean
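
[Editor's sketch] The corrected reading above (a rotate leaves ZF alone, so the flag set by "dec" is still valid at "jnz") can be illustrated with a toy flag model in C. This is not a CPU emulator: `flags_model`, `model_dec8`, `model_ror64` and `model_loop` are hypothetical names, only ZF is tracked, and the loop is just shaped like the quoted three-instruction fragment rather than taken from the rc4-amd64 source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the one flag that matters here. */
struct flags_model {
	bool zf;
};

/* dec r/m8: ZF reflects the decremented result. */
static uint8_t model_dec8(struct flags_model *f, uint8_t v)
{
	v = (uint8_t)(v - 1);
	f->zf = (v == 0);
	return v;
}

/* ror r/m64, imm8: the value rotates, ZF is deliberately left untouched. */
static uint64_t model_ror64(struct flags_model *f, uint64_t v, unsigned count)
{
	(void)f;	/* no ZF update, which is the whole point */
	count &= 63;	/* 64-bit rotates use the low 6 bits of the count */
	return count ? (v >> count) | (v << (64 - count)) : v;
}

/* Loop shaped like the quoted fragment; returns the iteration count. */
static unsigned model_loop(uint8_t counter, uint64_t *acc)
{
	struct flags_model f = { false };
	unsigned iterations = 0;

	assert(counter > 0);
	do {
		counter = model_dec8(&f, counter);	/* dec %r11b   */
		*acc = model_ror64(&f, *acc, 8);	/* ror $8, %r8 */
		iterations++;
	} while (!f.zf);				/* jnz 1b      */
	return iterations;
}
```

Running the model with a counter of 8 performs exactly eight iterations and, because eight rotations by 8 bits add up to 64, hands back the accumulator unchanged. An accumulator of zero does not end the loop early, which is what would happen if the rotate did set ZF from its result.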


* Re: [rc4-amd64] RC4 optimized for AMD64
From: Marc Bevand @ 2004-11-01 23:01 UTC
  To: dean gaudet; +Cc: linux-kernel

dean gaudet wrote:
| 
| [...]
| ack... it's too early on a monday morning -- i misread the documentation.
| this ZF assumption is actually defined and portable... still kind of ugly.
| how much benefit do you see?

When "dec" is placed before "ror", throughput goes up by about 5%
on my test system (Opteron 244 rev C0). I don't find it "ugly"
because the optimization is not intrusive at all (only 1 moved instruction).

Concerning the "dec / sub $1" case, it makes absolutely no difference
on the Opteron; I just used "dec" because the instruction is 3 bytes long
instead of 4.

-- 
Marc Bevand                          http://www.epita.fr/~bevand_m
Computer Science School EPITA - System, Network and Security Dept.
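
[Editor's sketch] The 3-byte versus 4-byte sizes mentioned above can be checked by writing out the two encodings by hand. The byte values below were derived from the x86-64 instruction format (REX prefix, opcode, ModRM, optional immediate) rather than copied from the rc4-amd64 source, so treat them as an assumption to confirm with an assembler; the array names and the `modrm_reg` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* dec %r11b     : REX.B (0x41), opcode FE /1, ModRM 0xCB */
static const uint8_t enc_dec_r11b[] = { 0x41, 0xFE, 0xCB };

/* sub $1, %r11b : REX.B (0x41), opcode 80 /5 ib, ModRM 0xEB, imm8 0x01 */
static const uint8_t enc_sub1_r11b[] = { 0x41, 0x80, 0xEB, 0x01 };

/* Build a register-direct ModRM byte: mod = 11, then the opcode
 * extension ("/1", "/5") in the reg field, then the low 3 bits of the
 * register number in rm (r11 is register 11, so rm = 3 plus REX.B). */
static uint8_t modrm_reg(unsigned ext, unsigned rm)
{
	assert(ext < 8 && rm < 8);
	return (uint8_t)(0xC0 | (ext << 3) | rm);
}
```

The extra byte in the "sub" form is the immediate operand; "dec" carries its operand implicitly in the opcode extension.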


* Re: [rc4-amd64] RC4 optimized for AMD64
From: dean gaudet @ 2004-11-02 18:52 UTC
  To: Denis Vlasenko; +Cc: Marc Bevand, linux-kernel

On Tue, 2 Nov 2004, Denis Vlasenko wrote:

> On Monday 01 November 2004 22:44, dean gaudet wrote:
> > note that 
> > p4 would prefer "sub $1, %r11b" here instead of dec... but the difference 
> > is likely minimal.
> 
> p4 is not the only x86 CPU on the planet. Why should I use
> a longer instruction then?

you're asking about spending one byte?  one byte extra for code which 
could perform better on more CPUs?

i could equally well say "k8 is not the only x86-64 cpu on the planet".

i really don't care whether this change is made or not, i only mentioned a 
general perf rule.

go ahead and use -Os for the rest of the kernel if you're worried about 
size, it'll likely go faster.  but spending 1 byte in code which is perf 
critical is nothing.

-dean


* Re: [rc4-amd64] RC4 optimized for AMD64
From: Denis Vlasenko @ 2004-11-02 21:45 UTC
  To: dean gaudet; +Cc: Marc Bevand, linux-kernel

On Tuesday 02 November 2004 20:52, dean gaudet wrote:
> On Tue, 2 Nov 2004, Denis Vlasenko wrote:
> 
> > On Monday 01 November 2004 22:44, dean gaudet wrote:
> > > note that 
> > > p4 would prefer "sub $1, %r11b" here instead of dec... but the difference 
> > > is likely minimal.
> > 
> > p4 is not the only x86 CPU on the planet. Why should I use
> > a longer instruction then?
> 
> you're asking about spending one byte?  one byte extra for code which 
> could perform better on more CPUs?

You're asking about speedup by 1 cycle on a CPU which will be outdated
6 months from now?
--
vda



* Re: [rc4-amd64] RC4 optimized for AMD64
From: Marc Bevand @ 2004-11-03 10:07 UTC
  To: dean gaudet; +Cc: Denis Vlasenko, linux-kernel

dean gaudet wrote:
| 
| [...]
| you're asking about spending one byte?  one byte extra for code which 
| could perform better on more CPUs?

Guys, this does not matter _for now_, because AFAIK nobody has
benchmarked this code on an EM64T P4 CPU.

Obviously, if 'sub $1,X' is proved to be faster than 'dec' on the
Intel CPU, then I will change the code (since both instructions are
equivalent on AMD CPUs).

-- 
Marc Bevand                          http://www.epita.fr/~bevand_m
Computer Science School EPITA - System, Network and Security Dept.


Thread overview: 9+ messages
2004-11-01  6:57 [rc4-amd64] RC4 optimized for AMD64 Marc Bevand
2004-11-01  7:36 ` James Morris
2004-11-01 12:24   ` Marc Bevand
2004-11-01 20:44 ` dean gaudet
2004-11-01 20:51   ` dean gaudet
2004-11-01 23:01     ` Marc Bevand
     [not found]   ` <200411020854.21629.vda@port.imtp.ilyichevsk.odessa.ua>
2004-11-02 18:52     ` dean gaudet
2004-11-02 21:45       ` Denis Vlasenko
2004-11-03 10:07       ` Marc Bevand
