linux-kernel.vger.kernel.org archive mirror
From: Jason Garrett-Glaser <jason@x264.com>
To: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Cc: "Borislav Petkov" <bp@alien8.de>,
	"Johannes Goetzfried"
	<Johannes.Goetzfried@informatik.stud.uni-erlangen.de>,
	linux-crypto@vger.kernel.org,
	"Herbert Xu" <herbert@gondor.hengli.com.au>,
	"Tilo Müller" <tilo.mueller@informatik.uni-erlangen.de>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] crypto: twofish - add x86_64/avx assembler implementation
Date: Wed, 22 Aug 2012 17:05:07 -0700
Message-ID: <CABS55+BAk0QCXpJsbs6WyN4i5Q2Rt1k2fDhgs=bvbfuaBjhsEA@mail.gmail.com>
In-Reply-To: <20120822191516.8483.64529.stgit@localhost6.localdomain6>

On Wed, Aug 22, 2012 at 12:20 PM, Jussi Kivilinna
<jussi.kivilinna@mbnet.fi> wrote:
> Quoting Borislav Petkov <bp@alien8.de>:
>
>> On Wed, Aug 22, 2012 at 07:35:12AM +0300, Jussi Kivilinna wrote:
>>> Looks like encryption lost ~0.4% while decryption gained ~1.8%.
>>>
>>> For the 256-byte test, it's still slightly slower than twofish-3way
>>> (~3%). For the 1k and 8k tests, it's ~5% faster.
>>>
>>> Here's one very last test patch, testing a different ordering of
>>> fpu<->cpu register instructions in a few places.
>>
>> Hehe.
>>
>> I don't mind testing patches, no worries there. Here are the results
>> for this run; it doesn't look better than the last one, AFAICT.
>>
>
> Actually it does look better, at least for encryption. Decryption used a
> different ordering in this test, which appears to be as bad on Bulldozer
> as it is on Sandy Bridge.
>
> So, yet another patch then :)
>
> Interleaving at some new places (reordered lookup_32bit()s in the G macro)
> and doing one of the round rotations one round ahead. This also introduces
> some more parallelism inside lookup_32bit.

Outsider looking in here, but avoiding the 256-way lookup tables
entirely might be faster.  Looking at the twofish code, one byte-wise
calculation looks like this:

    a0 = x >> 4; b0 = x & 15;
    a1 = a0 ^ b0; b1 = ror4[b0] ^ ashx[a0];
    a2 = qt0[n][a1]; b2 = qt1[n][b1];
    a3 = a2 ^ b2; b3 = ror4[b2] ^ ashx[a2];
    a4 = qt2[n][a3]; b4 = qt3[n][b3];
    return (b4 << 4) | a4;

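Since every lookup here is into a 16-entry table of 4-bit values, each one
fits in a single pshufb, so you can do something like this pseudocode (Intel
syntax).  pshufb on ymm registers is AVX2, but splitting it into xmm
operations would probably be fine (as would using this for a pure SSSE3
implementation, since pshufb itself is SSSE3!).  On AVX2 you'd naturally
have to duplicate each table into both 128-bit lanes, since vpshufb only
shuffles within a lane.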

constants:
pb_0x0f = {0x0f,0x0f,0x0f ... }
ashx: lookup table
ror4: lookup table
qt0[n]: lookup table
qt1[n]: lookup table
qt2[n]: lookup table
qt3[n]: lookup table

vpand    b0,     in, pb_0x0f
vpsrlw   a0,     in, 4
vpand    a0,     a0, pb_0x0f    ; effectively vpsrlb, but that doesn't exist

vpxor    a1,     a0, b0
vpshufb  a0,   ashx, a0
vpshufb  b0,   ror4, b0
vpxor    b1,     a0, b0

vpshufb  a2, qt0[n], a1
vpshufb  b2, qt1[n], b1

vpxor    a3,     a2, b2
vpshufb  a2,   ashx, a2
vpshufb  b2,   ror4, b2
vpxor    b3,     a2, b2

vpshufb  a4, qt2[n], a3
vpshufb  b4, qt3[n], b3

vpsllw   b4,     b4, 4          ; effectively vpsllb (no mask needed, b4 < 16)
vpor    out,     a4, b4

That's 17 instructions (plus maybe a move or two) to do 16 lookups for
SSE (~9 cycles by my guessing on a Nehalem).  AVX would run into the
problem of lots of extra vinsert/vextract, since AVX1 has no 256-bit
integer instructions (just sticking to 16-byte operations might or might
not be better, depending on execution units).  AVX2 would be super fast
(17 for 32).

If this works, it could be quite a bit faster than the table-based approach.

Jason


Thread overview: 26+ messages
2012-05-27 14:49 [PATCH] crypto: twofish - add x86_64/avx assembler implementation Johannes Goetzfried
2012-05-28  6:25 ` Jussi Kivilinna
2012-05-28 13:52   ` Johannes Goetzfried
2012-08-15  8:42 ` Jussi Kivilinna
2012-08-15  9:28   ` Borislav Petkov
2012-08-15 11:00     ` Jussi Kivilinna
2012-08-15 12:52       ` Borislav Petkov
2012-08-15 13:48         ` Jussi Kivilinna
2012-08-15 14:03           ` Borislav Petkov
2012-08-15 14:22             ` Jussi Kivilinna
2012-08-15 15:33               ` Borislav Petkov
2012-08-15 17:34             ` Jussi Kivilinna
2012-08-16 13:29               ` Borislav Petkov
2012-08-16 14:26                 ` Jussi Kivilinna
2012-08-17  7:37                 ` Jussi Kivilinna
2012-08-20 17:32                   ` Borislav Petkov
2012-08-22  4:35                     ` Jussi Kivilinna
2012-08-22 13:31                       ` Borislav Petkov
2012-08-22 19:20                         ` Jussi Kivilinna
2012-08-23  0:05                           ` Jason Garrett-Glaser [this message]
2012-08-23  8:33                             ` Jussi Kivilinna
2012-08-23 14:36                           ` Borislav Petkov
2012-08-28  9:17                             ` Jussi Kivilinna
2012-08-28 16:25                               ` Borislav Petkov
2012-05-28 13:54 Johannes Goetzfried
2012-06-12 10:05 ` Herbert Xu
