From: Andy Polyakov <appro@cryptogams.org>
To: "René van Dorst" <opensource@vdorst.com>,
	"Ard Biesheuvel" <ard.biesheuvel@linaro.org>
Cc: linux-crypto@vger.kernel.org,
	Herbert Xu <herbert@gondor.apana.org.au>,
	David Miller <davem@davemloft.net>,
	"Jason A . Donenfeld" <Jason@zx2c4.com>,
	Samuel Neves <sneves@dei.uc.pt>, Arnd Bergmann <arnd@arndb.de>,
	Eric Biggers <ebiggers@google.com>,
	Andy Lutomirski <luto@kernel.org>,
	Martin Willi <martin@strongswan.org>
Subject: Re: [PATCH v3 19/29] crypto: mips/poly1305 - incorporate OpenSSL/CRYPTOGAMS optimized implementation
Date: Tue, 8 Oct 2019 13:38:39 +0200
Message-ID: <a1c1ade1-f62a-3422-c161-a1d62ea67203@cryptogams.org>
In-Reply-To: <20191007210242.Horde.FiSEhRSAuhKHgFx9ROLFIco@www.vdorst.com>

Hi,

On 10/7/19 11:02 PM, René van Dorst wrote:
> Quoting Ard Biesheuvel <ard.biesheuvel@linaro.org>:
> 
>> This is a straight import of the OpenSSL/CRYPTOGAMS Poly1305
>> implementation
>> for MIPS authored by Andy Polyakov, and contributed by him to the OpenSSL
>> project.

Formally speaking, this statement is a little misleading. The Cryptogams
poly1305-mips module implements both 64- and 32-bit code paths, while
what you'll find in OpenSSL is a 64-bit-only implementation. But in either
case...

>> <snip>
> 
> Hi Ard,
> 
> Is it also an option to include my mips32r2 optimized poly1305 version?
> 
> Below the results which shows a good improvement over the Andy Polyakov
> version.
> I swapped the poly1305 assembly file and rename the function to
> <func_name>_mips
> Full WireGuard source with the changes [0]
> 
> bytes |  RvD | openssl | delta | delta / openssl
>  ...
>  4096 | 9160 | 11755   | -2595 | -22,08%

I assume that the presented results depict a regression after the switch to
the cryptogams module. Right? The RvD implementation distinguishes itself in
two ways:

1. some of the additions in the inner loop are replaced with
multiply-by-1-n-add;
2. the carry chain at the end of the inner loop is effectively fused with
the beginning of said loop/taken out of the loop.

I recall attempting 1. and choosing not to do it, with the following
rationale. On the processor I have access to, Octeon II, it made no
significant difference. It was better, but only marginally. And that's
understandable, because Octeon II should have less difficulty pairing those
additions with multiply-n-add instructions. But since multiplication is an
expensive operation and can be pretty slow, I reckoned that on a processor
less potent than Octeon II it might be more appropriate to minimize the
number of multiply-n-add instructions. In other words, the idea is not
(and never has been) to get fixated on the specific processor at hand, but
to try to find a sensible compromise that would produce reasonable
performance on a range of processors. Of course the problem is that it's
just an assumption I made here, and it could turn out wrong in practice:-)
So I wonder which processor you run on, René? For reference, I measure
more than 70MB/sec for 1KB blocks for chacha20poly1305 on a 1GHz Octeon II.
You report ~34MB/sec, so it ought to be something different. Given a second
data point, it might be appropriate to reconsider and settle for
multiply-by-1-n-add.
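
To make 1. concrete, here is a minimal C sketch. It is not taken from either
module; it only models the MIPS32R2 HI/LO accumulator pair as a single 64-bit
value, and the names hilo_t, maddu(), add_with_carry() and add_via_maddu() are
hypothetical. It shows that accumulating a 32-bit word with explicit carry
handling gives the same result as a multiply-by-1-and-add:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the HI/LO accumulator pair as one 64-bit value. */
typedef uint64_t hilo_t;

/* Models a multiply-and-add: acc += a * b, full 32x32->64 product. */
hilo_t maddu(hilo_t acc, uint32_t a, uint32_t b)
{
    return acc + (uint64_t)a * b;
}

/* Plain addition on 32-bit halves: add, detect carry, add the carry. */
hilo_t add_with_carry(hilo_t acc, uint32_t x)
{
    uint32_t lo = (uint32_t)acc;
    uint32_t hi = (uint32_t)(acc >> 32);

    lo += x;
    hi += (lo < x);            /* carry out of the low word */
    return ((uint64_t)hi << 32) | lo;
}

/* Same addition expressed as multiply-by-1-and-add. */
hilo_t add_via_maddu(hilo_t acc, uint32_t x)
{
    return maddu(acc, x, 1);
}

/* Self-check: both forms agree, including when the low word overflows. */
void check_add_equivalence(void)
{
    assert(add_with_carry(0, 7) == add_via_maddu(0, 7));
    assert(add_with_carry(0xFFFFFFFFull, 1) == add_via_maddu(0xFFFFFFFFull, 1));
    assert(add_with_carry(0x1FFFFFFFFull, 0xFFFFFFFFu) ==
           add_via_maddu(0x1FFFFFFFFull, 0xFFFFFFFFu));
}
```

Which form is faster depends on how expensive the multiplier is on the
processor at hand, which is exactly the trade-off discussed above.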

As for 2., I haven't considered it. Since it's a back-to-back dependency
chain, fusing it with the top of the loop actually has more promising
potential than 1. And it would improve all results, not only MIPS32R2.
Would you trust me with adopting it in my module? Naturally with due credit.
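
One way to picture 2. is the following C sketch. It is an assumption about
the general idea, not the actual code of either implementation: the
accumulator is modelled as five 32-bit limbs, and the helper names
(add_carry(), add_block(), step_unfused(), step_fused()) are hypothetical.
The carry word left over at the end of one iteration is folded into the
chain that adds the next message block, so only one ripple is needed:

```c
#include <assert.h>
#include <stdint.h>

#define NLIMBS 5

/* End-of-loop chain: ripple a single carry word c through all limbs. */
void add_carry(uint32_t h[NLIMBS], uint32_t c)
{
    uint64_t t = c;

    for (int i = 0; i < NLIMBS; i++) {
        t += h[i];
        h[i] = (uint32_t)t;
        t >>= 32;
    }
}

/* Top-of-loop chain: add a 128-bit message block plus the pad bit. */
void add_block(uint32_t h[NLIMBS], const uint32_t m[4], uint32_t padbit)
{
    uint64_t t = 0;

    for (int i = 0; i < 4; i++) {
        t += (uint64_t)h[i] + m[i];
        h[i] = (uint32_t)t;
        t >>= 32;
    }
    h[4] += (uint32_t)t + padbit;
}

/* Unfused: two back-to-back dependency chains. */
void step_unfused(uint32_t h[NLIMBS], uint32_t c,
                  const uint32_t m[4], uint32_t padbit)
{
    add_carry(h, c);
    add_block(h, m, padbit);
}

/* Fused: the pending carry seeds the message-add chain. */
void step_fused(uint32_t h[NLIMBS], uint32_t c,
                const uint32_t m[4], uint32_t padbit)
{
    uint64_t t = c;

    for (int i = 0; i < 4; i++) {
        t += (uint64_t)h[i] + m[i];
        h[i] = (uint32_t)t;
        t >>= 32;
    }
    h[4] += (uint32_t)t + padbit;
}

/* Self-check: both variants produce the same limbs. */
void check_fused_equivalence(void)
{
    uint32_t a[NLIMBS] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 1, 2, 3 };
    uint32_t b[NLIMBS] = { 0xFFFFFFFFu, 0xFFFFFFFFu, 1, 2, 3 };
    const uint32_t m[4] = { 5, 6, 7, 8 };

    step_unfused(a, 9, m, 1);
    step_fused(b, 9, m, 1);
    for (int i = 0; i < NLIMBS; i++)
        assert(a[i] == b[i]);
}
```

In this model the fused variant walks the limbs once instead of twice, which
is why the saving would not be specific to one instruction set.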

Cheers.

