linux-kernel.vger.kernel.org archive mirror
From: David Laight <David.Laight@ACULAB.COM>
To: "'Jason A. Donenfeld'" <Jason@zx2c4.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Theodore Tso <tytso@mit.edu>,
	Greg KH <gregkh@linuxfoundation.org>,
	Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Subject: RE: [PATCH v2 2/2] random: use BLAKE2s instead of SHA1 in extraction
Date: Fri, 14 Jan 2022 17:27:43 +0000
Message-ID: <05ae373684334e6581294baa8afd3238@AcuMS.aculab.com>
In-Reply-To: <Yd18+iQ8zicsSPa0@zx2c4.com>

From: Jason A. Donenfeld
> Sent: 11 January 2022 12:50
>
> On Tue, Jan 11, 2022 at 1:28 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > If you're really quite concerned about m68k code size, I can probably
> > do some things to reduce that. For example, blake2s256_hmac is only
> > used by wireguard and it could probably be made local there. And with
> > some trivial loop re-rolling, I can shave off another 2300 bytes. And
> > I bet I can find a few other things too. The question is: how
> > important is this to you?
> 
> And with another trick (see below), another extra 1000 bytes or so
> shaved off. Aside from moving blake2s256_hmac, I'm not really super
> enthusiastic about making these changes, but depending on how important
> this is to you, maybe we can make something work. There are probably
> additional possibilities too with the code.

Quite clearly whoever wrote the unrolled loops needs their head examined.
It is extremely unlikely that a CPU has enough registers to implement it
efficiently.
(Of course, a pipelined implementation on an FPGA is another matter.)

So every read of v[] is going to be a memory read.
Much better to do that than to spill values that change.
The memory reads won't really hit performance either.
They add a bit of latency - but that will be handled by
instruction scheduling - either by the compiler or by the CPU hardware.

> -#define ROUND(r) do { \
> -	G(r, 0, v[0], v[ 4], v[ 8], v[12]); \
> -	G(r, 1, v[1], v[ 5], v[ 9], v[13]); \
> -	G(r, 2, v[2], v[ 6], v[10], v[14]); \
> -	G(r, 3, v[3], v[ 7], v[11], v[15]); \
> -	G(r, 4, v[0], v[ 5], v[10], v[15]); \
> -	G(r, 5, v[1], v[ 6], v[11], v[12]); \
> -	G(r, 6, v[2], v[ 7], v[ 8], v[13]); \
> -	G(r, 7, v[3], v[ 4], v[ 9], v[14]); \
> -} while (0)
> -		ROUND(0);
> -		ROUND(1);
> -		ROUND(2);
> -		ROUND(3);
> -		ROUND(4);
> -		ROUND(5);
> -		ROUND(6);
> -		ROUND(7);
> -		ROUND(8);
> -		ROUND(9);

The v[] indexes clearly don't change from round to round in the above.
Use 4 separate arrays so you have:

#define ROUND(r) do { \
	G(r, 0, v[0], w[0], x[0], y[0]); \
	G(r, 1, v[1], w[1], x[1], y[1]); \
	G(r, 2, v[2], w[2], x[2], y[2]); \
	G(r, 3, v[3], w[3], x[3], y[3]); \
	G(r, 4, v[0], w[1], x[2], y[3]); \
	G(r, 5, v[1], w[2], x[3], y[0]); \
	G(r, 6, v[2], w[3], x[0], y[1]); \
	G(r, 7, v[3], w[0], x[1], y[2]); \
} while (0)

Now double the size of each of the v/w/x/y arrays and write the values
to the right slots when they are created/updated and you get:

#define ROUND(r) do { \
	G(r, 0, v[0], w[0], x[0], y[0]); \
	G(r, 1, v[1], w[1], x[1], y[1]); \
	G(r, 2, v[2], w[2], x[2], y[2]); \
	G(r, 3, v[3], w[3], x[3], y[3]); \
	G(r, 4, v[4], w[4], x[4], y[4]); \
	G(r, 5, v[5], w[5], x[5], y[5]); \
	G(r, 6, v[6], w[6], x[6], y[6]); \
	G(r, 7, v[7], w[7], x[7], y[7]); \
} while (0)

Oh - that is a nice loop...
So we get:
	for (r = 0; r < 10; r++)
		for (j = 0; j < 8; j++)
			G(r, j, v[j], w[j], x[j], y[j]);

Which is likely to be just as fast as any other version.
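
A minimal sketch of that loop, for illustration only and not the kernel's
blake2s code. rounds_flat() follows the quoted ROUND() indexing on a flat
16-word state; rounds_rolled() keeps four 8-entry rows, with slots 0-3
holding the columns and slots 4-7 the diagonals. The function names, the
g() helper and the choice to store each result only into the slot that
the other half reads next are assumptions made here. The sigma table is
the BLAKE2s message schedule from RFC 7693.

```c
#include <stdint.h>

/* BLAKE2s message schedule (RFC 7693). */
static const uint8_t sigma[10][16] = {
	{  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15 },
	{ 14, 10,  4,  8,  9, 15, 13,  6,  1, 12,  0,  2, 11,  7,  5,  3 },
	{ 11,  8, 12,  0,  5,  2, 15, 13, 10, 14,  3,  6,  7,  1,  9,  4 },
	{  7,  9,  3,  1, 13, 12, 11, 14,  2,  6,  5, 10,  4,  0, 15,  8 },
	{  9,  0,  5,  7,  2,  4, 10, 15, 14,  1, 11, 12,  6,  8,  3, 13 },
	{  2, 12,  6, 10,  0, 11,  8,  3,  4, 13,  7,  5, 15, 14,  1,  9 },
	{ 12,  5,  1, 15, 14, 13,  4, 10,  0,  7,  6,  3,  9,  2,  8, 11 },
	{ 13, 11,  7, 14, 12,  1,  3,  9,  5,  0, 15,  4,  8,  6,  2, 10 },
	{  6, 15, 14,  9, 11,  3,  0,  8, 12,  2, 13,  7,  1,  4, 10,  5 },
	{ 10,  2,  8,  4,  7,  6,  1,  5, 15, 11,  9, 14,  3, 12, 13,  0 },
};

static uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));	/* n is 7, 8, 12 or 16 here */
}

/* The BLAKE2s G mixing step on four state words and two message words. */
static void g(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d,
	      uint32_t m0, uint32_t m1)
{
	*a += *b + m0;	*d = ror32(*d ^ *a, 16);
	*c += *d;	*b = ror32(*b ^ *c, 12);
	*a += *b + m1;	*d = ror32(*d ^ *a, 8);
	*c += *d;	*b = ror32(*b ^ *c, 7);
}

/* Reference: flat state s[16], indexed as in the quoted ROUND(). */
static void rounds_flat(uint32_t s[16], const uint32_t m[16])
{
	static const uint8_t idx[8][4] = {
		{ 0, 4,  8, 12 }, { 1, 5,  9, 13 }, { 2, 6, 10, 14 }, { 3, 7, 11, 15 },
		{ 0, 5, 10, 15 }, { 1, 6, 11, 12 }, { 2, 7,  8, 13 }, { 3, 4,  9, 14 },
	};
	int r, j;

	for (r = 0; r < 10; r++)
		for (j = 0; j < 8; j++)
			g(&s[idx[j][0]], &s[idx[j][1]], &s[idx[j][2]], &s[idx[j][3]],
			  m[sigma[r][2 * j]], m[sigma[r][2 * j + 1]]);
}

/* Rolled: every G call reads v[j], w[j], x[j], y[j]. */
static void rounds_rolled(uint32_t s[16], const uint32_t m[16])
{
	uint32_t v[8], w[8], x[8], y[8];
	int r, j, k;

	for (j = 0; j < 4; j++) {
		v[j] = s[j]; w[j] = s[4 + j]; x[j] = s[8 + j]; y[j] = s[12 + j];
	}
	for (r = 0; r < 10; r++) {
		for (j = 0; j < 8; j++) {
			uint32_t a = v[j], b = w[j], c = x[j], d = y[j];

			g(&a, &b, &c, &d, m[sigma[r][2 * j]], m[sigma[r][2 * j + 1]]);
			k = j & 3;
			if (j < 4) {	/* column k feeds the diagonal slots */
				v[4 + k] = a;
				w[4 + ((k + 3) & 3)] = b;
				x[4 + ((k + 2) & 3)] = c;
				y[4 + ((k + 1) & 3)] = d;
			} else {	/* diagonal k feeds the column slots */
				v[k] = a;
				w[(k + 1) & 3] = b;
				x[(k + 2) & 3] = c;
				y[(k + 3) & 3] = d;
			}
		}
	}
	for (j = 0; j < 4; j++) {
		s[j] = v[j]; s[4 + j] = w[j]; s[8 + j] = x[j]; s[12 + j] = y[j];
	}
}
```

In this sketch the loads are all at index j; the rotation moves into the
stores, which is one way to "write the correct values when they are
updated". Whether a compiler turns this into smaller or faster code than
the unrolled version would have to be measured.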

You might need to give the compiler some great big hints
in order to get sensible code.
Possibly make v[], w[], x[] and y[] all volatile and replace
the inner loop body with:
			v_j = v[j]; w_j = w[j]; x_j = x[j]; y_j = y[j];
			G(r, j, v_j, w_j, x_j, y_j);

	David

