From: David Laight <David.Laight@ACULAB.COM>
To: "'Jason A. Donenfeld'" <Jason@zx2c4.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Theodore Tso <tytso@mit.edu>,
	Greg KH <gregkh@linuxfoundation.org>,
	Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Subject: RE: [PATCH v2 2/2] random: use BLAKE2s instead of SHA1 in extraction
Date: Tue, 11 Jan 2022 15:46:56 +0000
Message-ID: <caed82818cdb466aade033501f57d183@AcuMS.aculab.com>
In-Reply-To: <Yd18+iQ8zicsSPa0@zx2c4.com>

From: Jason A. Donenfeld
> Sent: 11 January 2022 12:50
> 
> On Tue, Jan 11, 2022 at 1:28 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > If you're really quite concerned about m68k code size, I can probably
> > do some things to reduce that. For example, blake2s256_hmac is only
> > used by wireguard and it could probably be made local there. And with
> > some trivial loop re-rolling, I can shave off another 2300 bytes. And
> > I bet I can find a few other things too. The question is: how
> > important is this to you?
> 
> And with another trick (see below), another extra 1000 bytes or so
> shaved off. Aside from moving blake2s256_hmac, I'm not really super
> enthusiastic about making these changes, but depending on how important
> this is to you, maybe we can make something work. There are probably
> additional possibilities too with the code.
> 
> ====
> 
> diff --git a/lib/crypto/blake2s-generic.c b/lib/crypto/blake2s-generic.c
> index 75ccb3e633e6..8e3c6372363a 100644
> --- a/lib/crypto/blake2s-generic.c
> +++ b/lib/crypto/blake2s-generic.c
> @@ -46,7 +46,7 @@ void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
>  {
>  	u32 m[16];
>  	u32 v[16];
> -	int i;
> +	int i, j;

Use 'unsigned int i, j;'.
That ensures the '% 4' are done as '& 3' and the divides as shifts.
Unless the compiler manages to track the valid value ranges, that will
generate better code even on x86-64.
(It saves a sign extension prior to the array indexes.)
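
To illustrate the point about unsigned indexes: for unsigned operands, C
defines '% 4' and '/ 4' so that they match '& 3' and '>> 2' exactly. For
signed operands division truncates toward zero and the remainder takes the
sign of the dividend, so the compiler needs extra fix-up code unless it can
prove the value is never negative. A minimal sketch follows; the helper names
are hypothetical and are not part of the patch.

```c
/* Illustrative helpers only; none of these names come from the kernel. */

/* Unsigned: '% 4' is exactly a mask and '/ 4' is exactly a shift. */
unsigned int umod4(unsigned int j) { return j % 4; }	/* same as j & 3 */
unsigned int udiv4(unsigned int j) { return j / 4; }	/* same as j >> 2 */

/*
 * Signed: since C99, division truncates toward zero and the remainder
 * has the sign of the dividend, so a plain mask gives a different answer
 * for negative values (-1 % 4 is -1, but -1 & 3 is 3).
 */
int smod4(int j) { return j % 4; }
int sdiv4(int j) { return j / 4; }
```

In the quoted loop the indexes are never negative, so both forms compute the
same values; the unsigned declaration just means the compiler does not have
to prove that before using the cheaper instructions.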

>  	WARN_ON(IS_ENABLED(DEBUG) &&
>  		(nblocks > 1 && inc != BLAKE2S_BLOCK_SIZE));
> @@ -76,29 +76,11 @@ void blake2s_compress_generic(struct blake2s_state *state, const u8 *block,
>  	b = ror32(b ^ c, 7); \
>  } while (0)
> 
> -#define ROUND(r) do { \
> -	G(r, 0, v[0], v[ 4], v[ 8], v[12]); \
> -	G(r, 1, v[1], v[ 5], v[ 9], v[13]); \
> -	G(r, 2, v[2], v[ 6], v[10], v[14]); \
> -	G(r, 3, v[3], v[ 7], v[11], v[15]); \
> -	G(r, 4, v[0], v[ 5], v[10], v[15]); \
> -	G(r, 5, v[1], v[ 6], v[11], v[12]); \
> -	G(r, 6, v[2], v[ 7], v[ 8], v[13]); \
> -	G(r, 7, v[3], v[ 4], v[ 9], v[14]); \
> -} while (0)
> -		ROUND(0);
> -		ROUND(1);
> -		ROUND(2);
> -		ROUND(3);
> -		ROUND(4);
> -		ROUND(5);
> -		ROUND(6);
> -		ROUND(7);
> -		ROUND(8);
> -		ROUND(9);
> -
> +		for (i = 0; i < 10; ++i) {
> +			for (j = 0; j < 8; ++j)
> +				G(i, j, v[j % 4], v[((j + (j / 4)) % 4) + 4],
> +				  v[((j + 2 * (j / 4)) % 4) + 8],
> +				  v[((j + 3 * (j / 4)) % 4) + 12]);

I think I'd look at doing [0..3] then [4..7] to save execution time.
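
One reading of that suggestion, as a sketch rather than a tested kernel
change: run one loop over the four column steps and a second loop over the
four diagonal steps, so the first loop needs no '%' or '/' at all and the
second only needs a fixed rotation. The code below is standalone userspace C,
not the kernel source. It assumes round 0 only, where the BLAKE2s message
schedule is the identity, so step j uses m[2*j] and m[2*j+1]. The function
names are hypothetical.

```c
#include <stdint.h>

static uint32_t ror32(uint32_t x, unsigned int n)
{
	/* Only called with n in 1..31 here. */
	return (x >> n) | (x << (32 - n));
}

/* BLAKE2s mixing step with the two message words passed in directly. */
static void g(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d,
	      uint32_t x, uint32_t y)
{
	*a += *b + x;	*d = ror32(*d ^ *a, 16);
	*c += *d;	*b = ror32(*b ^ *c, 12);
	*a += *b + y;	*d = ror32(*d ^ *a, 8);
	*c += *d;	*b = ror32(*b ^ *c, 7);
}

/* Single loop with the index arithmetic from the quoted patch. */
void round_single_loop(uint32_t v[16], const uint32_t m[16])
{
	unsigned int j;

	for (j = 0; j < 8; ++j)
		g(&v[j % 4], &v[((j + (j / 4)) % 4) + 4],
		  &v[((j + 2 * (j / 4)) % 4) + 8],
		  &v[((j + 3 * (j / 4)) % 4) + 12],
		  m[2 * j], m[2 * j + 1]);
}

/* Split version: steps 0..3 (columns), then steps 4..7 (diagonals). */
void round_split_loops(uint32_t v[16], const uint32_t m[16])
{
	unsigned int j;

	for (j = 0; j < 4; ++j)
		g(&v[j], &v[j + 4], &v[j + 8], &v[j + 12],
		  m[2 * j], m[2 * j + 1]);

	for (j = 0; j < 4; ++j)
		g(&v[j], &v[((j + 1) % 4) + 4], &v[((j + 2) % 4) + 8],
		  &v[((j + 3) % 4) + 12],
		  m[8 + 2 * j], m[8 + 2 * j + 1]);
}
```

Both functions visit the same v[] quadruples in the same order, so they
produce identical state; whether the split form is actually faster or smaller
on a given architecture would need measuring.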

I actually wonder how large a block you need to be processing to get
a gain from all that unrolling on architectures like x86-64.
The 'cold cache' timing must be horrid.
Never mind the side effects of displacing that much other code from the I-cache.

There aren't enough registers to hold all the v[] values, so they'll
all be memory reads, and there are probably other memory accesses as well.
The other instructions will execute in parallel with them, perhaps even 3 or 4
for each memory read.

	David


