linux-kernel.vger.kernel.org archive mirror
From: David Laight <David.Laight@ACULAB.COM>
To: 'Ard Biesheuvel' <ardb@kernel.org>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: "herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>,
	"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"ebiggers@google.com" <ebiggers@google.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: RE: FPU register granularity [Was: Re: [PATCH crypto-stable] crypto: arch/lib - limit simd usage to PAGE_SIZE chunks]
Date: Tue, 21 Apr 2020 08:05:22 +0000
Message-ID: <c9e0cabe2d6844ac9fc8c00f6bb3bc27@AcuMS.aculab.com>
In-Reply-To: <CAMj1kXEdWu1v0tGT2Co0oMqWDJS_FNx97qZJmp3GPrrj8MrnrQ@mail.gmail.com>

From: Ard Biesheuvel
> Sent: 21 April 2020 08:02
> On Tue, 21 Apr 2020 at 06:15, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Hi David,
> >
> > On Mon, Apr 20, 2020 at 2:32 AM David Laight <David.Laight@aculab.com> wrote:
> > > Maybe kernel_fpu_begin() should be passed the address of a location
> > > where a pointer to an fpu save area buffer can be written.
> > > Then the pre-emption code can allocate the buffer and save the
> > > state into it.
> >
> > Interesting idea. It looks like `struct xregs_state` is only 576
> > bytes. That's not exactly small, but it's not insanely huge either,
> > and maybe we could justifiably stick that on the stack, or even
> > reserve part of the stack allocation for it, which the function would
> > know about, without needing to specify any address.
> >
> > > kernel_fpu_begin() ought also be passed a parameter saying which
> > > fpu features are required, and return which are allocated.
> > > On x86 this could be used to check for AVX512 (etc) which may be
> > > available in an ISR unless it interrupted inside a kernel_fpu_begin()
> > > section (etc).
> > > It would also allow optimisations if only 1 or 2 fpu registers are
> > > needed (eg for some of the crypto functions) rather than the whole
> > > fpu register set.
> >
> > For AVX512 this probably makes sense, I suppose. But I'm not sure if
> > there are too many bits of crypto code that only use a few registers.
> > There are those accelerated memcpy routines in i915 though -- ever see
> > drivers/gpu/drm/i915/i915_memcpy.c? sort of wild. But if we did go
> > this way, I wonder if it'd make sense to totally overengineer it and
> > write a gcc/as plugin to create the register mask for us. Or, maybe
> > some checker inside of objtool could help here.
> >
> > Actually, though, the thing I've been wondering about is
> > moving in the complete opposite direction: is there some
> > efficient-enough way that we could allow FPU registers in all contexts
> > always, without the need for kernel_fpu_begin/end? I was reversing
> > ntoskrnl.exe and was kind of impressed (maybe not the right word?) by
> > their judicious use of vectorisation everywhere. I assume a lot of
> > that is being generated by their compiler, which of course gcc could
> > do for us if we let it. Is that an interesting avenue to consider? Or
> > are you pretty certain that it'd be a huge mistake, with an
> > irreversible speed hit?
> >
> 
> When I added support for kernel mode SIMD to arm64 originally, there
> was a kernel_neon_begin_partial() that took an int to specify how many
> registers were being used, the reason being that NEON preserve/store
> was fully eager at this point, and arm64 has 32 SIMD registers, many
> of which weren't really used, e.g., in the basic implementation of AES
> based on special instructions.
> 
> With the introduction of lazy restore, and SVE handling for userspace,
> we decided to remove this because it didn't really help anymore, and
> made the code more difficult to manage.
> 
> However, I think it would make sense to have something like this in
> the general case. I.e., NEON registers 0-3 are always preserved when
> an exception or interrupt (or syscall) is taken, and so they can be
> used anywhere in the kernel. If you want the whole set, you will have
> to use begin/end as before. This would already unlock a few
> interesting cases, like memcpy, xor, and sequences that can easily be
> implemented with only a few registers such as instruction-based AES.
> 
> Unfortunately, the compiler needs to be taught about this to be
> completely useful, which means lots of prototyping and benchmarking
> upfront, as the contract will be set in stone once the compilers get
> on board.

You can always just use asm with explicit registers.
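
A minimal userspace sketch of what that could look like, assuming GCC or
Clang extended asm. The helper name xor16() is hypothetical and this is
not kernel code: in the kernel it would only be safe if the named
registers were already preserved by the entry code, as in the proposal
quoted above, or inside a begin/end section. The vector registers are
named directly in the asm and listed as clobbers, so the compiler never
has to allocate SIMD registers itself.

```c
#include <stdint.h>

/*
 * XOR 16 bytes from src into dst, touching only two explicitly named
 * vector registers. Hypothetical helper for illustration only.
 */
static void xor16(uint8_t *dst, const uint8_t *src)
{
#if defined(__x86_64__)
	/* SSE2 is baseline on x86-64; unaligned loads/stores via movdqu. */
	__asm__ volatile("movdqu (%0), %%xmm0\n\t"
			 "movdqu (%1), %%xmm1\n\t"
			 "pxor   %%xmm1, %%xmm0\n\t"
			 "movdqu %%xmm0, (%0)"
			 :
			 : "r"(dst), "r"(src)
			 : "xmm0", "xmm1", "memory");
#elif defined(__aarch64__)
	/* NEON registers v0 and v1 only. */
	__asm__ volatile("ld1 {v0.16b}, [%0]\n\t"
			 "ld1 {v1.16b}, [%1]\n\t"
			 "eor v0.16b, v0.16b, v1.16b\n\t"
			 "st1 {v0.16b}, [%0]"
			 :
			 : "r"(dst), "r"(src)
			 : "v0", "v1", "memory");
#else
	/* Portable fallback for other architectures: plain byte loop. */
	for (int i = 0; i < 16; i++)
		dst[i] ^= src[i];
#endif
}
```

The clobber list is the whole "register mask" here: it states exactly
which vector registers the sequence uses, which is the information a
partial save/restore scheme would need.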

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
