* [PATCH v2 1/9] arm64: assembler: add cond_yield macro
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
Add a cond_yield macro that branches to a specified label when called if
the TIF_NEED_RESCHED flag is set and decreasing the preempt count would
make the task preemptible again, allowing a reschedule to occur. This
can be used by kernel mode SIMD code that keeps a lot of state in SIMD
registers, which would make chunking the input in order to perform the
cond_resched() check from C code disproportionately costly.
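As an illustration of the calling convention this enables: an asm routine can
use cond_yield to bail out early and report how many blocks it did not
process, while a small C wrapper retries until all input has been consumed.
A minimal sketch, where my_transform(), struct my_state and MY_BLOCK_SIZE are
hypothetical names:

  /*
   * Hypothetical C-side retry loop; my_transform() is assumed to be an asm
   * routine that uses cond_yield and returns the number of unprocessed blocks.
   */
  static void my_do_blocks(struct my_state *st, const u8 *src, int blocks)
  {
          while (blocks) {
                  int rem;

                  kernel_neon_begin();
                  rem = my_transform(st, src, blocks);    /* may stop early */
                  kernel_neon_end();                      /* preemption point */

                  src += (blocks - rem) * MY_BLOCK_SIZE;
                  blocks = rem;
          }
  }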
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/assembler.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index bf125c591116..27b1ea721c2d 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -745,6 +745,22 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU
.Lyield_out_\@ :
.endm
+ /*
+ * Check whether preempt-disabled code should yield as soon as it
+ * is able. This is the case if re-enabling preemption a single
+ * time results in a preempt count of zero, and the TIF_NEED_RESCHED
+ * flag is set. (Note that the latter is stored negated in the
+ * top word of the thread_info::preempt_count field)
+ */
+ .macro cond_yield, lbl:req, tmp:req
+#ifdef CONFIG_PREEMPTION
+ get_current_task \tmp
+ ldr \tmp, [\tmp, #TSK_TI_PREEMPT]
+ sub \tmp, \tmp, #PREEMPT_DISABLE_OFFSET
+ cbz \tmp, \lbl
+#endif
+ .endm
+
/*
* This macro emits a program property note section identifying
* architecture features which require special handling, mainly for
--
2.30.0
* [PATCH v2 2/9] crypto: arm64/sha1-ce - simplify NEON yield
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, return early in that case, and let the C code deal
with the consequences.
This reverts commit 7df8d164753e6e6f229b72767595072bc6a71f48.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/sha1-ce-core.S | 47 +++++++-------------
arch/arm64/crypto/sha1-ce-glue.c | 22 ++++-----
2 files changed, 29 insertions(+), 40 deletions(-)
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 92d0d2753e81..8c02bbc2684e 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -62,40 +62,34 @@
.endm
/*
- * void sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
- * int blocks)
+ * int sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
+ * int blocks)
*/
SYM_FUNC_START(sha1_ce_transform)
- frame_push 3
-
- mov x19, x0
- mov x20, x1
- mov x21, x2
-
/* load round constants */
-0: loadrc k0.4s, 0x5a827999, w6
+ loadrc k0.4s, 0x5a827999, w6
loadrc k1.4s, 0x6ed9eba1, w6
loadrc k2.4s, 0x8f1bbcdc, w6
loadrc k3.4s, 0xca62c1d6, w6
/* load state */
- ld1 {dgav.4s}, [x19]
- ldr dgb, [x19, #16]
+ ld1 {dgav.4s}, [x0]
+ ldr dgb, [x0, #16]
/* load sha1_ce_state::finalize */
ldr_l w4, sha1_ce_offsetof_finalize, x4
- ldr w4, [x19, x4]
+ ldr w4, [x0, x4]
/* load input */
-1: ld1 {v8.4s-v11.4s}, [x20], #64
- sub w21, w21, #1
+0: ld1 {v8.4s-v11.4s}, [x1], #64
+ sub w2, w2, #1
CPU_LE( rev32 v8.16b, v8.16b )
CPU_LE( rev32 v9.16b, v9.16b )
CPU_LE( rev32 v10.16b, v10.16b )
CPU_LE( rev32 v11.16b, v11.16b )
-2: add t0.4s, v8.4s, k0.4s
+1: add t0.4s, v8.4s, k0.4s
mov dg0v.16b, dgav.16b
add_update c, ev, k0, 8, 9, 10, 11, dgb
@@ -126,25 +120,18 @@ CPU_LE( rev32 v11.16b, v11.16b )
add dgbv.2s, dgbv.2s, dg1v.2s
add dgav.4s, dgav.4s, dg0v.4s
- cbz w21, 3f
-
- if_will_cond_yield_neon
- st1 {dgav.4s}, [x19]
- str dgb, [x19, #16]
- do_cond_yield_neon
+ cbz w2, 2f
+ cond_yield 3f, x5
b 0b
- endif_yield_neon
-
- b 1b
/*
* Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case.
*/
-3: cbz x4, 4f
+2: cbz x4, 3f
ldr_l w4, sha1_ce_offsetof_count, x4
- ldr x4, [x19, x4]
+ ldr x4, [x0, x4]
movi v9.2d, #0
mov x8, #0x80000000
movi v10.2d, #0
@@ -153,11 +140,11 @@ CPU_LE( rev32 v11.16b, v11.16b )
mov x4, #0
mov v11.d[0], xzr
mov v11.d[1], x7
- b 2b
+ b 1b
/* store new state */
-4: st1 {dgav.4s}, [x19]
- str dgb, [x19, #16]
- frame_pop
+3: st1 {dgav.4s}, [x0]
+ str dgb, [x0, #16]
+ mov w0, w2
ret
SYM_FUNC_END(sha1_ce_transform)
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
index c1362861765f..71fa4f1122d7 100644
--- a/arch/arm64/crypto/sha1-ce-glue.c
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -29,14 +29,22 @@ struct sha1_ce_state {
extern const u32 sha1_ce_offsetof_count;
extern const u32 sha1_ce_offsetof_finalize;
-asmlinkage void sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
- int blocks);
+asmlinkage int sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
+ int blocks);
static void __sha1_ce_transform(struct sha1_state *sst, u8 const *src,
int blocks)
{
- sha1_ce_transform(container_of(sst, struct sha1_ce_state, sst), src,
- blocks);
+ while (blocks) {
+ int rem;
+
+ kernel_neon_begin();
+ rem = sha1_ce_transform(container_of(sst, struct sha1_ce_state,
+ sst), src, blocks);
+ kernel_neon_end();
+ src += (blocks - rem) * SHA1_BLOCK_SIZE;
+ blocks = rem;
+ }
}
const u32 sha1_ce_offsetof_count = offsetof(struct sha1_ce_state, sst.count);
@@ -51,9 +59,7 @@ static int sha1_ce_update(struct shash_desc *desc, const u8 *data,
return crypto_sha1_update(desc, data, len);
sctx->finalize = 0;
- kernel_neon_begin();
sha1_base_do_update(desc, data, len, __sha1_ce_transform);
- kernel_neon_end();
return 0;
}
@@ -73,11 +79,9 @@ static int sha1_ce_finup(struct shash_desc *desc, const u8 *data,
*/
sctx->finalize = finalize;
- kernel_neon_begin();
sha1_base_do_update(desc, data, len, __sha1_ce_transform);
if (!finalize)
sha1_base_do_finalize(desc, __sha1_ce_transform);
- kernel_neon_end();
return sha1_base_finish(desc, out);
}
@@ -89,9 +93,7 @@ static int sha1_ce_final(struct shash_desc *desc, u8 *out)
return crypto_sha1_finup(desc, NULL, 0, out);
sctx->finalize = 0;
- kernel_neon_begin();
sha1_base_do_finalize(desc, __sha1_ce_transform);
- kernel_neon_end();
return sha1_base_finish(desc, out);
}
--
2.30.0
* [PATCH v2 3/9] crypto: arm64/sha2-ce - simplify NEON yield
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, return early in that case, and let the C code deal
with the consequences.
This reverts commit d82f37ab5e2426287013eba38b1212e8b71e5be3.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/sha2-ce-core.S | 38 +++++++-------------
arch/arm64/crypto/sha2-ce-glue.c | 22 ++++++------
2 files changed, 25 insertions(+), 35 deletions(-)
diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index 3f9d0f326987..6cdea7d56059 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -76,36 +76,30 @@
*/
.text
SYM_FUNC_START(sha2_ce_transform)
- frame_push 3
-
- mov x19, x0
- mov x20, x1
- mov x21, x2
-
/* load round constants */
-0: adr_l x8, .Lsha2_rcon
+ adr_l x8, .Lsha2_rcon
ld1 { v0.4s- v3.4s}, [x8], #64
ld1 { v4.4s- v7.4s}, [x8], #64
ld1 { v8.4s-v11.4s}, [x8], #64
ld1 {v12.4s-v15.4s}, [x8]
/* load state */
- ld1 {dgav.4s, dgbv.4s}, [x19]
+ ld1 {dgav.4s, dgbv.4s}, [x0]
/* load sha256_ce_state::finalize */
ldr_l w4, sha256_ce_offsetof_finalize, x4
- ldr w4, [x19, x4]
+ ldr w4, [x0, x4]
/* load input */
-1: ld1 {v16.4s-v19.4s}, [x20], #64
- sub w21, w21, #1
+0: ld1 {v16.4s-v19.4s}, [x1], #64
+ sub w2, w2, #1
CPU_LE( rev32 v16.16b, v16.16b )
CPU_LE( rev32 v17.16b, v17.16b )
CPU_LE( rev32 v18.16b, v18.16b )
CPU_LE( rev32 v19.16b, v19.16b )
-2: add t0.4s, v16.4s, v0.4s
+1: add t0.4s, v16.4s, v0.4s
mov dg0v.16b, dgav.16b
mov dg1v.16b, dgbv.16b
@@ -134,24 +128,18 @@ CPU_LE( rev32 v19.16b, v19.16b )
add dgbv.4s, dgbv.4s, dg1v.4s
/* handled all input blocks? */
- cbz w21, 3f
-
- if_will_cond_yield_neon
- st1 {dgav.4s, dgbv.4s}, [x19]
- do_cond_yield_neon
+ cbz w2, 2f
+ cond_yield 3f, x5
b 0b
- endif_yield_neon
-
- b 1b
/*
* Final block: add padding and total bit count.
* Skip if the input size was not a round multiple of the block size,
* the padding is handled by the C code in that case.
*/
-3: cbz x4, 4f
+2: cbz x4, 3f
ldr_l w4, sha256_ce_offsetof_count, x4
- ldr x4, [x19, x4]
+ ldr x4, [x0, x4]
movi v17.2d, #0
mov x8, #0x80000000
movi v18.2d, #0
@@ -160,10 +148,10 @@ CPU_LE( rev32 v19.16b, v19.16b )
mov x4, #0
mov v19.d[0], xzr
mov v19.d[1], x7
- b 2b
+ b 1b
/* store new state */
-4: st1 {dgav.4s, dgbv.4s}, [x19]
- frame_pop
+3: st1 {dgav.4s, dgbv.4s}, [x0]
+ mov w0, w2
ret
SYM_FUNC_END(sha2_ce_transform)
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
index ded3a6488f81..c57a6119fefc 100644
--- a/arch/arm64/crypto/sha2-ce-glue.c
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -30,14 +30,22 @@ struct sha256_ce_state {
extern const u32 sha256_ce_offsetof_count;
extern const u32 sha256_ce_offsetof_finalize;
-asmlinkage void sha2_ce_transform(struct sha256_ce_state *sst, u8 const *src,
- int blocks);
+asmlinkage int sha2_ce_transform(struct sha256_ce_state *sst, u8 const *src,
+ int blocks);
static void __sha2_ce_transform(struct sha256_state *sst, u8 const *src,
int blocks)
{
- sha2_ce_transform(container_of(sst, struct sha256_ce_state, sst), src,
- blocks);
+ while (blocks) {
+ int rem;
+
+ kernel_neon_begin();
+ rem = sha2_ce_transform(container_of(sst, struct sha256_ce_state,
+ sst), src, blocks);
+ kernel_neon_end();
+ src += (blocks - rem) * SHA256_BLOCK_SIZE;
+ blocks = rem;
+ }
}
const u32 sha256_ce_offsetof_count = offsetof(struct sha256_ce_state,
@@ -63,9 +71,7 @@ static int sha256_ce_update(struct shash_desc *desc, const u8 *data,
__sha256_block_data_order);
sctx->finalize = 0;
- kernel_neon_begin();
sha256_base_do_update(desc, data, len, __sha2_ce_transform);
- kernel_neon_end();
return 0;
}
@@ -90,11 +96,9 @@ static int sha256_ce_finup(struct shash_desc *desc, const u8 *data,
*/
sctx->finalize = finalize;
- kernel_neon_begin();
sha256_base_do_update(desc, data, len, __sha2_ce_transform);
if (!finalize)
sha256_base_do_finalize(desc, __sha2_ce_transform);
- kernel_neon_end();
return sha256_base_finish(desc, out);
}
@@ -108,9 +112,7 @@ static int sha256_ce_final(struct shash_desc *desc, u8 *out)
}
sctx->finalize = 0;
- kernel_neon_begin();
sha256_base_do_finalize(desc, __sha2_ce_transform);
- kernel_neon_end();
return sha256_base_finish(desc, out);
}
--
2.30.0
* [PATCH v2 4/9] crypto: arm64/sha3-ce - simplify NEON yield
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, return early in that case, and let the C code deal
with the consequences.
This reverts commit 7edc86cb1c18b4c274672232117586ea2bef1d9a.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/sha3-ce-core.S | 81 ++++++++------------
arch/arm64/crypto/sha3-ce-glue.c | 14 ++--
2 files changed, 39 insertions(+), 56 deletions(-)
diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S
index 1cfb768df350..6f5208414fe3 100644
--- a/arch/arm64/crypto/sha3-ce-core.S
+++ b/arch/arm64/crypto/sha3-ce-core.S
@@ -37,20 +37,13 @@
.endm
/*
- * sha3_ce_transform(u64 *st, const u8 *data, int blocks, int dg_size)
+ * int sha3_ce_transform(u64 *st, const u8 *data, int blocks, int dg_size)
*/
.text
SYM_FUNC_START(sha3_ce_transform)
- frame_push 4
-
- mov x19, x0
- mov x20, x1
- mov x21, x2
- mov x22, x3
-
-0: /* load state */
- add x8, x19, #32
- ld1 { v0.1d- v3.1d}, [x19]
+ /* load state */
+ add x8, x0, #32
+ ld1 { v0.1d- v3.1d}, [x0]
ld1 { v4.1d- v7.1d}, [x8], #32
ld1 { v8.1d-v11.1d}, [x8], #32
ld1 {v12.1d-v15.1d}, [x8], #32
@@ -58,13 +51,13 @@ SYM_FUNC_START(sha3_ce_transform)
ld1 {v20.1d-v23.1d}, [x8], #32
ld1 {v24.1d}, [x8]
-1: sub w21, w21, #1
+0: sub w2, w2, #1
mov w8, #24
adr_l x9, .Lsha3_rcon
/* load input */
- ld1 {v25.8b-v28.8b}, [x20], #32
- ld1 {v29.8b-v31.8b}, [x20], #24
+ ld1 {v25.8b-v28.8b}, [x1], #32
+ ld1 {v29.8b-v31.8b}, [x1], #24
eor v0.8b, v0.8b, v25.8b
eor v1.8b, v1.8b, v26.8b
eor v2.8b, v2.8b, v27.8b
@@ -73,10 +66,10 @@ SYM_FUNC_START(sha3_ce_transform)
eor v5.8b, v5.8b, v30.8b
eor v6.8b, v6.8b, v31.8b
- tbnz x22, #6, 3f // SHA3-512
+ tbnz x3, #6, 2f // SHA3-512
- ld1 {v25.8b-v28.8b}, [x20], #32
- ld1 {v29.8b-v30.8b}, [x20], #16
+ ld1 {v25.8b-v28.8b}, [x1], #32
+ ld1 {v29.8b-v30.8b}, [x1], #16
eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b
eor v9.8b, v9.8b, v27.8b
@@ -84,34 +77,34 @@ SYM_FUNC_START(sha3_ce_transform)
eor v11.8b, v11.8b, v29.8b
eor v12.8b, v12.8b, v30.8b
- tbnz x22, #4, 2f // SHA3-384 or SHA3-224
+ tbnz x3, #4, 1f // SHA3-384 or SHA3-224
// SHA3-256
- ld1 {v25.8b-v28.8b}, [x20], #32
+ ld1 {v25.8b-v28.8b}, [x1], #32
eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b
- b 4f
+ b 3f
-2: tbz x22, #2, 4f // bit 2 cleared? SHA-384
+1: tbz x3, #2, 3f // bit 2 cleared? SHA-384
// SHA3-224
- ld1 {v25.8b-v28.8b}, [x20], #32
- ld1 {v29.8b}, [x20], #8
+ ld1 {v25.8b-v28.8b}, [x1], #32
+ ld1 {v29.8b}, [x1], #8
eor v13.8b, v13.8b, v25.8b
eor v14.8b, v14.8b, v26.8b
eor v15.8b, v15.8b, v27.8b
eor v16.8b, v16.8b, v28.8b
eor v17.8b, v17.8b, v29.8b
- b 4f
+ b 3f
// SHA3-512
-3: ld1 {v25.8b-v26.8b}, [x20], #16
+2: ld1 {v25.8b-v26.8b}, [x1], #16
eor v7.8b, v7.8b, v25.8b
eor v8.8b, v8.8b, v26.8b
-4: sub w8, w8, #1
+3: sub w8, w8, #1
eor3 v29.16b, v4.16b, v9.16b, v14.16b
eor3 v26.16b, v1.16b, v6.16b, v11.16b
@@ -190,33 +183,19 @@ SYM_FUNC_START(sha3_ce_transform)
eor v0.16b, v0.16b, v31.16b
- cbnz w8, 4b
- cbz w21, 5f
-
- if_will_cond_yield_neon
- add x8, x19, #32
- st1 { v0.1d- v3.1d}, [x19]
- st1 { v4.1d- v7.1d}, [x8], #32
- st1 { v8.1d-v11.1d}, [x8], #32
- st1 {v12.1d-v15.1d}, [x8], #32
- st1 {v16.1d-v19.1d}, [x8], #32
- st1 {v20.1d-v23.1d}, [x8], #32
- st1 {v24.1d}, [x8]
- do_cond_yield_neon
- b 0b
- endif_yield_neon
-
- b 1b
+ cbnz w8, 3b
+ cond_yield 3f, x8
+ cbnz w2, 0b
/* save state */
-5: st1 { v0.1d- v3.1d}, [x19], #32
- st1 { v4.1d- v7.1d}, [x19], #32
- st1 { v8.1d-v11.1d}, [x19], #32
- st1 {v12.1d-v15.1d}, [x19], #32
- st1 {v16.1d-v19.1d}, [x19], #32
- st1 {v20.1d-v23.1d}, [x19], #32
- st1 {v24.1d}, [x19]
- frame_pop
+3: st1 { v0.1d- v3.1d}, [x0], #32
+ st1 { v4.1d- v7.1d}, [x0], #32
+ st1 { v8.1d-v11.1d}, [x0], #32
+ st1 {v12.1d-v15.1d}, [x0], #32
+ st1 {v16.1d-v19.1d}, [x0], #32
+ st1 {v20.1d-v23.1d}, [x0], #32
+ st1 {v24.1d}, [x0]
+ mov w0, w2
ret
SYM_FUNC_END(sha3_ce_transform)
diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c
index 7288d3046354..8c65cecf560a 100644
--- a/arch/arm64/crypto/sha3-ce-glue.c
+++ b/arch/arm64/crypto/sha3-ce-glue.c
@@ -28,8 +28,8 @@ MODULE_ALIAS_CRYPTO("sha3-256");
MODULE_ALIAS_CRYPTO("sha3-384");
MODULE_ALIAS_CRYPTO("sha3-512");
-asmlinkage void sha3_ce_transform(u64 *st, const u8 *data, int blocks,
- int md_len);
+asmlinkage int sha3_ce_transform(u64 *st, const u8 *data, int blocks,
+ int md_len);
static int sha3_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
@@ -59,11 +59,15 @@ static int sha3_update(struct shash_desc *desc, const u8 *data,
blocks = len / sctx->rsiz;
len %= sctx->rsiz;
- if (blocks) {
+ while (blocks) {
+ int rem;
+
kernel_neon_begin();
- sha3_ce_transform(sctx->st, data, blocks, digest_size);
+ rem = sha3_ce_transform(sctx->st, data, blocks,
+ digest_size);
kernel_neon_end();
- data += blocks * sctx->rsiz;
+ data += (blocks - rem) * sctx->rsiz;
+ blocks = rem;
}
}
--
2.30.0
* [PATCH v2 5/9] crypto: arm64/sha512-ce - simplify NEON yield
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
Instead of calling into kernel_neon_end() and kernel_neon_begin() (and
potentially into schedule()) from the assembler code when running in
task mode and a reschedule is pending, perform only the preempt count
check in assembler, return early in that case, and let the C code deal
with the consequences.
This reverts commit 6caf7adc5e458f77f550b6c6ca8effa152d61b4a.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/sha512-ce-core.S | 29 +++--------
arch/arm64/crypto/sha512-ce-glue.c | 53 ++++++++++----------
2 files changed, 34 insertions(+), 48 deletions(-)
diff --git a/arch/arm64/crypto/sha512-ce-core.S b/arch/arm64/crypto/sha512-ce-core.S
index cde606c0323e..d6e7f6c95fa6 100644
--- a/arch/arm64/crypto/sha512-ce-core.S
+++ b/arch/arm64/crypto/sha512-ce-core.S
@@ -107,23 +107,17 @@
*/
.text
SYM_FUNC_START(sha512_ce_transform)
- frame_push 3
-
- mov x19, x0
- mov x20, x1
- mov x21, x2
-
/* load state */
-0: ld1 {v8.2d-v11.2d}, [x19]
+ ld1 {v8.2d-v11.2d}, [x0]
/* load first 4 round constants */
adr_l x3, .Lsha512_rcon
ld1 {v20.2d-v23.2d}, [x3], #64
/* load input */
-1: ld1 {v12.2d-v15.2d}, [x20], #64
- ld1 {v16.2d-v19.2d}, [x20], #64
- sub w21, w21, #1
+0: ld1 {v12.2d-v15.2d}, [x1], #64
+ ld1 {v16.2d-v19.2d}, [x1], #64
+ sub w2, w2, #1
CPU_LE( rev64 v12.16b, v12.16b )
CPU_LE( rev64 v13.16b, v13.16b )
@@ -201,19 +195,12 @@ CPU_LE( rev64 v19.16b, v19.16b )
add v10.2d, v10.2d, v2.2d
add v11.2d, v11.2d, v3.2d
+ cond_yield 3f, x4
/* handled all input blocks? */
- cbz w21, 3f
-
- if_will_cond_yield_neon
- st1 {v8.2d-v11.2d}, [x19]
- do_cond_yield_neon
- b 0b
- endif_yield_neon
-
- b 1b
+ cbnz w2, 0b
/* store new state */
-3: st1 {v8.2d-v11.2d}, [x19]
- frame_pop
+3: st1 {v8.2d-v11.2d}, [x0]
+ mov w0, w2
ret
SYM_FUNC_END(sha512_ce_transform)
diff --git a/arch/arm64/crypto/sha512-ce-glue.c b/arch/arm64/crypto/sha512-ce-glue.c
index a6b1adf31c56..e62a094a9d52 100644
--- a/arch/arm64/crypto/sha512-ce-glue.c
+++ b/arch/arm64/crypto/sha512-ce-glue.c
@@ -26,11 +26,25 @@ MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("sha384");
MODULE_ALIAS_CRYPTO("sha512");
-asmlinkage void sha512_ce_transform(struct sha512_state *sst, u8 const *src,
- int blocks);
+asmlinkage int sha512_ce_transform(struct sha512_state *sst, u8 const *src,
+ int blocks);
asmlinkage void sha512_block_data_order(u64 *digest, u8 const *src, int blocks);
+static void __sha512_ce_transform(struct sha512_state *sst, u8 const *src,
+ int blocks)
+{
+ while (blocks) {
+ int rem;
+
+ kernel_neon_begin();
+ rem = sha512_ce_transform(sst, src, blocks);
+ kernel_neon_end();
+ src += (blocks - rem) * SHA512_BLOCK_SIZE;
+ blocks = rem;
+ }
+}
+
static void __sha512_block_data_order(struct sha512_state *sst, u8 const *src,
int blocks)
{
@@ -40,45 +54,30 @@ static void __sha512_block_data_order(struct sha512_state *sst, u8 const *src,
static int sha512_ce_update(struct shash_desc *desc, const u8 *data,
unsigned int len)
{
- if (!crypto_simd_usable())
- return sha512_base_do_update(desc, data, len,
- __sha512_block_data_order);
-
- kernel_neon_begin();
- sha512_base_do_update(desc, data, len, sha512_ce_transform);
- kernel_neon_end();
+ sha512_block_fn *fn = crypto_simd_usable() ? __sha512_ce_transform
+ : __sha512_block_data_order;
+ sha512_base_do_update(desc, data, len, fn);
return 0;
}
static int sha512_ce_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out)
{
- if (!crypto_simd_usable()) {
- if (len)
- sha512_base_do_update(desc, data, len,
- __sha512_block_data_order);
- sha512_base_do_finalize(desc, __sha512_block_data_order);
- return sha512_base_finish(desc, out);
- }
+ sha512_block_fn *fn = crypto_simd_usable() ? __sha512_ce_transform
+ : __sha512_block_data_order;
- kernel_neon_begin();
- sha512_base_do_update(desc, data, len, sha512_ce_transform);
- sha512_base_do_finalize(desc, sha512_ce_transform);
- kernel_neon_end();
+ sha512_base_do_update(desc, data, len, fn);
+ sha512_base_do_finalize(desc, fn);
return sha512_base_finish(desc, out);
}
static int sha512_ce_final(struct shash_desc *desc, u8 *out)
{
- if (!crypto_simd_usable()) {
- sha512_base_do_finalize(desc, __sha512_block_data_order);
- return sha512_base_finish(desc, out);
- }
+ sha512_block_fn *fn = crypto_simd_usable() ? __sha512_ce_transform
+ : __sha512_block_data_order;
- kernel_neon_begin();
- sha512_base_do_finalize(desc, sha512_ce_transform);
- kernel_neon_end();
+ sha512_base_do_finalize(desc, fn);
return sha512_base_finish(desc, out);
}
--
2.30.0
* [PATCH v2 6/9] crypto: arm64/aes-neonbs - remove NEON yield calls
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
There is no need for elaborate yield handling in the bit-sliced NEON
implementation of AES, given that skciphers are naturally bounded by the
size of the chunks returned by the skcipher_walk API. So remove the
yield calls from the asm code.
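For context, a rough sketch of why a typical skcipher walk already bounds
each kernel mode NEON section: every iteration of the walk hands back at most
a page worth of data. my_neon_encrypt() is a hypothetical asm routine, and
the sketch assumes the request size is a multiple of AES_BLOCK_SIZE:

  static int my_skcipher_encrypt(struct skcipher_request *req)
  {
          struct skcipher_walk walk;
          int err;

          err = skcipher_walk_virt(&walk, req, false);

          while (walk.nbytes) {
                  unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;

                  /* each NEON section covers at most one page of input */
                  kernel_neon_begin();
                  my_neon_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
                                  blocks);
                  kernel_neon_end();

                  err = skcipher_walk_done(&walk,
                                           walk.nbytes % AES_BLOCK_SIZE);
          }
          return err;
  }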
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/aes-neonbs-core.S | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S
index 63a52ad9a75c..a3405b8c344b 100644
--- a/arch/arm64/crypto/aes-neonbs-core.S
+++ b/arch/arm64/crypto/aes-neonbs-core.S
@@ -613,7 +613,6 @@ SYM_FUNC_END(aesbs_decrypt8)
st1 {\o7\().16b}, [x19], #16
cbz x23, 1f
- cond_yield_neon
b 99b
1: frame_pop
@@ -715,7 +714,6 @@ SYM_FUNC_START(aesbs_cbc_decrypt)
1: st1 {v24.16b}, [x24] // store IV
cbz x23, 2f
- cond_yield_neon
b 99b
2: frame_pop
@@ -801,7 +799,7 @@ SYM_FUNC_END(__xts_crypt8)
mov x23, x4
mov x24, x5
-0: movi v30.2s, #0x1
+ movi v30.2s, #0x1
movi v25.2s, #0x87
uzp1 v30.4s, v30.4s, v25.4s
ld1 {v25.16b}, [x24]
@@ -846,7 +844,6 @@ SYM_FUNC_END(__xts_crypt8)
cbz x23, 1f
st1 {v25.16b}, [x24]
- cond_yield_neon 0b
b 99b
1: st1 {v25.16b}, [x24]
@@ -889,7 +886,7 @@ SYM_FUNC_START(aesbs_ctr_encrypt)
cset x26, ne
add x23, x23, x26 // do one extra block if final
-98: ldp x7, x8, [x24]
+ ldp x7, x8, [x24]
ld1 {v0.16b}, [x24]
CPU_LE( rev x7, x7 )
CPU_LE( rev x8, x8 )
@@ -967,7 +964,6 @@ CPU_LE( rev x8, x8 )
st1 {v0.16b}, [x24]
cbz x23, .Lctr_done
- cond_yield_neon 98b
b 99b
.Lctr_done:
--
2.30.0
* [PATCH v2 7/9] crypto: arm64/aes-ce-mac - simplify NEON yield
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
As with the other drivers in this series, instead of calling into
kernel_neon_end() and kernel_neon_begin() from the assembler code when a
reschedule is pending, perform only the preempt count check in assembler,
return early in that case, and let the C glue retry the call with the
remaining blocks.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/aes-glue.c | 21 +++++---
arch/arm64/crypto/aes-modes.S | 52 +++++++-------------
2 files changed, 33 insertions(+), 40 deletions(-)
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index e7f116d833b9..17e735931a0c 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -105,9 +105,9 @@ asmlinkage void aes_essiv_cbc_decrypt(u8 out[], u8 const in[], u32 const rk1[],
int rounds, int blocks, u8 iv[],
u32 const rk2[]);
-asmlinkage void aes_mac_update(u8 const in[], u32 const rk[], int rounds,
- int blocks, u8 dg[], int enc_before,
- int enc_after);
+asmlinkage int aes_mac_update(u8 const in[], u32 const rk[], int rounds,
+ int blocks, u8 dg[], int enc_before,
+ int enc_after);
struct crypto_aes_xts_ctx {
struct crypto_aes_ctx key1;
@@ -856,10 +856,17 @@ static void mac_do_update(struct crypto_aes_ctx *ctx, u8 const in[], int blocks,
int rounds = 6 + ctx->key_length / 4;
if (crypto_simd_usable()) {
- kernel_neon_begin();
- aes_mac_update(in, ctx->key_enc, rounds, blocks, dg, enc_before,
- enc_after);
- kernel_neon_end();
+ int rem;
+
+ do {
+ kernel_neon_begin();
+ rem = aes_mac_update(in, ctx->key_enc, rounds, blocks,
+ dg, enc_before, enc_after);
+ kernel_neon_end();
+ in += (blocks - rem) * AES_BLOCK_SIZE;
+ blocks = rem;
+ enc_before = 0;
+ } while (blocks);
} else {
if (enc_before)
aes_encrypt(ctx, dg, dg);
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 3d1f97799899..bbdb54702aa7 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -678,61 +678,47 @@ AES_FUNC_END(aes_xts_decrypt)
* int blocks, u8 dg[], int enc_before, int enc_after)
*/
AES_FUNC_START(aes_mac_update)
- frame_push 6
-
- mov x19, x0
- mov x20, x1
- mov x21, x2
- mov x22, x3
- mov x23, x4
- mov x24, x6
-
- ld1 {v0.16b}, [x23] /* get dg */
+ ld1 {v0.16b}, [x4] /* get dg */
enc_prepare w2, x1, x7
cbz w5, .Lmacloop4x
encrypt_block v0, w2, x1, x7, w8
.Lmacloop4x:
- subs w22, w22, #4
+ subs w3, w3, #4
bmi .Lmac1x
- ld1 {v1.16b-v4.16b}, [x19], #64 /* get next pt block */
+ ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */
eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */
- encrypt_block v0, w21, x20, x7, w8
+ encrypt_block v0, w2, x1, x7, w8
eor v0.16b, v0.16b, v2.16b
- encrypt_block v0, w21, x20, x7, w8
+ encrypt_block v0, w2, x1, x7, w8
eor v0.16b, v0.16b, v3.16b
- encrypt_block v0, w21, x20, x7, w8
+ encrypt_block v0, w2, x1, x7, w8
eor v0.16b, v0.16b, v4.16b
- cmp w22, wzr
- csinv x5, x24, xzr, eq
+ cmp w3, wzr
+ csinv x5, x6, xzr, eq
cbz w5, .Lmacout
- encrypt_block v0, w21, x20, x7, w8
- st1 {v0.16b}, [x23] /* return dg */
- cond_yield_neon .Lmacrestart
+ encrypt_block v0, w2, x1, x7, w8
+ st1 {v0.16b}, [x4] /* return dg */
+ cond_yield .Lmacout, x7
b .Lmacloop4x
.Lmac1x:
- add w22, w22, #4
+ add w3, w3, #4
.Lmacloop:
- cbz w22, .Lmacout
- ld1 {v1.16b}, [x19], #16 /* get next pt block */
+ cbz w3, .Lmacout
+ ld1 {v1.16b}, [x0], #16 /* get next pt block */
eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */
- subs w22, w22, #1
- csinv x5, x24, xzr, eq
+ subs w3, w3, #1
+ csinv x5, x6, xzr, eq
cbz w5, .Lmacout
.Lmacenc:
- encrypt_block v0, w21, x20, x7, w8
+ encrypt_block v0, w2, x1, x7, w8
b .Lmacloop
.Lmacout:
- st1 {v0.16b}, [x23] /* return dg */
- frame_pop
+ st1 {v0.16b}, [x4] /* return dg */
+ mov w0, w3
ret
-
-.Lmacrestart:
- ld1 {v0.16b}, [x23] /* get dg */
- enc_prepare w21, x20, x0
- b .Lmacloop4x
AES_FUNC_END(aes_mac_update)
--
2.30.0
* [PATCH v2 8/9] crypto: arm64/crc-t10dif - move NEON yield to C code
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
Instead of yielding from the bowels of the asm routine if a reschedule
is needed, divide up the input into 4 KB chunks in the C glue. This
simplifies the code substantially, and avoids scheduling out the task
with the asm routine on the call stack, which is undesirable from a
CFI/instrumentation point of view.
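Worth noting about the hunks below: the remaining length is compared against
SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE rather than plain SZ_4K, so the final
call into the asm routine is never handed fewer than
CRC_T10DIF_PMULL_CHUNK_SIZE bytes (the asm assumes len >= 16). A standalone
sketch of that chunk-size choice, using a hypothetical helper name:

  /* Hypothetical helper illustrating the chunk size picked by the glue. */
  static unsigned int next_chunk(unsigned int length)
  {
          /*
           * Cap each NEON section at 4 KB, but only when doing so leaves at
           * least CRC_T10DIF_PMULL_CHUNK_SIZE bytes for the final call, since
           * the asm routine requires a minimum input length.
           */
          if (length > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE)
                  return SZ_4K;
          return length;
  }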
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/crypto/crct10dif-ce-core.S | 43 +++++---------------
arch/arm64/crypto/crct10dif-ce-glue.c | 30 +++++++++++---
2 files changed, 35 insertions(+), 38 deletions(-)
diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S
index 111d9c9abddd..dce6dcebfca1 100644
--- a/arch/arm64/crypto/crct10dif-ce-core.S
+++ b/arch/arm64/crypto/crct10dif-ce-core.S
@@ -68,10 +68,10 @@
.text
.arch armv8-a+crypto
- init_crc .req w19
- buf .req x20
- len .req x21
- fold_consts_ptr .req x22
+ init_crc .req w0
+ buf .req x1
+ len .req x2
+ fold_consts_ptr .req x3
fold_consts .req v10
@@ -257,12 +257,6 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
.endm
.macro crc_t10dif_pmull, p
- frame_push 4, 128
-
- mov init_crc, w0
- mov buf, x1
- mov len, x2
-
__pmull_init_\p
// For sizes less than 256 bytes, we can't fold 128 bytes at a time.
@@ -317,26 +311,7 @@ CPU_LE( ext v7.16b, v7.16b, v7.16b, #8 )
fold_32_bytes \p, v6, v7
subs len, len, #128
- b.lt .Lfold_128_bytes_loop_done_\@
-
- if_will_cond_yield_neon
- stp q0, q1, [sp, #.Lframe_local_offset]
- stp q2, q3, [sp, #.Lframe_local_offset + 32]
- stp q4, q5, [sp, #.Lframe_local_offset + 64]
- stp q6, q7, [sp, #.Lframe_local_offset + 96]
- do_cond_yield_neon
- ldp q0, q1, [sp, #.Lframe_local_offset]
- ldp q2, q3, [sp, #.Lframe_local_offset + 32]
- ldp q4, q5, [sp, #.Lframe_local_offset + 64]
- ldp q6, q7, [sp, #.Lframe_local_offset + 96]
- ld1 {fold_consts.2d}, [fold_consts_ptr]
- __pmull_init_\p
- __pmull_pre_\p fold_consts
- endif_yield_neon
-
- b .Lfold_128_bytes_loop_\@
-
-.Lfold_128_bytes_loop_done_\@:
+ b.ge .Lfold_128_bytes_loop_\@
// Now fold the 112 bytes in v0-v6 into the 16 bytes in v7.
@@ -453,7 +428,9 @@ CPU_LE( ext v0.16b, v0.16b, v0.16b, #8 )
// Final CRC value (x^16 * M(x)) mod G(x) is in low 16 bits of v0.
umov w0, v0.h[0]
- frame_pop
+ .ifc \p, p8
+ ldp x29, x30, [sp], #16
+ .endif
ret
.Lless_than_256_bytes_\@:
@@ -489,7 +466,9 @@ CPU_LE( ext v7.16b, v7.16b, v7.16b, #8 )
// Assumes len >= 16.
//
SYM_FUNC_START(crc_t10dif_pmull_p8)
- crc_t10dif_pmull p8
+ stp x29, x30, [sp, #-16]!
+ mov x29, sp
+ crc_t10dif_pmull p8
SYM_FUNC_END(crc_t10dif_pmull_p8)
.align 5
diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c b/arch/arm64/crypto/crct10dif-ce-glue.c
index ccc3f6067742..09eb1456aed4 100644
--- a/arch/arm64/crypto/crct10dif-ce-glue.c
+++ b/arch/arm64/crypto/crct10dif-ce-glue.c
@@ -37,9 +37,18 @@ static int crct10dif_update_pmull_p8(struct shash_desc *desc, const u8 *data,
u16 *crc = shash_desc_ctx(desc);
if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) {
- kernel_neon_begin();
- *crc = crc_t10dif_pmull_p8(*crc, data, length);
- kernel_neon_end();
+ do {
+ unsigned int chunk = length;
+
+ if (chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE)
+ chunk = SZ_4K;
+
+ kernel_neon_begin();
+ *crc = crc_t10dif_pmull_p8(*crc, data, chunk);
+ kernel_neon_end();
+ data += chunk;
+ length -= chunk;
+ } while (length);
} else {
*crc = crc_t10dif_generic(*crc, data, length);
}
@@ -53,9 +62,18 @@ static int crct10dif_update_pmull_p64(struct shash_desc *desc, const u8 *data,
u16 *crc = shash_desc_ctx(desc);
if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) {
- kernel_neon_begin();
- *crc = crc_t10dif_pmull_p64(*crc, data, length);
- kernel_neon_end();
+ do {
+ unsigned int chunk = length;
+
+ if (chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE)
+ chunk = SZ_4K;
+
+ kernel_neon_begin();
+ *crc = crc_t10dif_pmull_p64(*crc, data, chunk);
+ kernel_neon_end();
+ data += chunk;
+ length -= chunk;
+ } while (length);
} else {
*crc = crc_t10dif_generic(*crc, data, length);
}
--
2.30.0
* [PATCH v2 9/9] arm64: assembler: remove conditional NEON yield macros
From: Ard Biesheuvel @ 2021-02-03 11:36 UTC (permalink / raw)
To: linux-crypto
Cc: linux-arm-kernel, will, mark.rutland, catalin.marinas, herbert,
Ard Biesheuvel, Dave Martin, Eric Biggers
The users of the conditional NEON yield macros have all been switched to
the simplified cond_yield macro, so the NEON-specific ones can be
removed.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/arm64/include/asm/assembler.h | 70 --------------------
1 file changed, 70 deletions(-)
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 27b1ea721c2d..0bb5829ff06f 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -675,76 +675,6 @@ USER(\label, ic ivau, \tmp2) // invalidate I line PoU
.endif
.endm
-/*
- * Check whether to yield to another runnable task from kernel mode NEON code
- * (which runs with preemption disabled).
- *
- * if_will_cond_yield_neon
- * // pre-yield patchup code
- * do_cond_yield_neon
- * // post-yield patchup code
- * endif_yield_neon <label>
- *
- * where <label> is optional, and marks the point where execution will resume
- * after a yield has been performed. If omitted, execution resumes right after
- * the endif_yield_neon invocation. Note that the entire sequence, including
- * the provided patchup code, will be omitted from the image if
- * CONFIG_PREEMPTION is not defined.
- *
- * As a convenience, in the case where no patchup code is required, the above
- * sequence may be abbreviated to
- *
- * cond_yield_neon <label>
- *
- * Note that the patchup code does not support assembler directives that change
- * the output section, any use of such directives is undefined.
- *
- * The yield itself consists of the following:
- * - Check whether the preempt count is exactly 1 and a reschedule is also
- * needed. If so, calling of preempt_enable() in kernel_neon_end() will
- * trigger a reschedule. If it is not the case, yielding is pointless.
- * - Disable and re-enable kernel mode NEON, and branch to the yield fixup
- * code.
- *
- * This macro sequence may clobber all CPU state that is not guaranteed by the
- * AAPCS to be preserved across an ordinary function call.
- */
-
- .macro cond_yield_neon, lbl
- if_will_cond_yield_neon
- do_cond_yield_neon
- endif_yield_neon \lbl
- .endm
-
- .macro if_will_cond_yield_neon
-#ifdef CONFIG_PREEMPTION
- get_current_task x0
- ldr x0, [x0, #TSK_TI_PREEMPT]
- sub x0, x0, #PREEMPT_DISABLE_OFFSET
- cbz x0, .Lyield_\@
- /* fall through to endif_yield_neon */
- .subsection 1
-.Lyield_\@ :
-#else
- .section ".discard.cond_yield_neon", "ax"
-#endif
- .endm
-
- .macro do_cond_yield_neon
- bl kernel_neon_end
- bl kernel_neon_begin
- .endm
-
- .macro endif_yield_neon, lbl
- .ifnb \lbl
- b \lbl
- .else
- b .Lyield_out_\@
- .endif
- .previous
-.Lyield_out_\@ :
- .endm
-
/*
* Check whether preempt-disabled code should yield as soon as it
* is able. This is the case if re-enabling preemption a single
--
2.30.0
* (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Will Deacon @ 2021-02-03 21:31 UTC (permalink / raw)
To: Ard Biesheuvel, linux-crypto
Cc: catalin.marinas, kernel-team, Will Deacon, mark.rutland,
Dave Martin, Eric Biggers, herbert, linux-arm-kernel
On Wed, 3 Feb 2021 12:36:17 +0100, Ard Biesheuvel wrote:
> Given how kernel mode NEON code disables preemption (to ensure that the
> FP/SIMD register state is protected without having to context switch it),
> we need to take care not to let those algorithms operate on unbounded
> input data, or we may end up with excessive scheduling blackouts on
> CONFIG_PREEMPT kernels.
>
> This is currently handled by the cond_yield_neon macros, which check the
> preempt count and the TIF_NEED_RESCHED flag from assembler code, and call
> into kernel_neon_end()+kernel_neon_begin(), triggering a reschedule.
> This works as expected, but is a bit messy, given how much of the state
> preserve/restore code in the algorithm needs to be duplicated, as well as
> causing the need to manage the stack frame explicitly. All of this is better
> handled by the compiler, especially now that we have enabled features such
> as the shadow call stack and BTI, and are working to improve call stack
> validation.
>
> [...]
Applied first patch only to arm64 (for-next/crypto), thanks!
[1/9] arm64: assembler: add cond_yield macro
https://git.kernel.org/arm64/c/d13c613f136c
This is the only patch on the branch, so I'm happy for it to be pulled
into the crypto tree too if it enables some of the other patches to land
in 5.12.
Cheers,
--
Will
https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev
* Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Herbert Xu @ 2021-02-04 2:44 UTC (permalink / raw)
To: Will Deacon
Cc: Ard Biesheuvel, linux-crypto, catalin.marinas, kernel-team,
mark.rutland, Dave Martin, Eric Biggers, linux-arm-kernel
On Wed, Feb 03, 2021 at 09:31:31PM +0000, Will Deacon wrote:
>
> Applied first patch only to arm64 (for-next/crypto), thanks!
>
> [1/9] arm64: assembler: add cond_yield macro
> https://git.kernel.org/arm64/c/d13c613f136c
>
> This is the only patch on the branch, so I'm happy for it to be pulled
> into the crypto tree too if it enables some of the other patches to land
> in 5.12.
Hi Will:
I think it might be easier if you take the lot.
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Ard Biesheuvel @ 2021-02-04 8:29 UTC (permalink / raw)
To: Herbert Xu
Cc: Will Deacon, Linux Crypto Mailing List, Catalin Marinas,
Android Kernel Team, Mark Rutland, Dave Martin, Eric Biggers,
Linux ARM
On Thu, 4 Feb 2021 at 03:44, Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Wed, Feb 03, 2021 at 09:31:31PM +0000, Will Deacon wrote:
> >
> > Applied first patch only to arm64 (for-next/crypto), thanks!
> >
> > [1/9] arm64: assembler: add cond_yield macro
> > https://git.kernel.org/arm64/c/d13c613f136c
> >
> > This is the only patch on the branch, so I'm happy for it to be pulled
> > into the crypto tree too if it enables some of the other patches to land
> > in 5.12.
>
> Hi Will:
>
> I think it might be easier if you take the lot.
>
> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
>
Half of the patches in this series conflict with
0df07d8117c3 crypto: arm64/sha - add missing module aliases
in the cryptodev tree, so that won't work.
* Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Herbert Xu @ 2021-02-04 11:10 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Will Deacon, Linux Crypto Mailing List, Catalin Marinas,
Android Kernel Team, Mark Rutland, Dave Martin, Eric Biggers,
Linux ARM
On Thu, Feb 04, 2021 at 09:29:16AM +0100, Ard Biesheuvel wrote:
>
> Half of the patches in this series conflict with
>
> 0df07d8117c3 crypto: arm64/sha - add missing module aliases
>
> in the cryptodev tree, so that won't work.
Fair enough. I'll take the patches.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Will Deacon @ 2021-02-04 13:03 UTC (permalink / raw)
To: Herbert Xu
Cc: Ard Biesheuvel, Linux Crypto Mailing List, Catalin Marinas,
Android Kernel Team, Mark Rutland, Dave Martin, Eric Biggers,
Linux ARM
On Thu, Feb 04, 2021 at 10:10:26PM +1100, Herbert Xu wrote:
> On Thu, Feb 04, 2021 at 09:29:16AM +0100, Ard Biesheuvel wrote:
> >
> > Half of the patches in this series conflict with
> >
> > 0df07d8117c3 crypto: arm64/sha - add missing module aliases
> >
> > in the cryptodev tree, so that won't work.
>
> Fair enough. I'll take the patches.
Cheers, Herbert. Please just leave the final patch ("arm64: assembler:
remove conditional NEON yield macro"), as we can come back to that for
5.13.
Will
* Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Herbert Xu @ 2021-02-04 19:45 UTC (permalink / raw)
To: Will Deacon
Cc: Ard Biesheuvel, Linux Crypto Mailing List, Catalin Marinas,
Android Kernel Team, Mark Rutland, Dave Martin, Eric Biggers,
Linux ARM
On Thu, Feb 04, 2021 at 01:03:11PM +0000, Will Deacon wrote:
>
> Cheers, Herbert. Please just leave the final patch ("arm64: assembler:
> remove conditional NEON yield macro"), as we can come back to that for
> 5.13.
Sure I'll leave out the last patch.
Thanks,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
* Re: (subset) Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Will Deacon @ 2021-02-04 10:33 UTC (permalink / raw)
To: Ard Biesheuvel, linux-crypto
Cc: catalin.marinas, kernel-team, mark.rutland, Dave Martin,
Eric Biggers, herbert, linux-arm-kernel
On Wed, Feb 03, 2021 at 09:31:31PM +0000, Will Deacon wrote:
> On Wed, 3 Feb 2021 12:36:17 +0100, Ard Biesheuvel wrote:
> > Given how kernel mode NEON code disables preemption (to ensure that the
> > FP/SIMD register state is protected without having to context switch it),
> > we need to take care not to let those algorithms operate on unbounded
> > input data, or we may end up with excessive scheduling blackouts on
> > CONFIG_PREEMPT kernels.
> >
> > This is currently handled by the cond_yield_neon macros, which check the
> > preempt count and the TIF_NEED_RESCHED flag from assembler code, and call
> > into kernel_neon_end()+kernel_neon_begin(), triggering a reschedule.
> > This works as expected, but is a bit messy, given how much of the state
> > preserve/restore code in the algorithm needs to be duplicated, as well as
> > causing the need to manage the stack frame explicitly. All of this is better
> > handled by the compiler, especially now that we have enabled features such
> > as the shadow call stack and BTI, and are working to improve call stack
> > validation.
> >
> > [...]
>
> Applied first patch only to arm64 (for-next/crypto), thanks!
Oops, looks like I typo'd the external branch (for-next/crypo). No offense
intended! I'll rename it now; SHAs will stay the same.
Will
* Re: [PATCH v2 0/9] arm64: rework NEON yielding to avoid scheduling from asm code
From: Herbert Xu @ 2021-02-10 7:23 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-crypto, linux-arm-kernel, will, mark.rutland,
catalin.marinas, Dave Martin, Eric Biggers
On Wed, Feb 03, 2021 at 12:36:17PM +0100, Ard Biesheuvel wrote:
> Given how kernel mode NEON code disables preemption (to ensure that the
> FP/SIMD register state is protected without having to context switch it),
> we need to take care not to let those algorithms operate on unbounded
> input data, or we may end up with excessive scheduling blackouts on
> CONFIG_PREEMPT kernels.
>
> This is currently handled by the cond_yield_neon macros, which check the
> preempt count and the TIF_NEED_RESCHED flag from assembler code, and call
> into kernel_neon_end()+kernel_neon_begin(), triggering a reschedule.
> This works as expected, but is a bit messy, given how much of the state
> preserve/restore code in the algorithm needs to be duplicated, as well as
> causing the need to manage the stack frame explicitly. All of this is better
> handled by the compiler, especially now that we have enabled features such
> as the shadow call stack and BTI, and are working to improve call stack
> validation.
>
> In some cases, yielding is not necessary at all: algorithms that implement
> skciphers and use the skcipher walk API will be invoked at page granularity,
> which is granular enough for our purpose.
>
> In other cases, it is better to simply return early from the assembler
> routine if a reschedule is pending, and let the C code handle with this, by
> retrying the call until it completes. This removes any voluntary schedule()
> calls from the call stack, making the code much easier to reason about in
> the context of stack validation, rcu_tasks synchronization, etc.
>
> Practical note: assuming there are no objections to these changes, it may
> be the most convenient to take patch #1 into the arm64 tree for v5.12,
> and postpone the rest for merging via the crypto tree. (Note that this
> series was created against the cryptodev tree, and so the arm64 maintainers
> are also welcome to take the whole set if it applies cleanly to the arm64
> tree)
>
> Will: if you stick #1 on a separate branch, please base it on v5.11-rc1
>
> Changes since v1:
> - use sub+cbz instead of cmp+b.eq to avoid clobbering the flags in cond_yield
> (patch #1)
>
> Cc: Dave Martin <dave.martin@arm.com>
> Cc: Eric Biggers <ebiggers@google.com>
>
> Ard Biesheuvel (9):
> arm64: assembler: add cond_yield macro
> crypto: arm64/sha1-ce - simplify NEON yield
> crypto: arm64/sha2-ce - simplify NEON yield
> crypto: arm64/sha3-ce - simplify NEON yield
> crypto: arm64/sha512-ce - simplify NEON yield
> crypto: arm64/aes-neonbs - remove NEON yield calls
> crypto: arm64/aes-ce-mac - simplify NEON yield
> crypto: arm64/crc-t10dif - move NEON yield to C code
> arm64: assembler: remove conditional NEON yield macros
>
> arch/arm64/crypto/aes-glue.c | 21 +++--
> arch/arm64/crypto/aes-modes.S | 52 +++++--------
> arch/arm64/crypto/aes-neonbs-core.S | 8 +-
> arch/arm64/crypto/crct10dif-ce-core.S | 43 +++--------
> arch/arm64/crypto/crct10dif-ce-glue.c | 30 ++++++--
> arch/arm64/crypto/sha1-ce-core.S | 47 ++++--------
> arch/arm64/crypto/sha1-ce-glue.c | 22 +++---
> arch/arm64/crypto/sha2-ce-core.S | 38 ++++-----
> arch/arm64/crypto/sha2-ce-glue.c | 22 +++---
> arch/arm64/crypto/sha3-ce-core.S | 81 ++++++++------------
> arch/arm64/crypto/sha3-ce-glue.c | 14 ++--
> arch/arm64/crypto/sha512-ce-core.S | 29 ++-----
> arch/arm64/crypto/sha512-ce-glue.c | 53 +++++++------
> arch/arm64/include/asm/assembler.h | 78 +++----------------
> 14 files changed, 209 insertions(+), 329 deletions(-)
Patches 2-8 applied. Thanks.
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt