From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vakul Garg
Subject: RE: [PATCH v5 00/23] crypto: arm64 - play nice with CONFIG_PREEMPT
Date: Sun, 11 Mar 2018 05:16:59 +0000
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
To: Ard Biesheuvel, linux-crypto@vger.kernel.org
Cc: Mark Rutland, herbert@gondor.apana.org.au, Peter Zijlstra,
 Catalin Marinas, Sebastian Andrzej Siewior, Will Deacon,
 Russell King - ARM Linux, Steven Rostedt, Thomas Gleixner, Dave Martin,
 linux-arm-kernel@lists.infradead.org, linux-rt-users@vger.kernel.org
List-Id: linux-crypto.vger.kernel.org

Hi,

How does this patchset affect crypto throughput? Is it expected to
increase?

Regards,
Vakul

> -----Original Message-----
> From: linux-crypto-owner@vger.kernel.org
> [mailto:linux-crypto-owner@vger.kernel.org] On Behalf Of Ard Biesheuvel
> Sent: Saturday, March 10, 2018 8:52 PM
> To: linux-crypto@vger.kernel.org
> Cc: herbert@gondor.apana.org.au; linux-arm-kernel@lists.infradead.org;
> Ard Biesheuvel; Dave Martin; Russell King - ARM Linux; Sebastian
> Andrzej Siewior; Mark Rutland; linux-rt-users@vger.kernel.org; Peter
> Zijlstra; Catalin Marinas; Will Deacon; Steven Rostedt; Thomas Gleixner
> Subject: [PATCH v5 00/23] crypto: arm64 - play nice with CONFIG_PREEMPT
>
> As reported by Sebastian, the way the arm64 NEON crypto code currently
> keeps kernel mode NEON enabled across calls into skcipher_walk_xxx() is
> causing problems with RT builds: the skcipher walk API may allocate and
> free the temporary buffers it uses to present the input and output
> arrays to the crypto algorithm in blocksize-sized chunks (where
> blocksize is the natural block size of the crypto algorithm), and doing
> so with NEON enabled means we are allocating and freeing memory with
> preemption disabled.
>
> This was deliberate: when this code was introduced, each
> kernel_neon_begin() and kernel_neon_end() call incurred a fixed penalty
> of storing, respectively loading, the contents of all NEON registers
> to/from memory, and so doing it less often had an obvious performance
> benefit. However, in the meantime, we have refactored the core kernel
> mode NEON code, and now kernel_neon_begin() only incurs this penalty
> the first time it is called after entering the kernel, and the NEON
> register restore is deferred until returning to userland. This means
> pulling those calls into the loops that iterate over the input/output
> of the crypto algorithm is not a big deal anymore (although there are
> some places in the code where we relied on the NEON registers retaining
> their values between calls).
>
> So let's clean this up for arm64: update the NEON based skcipher
> drivers to no longer keep the NEON enabled when calling into the
> skcipher walk API. A sketch of this restructuring follows below.
>
> As pointed out by Peter, this only solves part of the problem. So let's
> tackle it more thoroughly, and update the algorithms to test the
> NEED_RESCHED flag each time after processing a fixed chunk of input.
>
> Given that this issue was flagged by the RT people, I would appreciate
> it if they could confirm whether they are happy with this approach.
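>
> To make that concrete, here is a minimal sketch of the before/after
> shape of a typical glue routine (not lifted from any of the drivers;
> my_neon_cipher() is a hypothetical NEON transform and error handling
> is elided):
>
>     /* Before: NEON held across the entire walk, so the buffer
>      * alloc/free inside skcipher_walk_done() runs with preemption
>      * disabled.
>      */
>     kernel_neon_begin();
>     while (walk.nbytes > 0) {
>             my_neon_cipher(walk.dst.virt.addr, walk.src.virt.addr,
>                            walk.nbytes);
>             err = skcipher_walk_done(&walk, 0); /* 0: all consumed */
>     }
>     kernel_neon_end();
>
>     /* After: enable/disable per chunk. skcipher_walk_done() now runs
>      * with preemption enabled, and the repeated kernel_neon_begin()
>      * calls are cheap, since only the first one after kernel entry
>      * saves the NEON register file.
>      */
>     while (walk.nbytes > 0) {
>             kernel_neon_begin();
>             my_neon_cipher(walk.dst.virt.addr, walk.src.virt.addr,
>                            walk.nbytes);
>             kernel_neon_end();
>             err = skcipher_walk_done(&walk, 0);
>     }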
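>
> The per-block yield itself lives in the assembly cores, but the idea
> can be expressed as a C sketch (CHUNK_SIZE and my_neon_process() are
> hypothetical stand-ins, not the actual macros):
>
>     while (nbytes > 0) {
>             unsigned int chunk = nbytes > CHUNK_SIZE ? CHUNK_SIZE
>                                                      : nbytes;
>
>             my_neon_process(dst, src, chunk);
>             src += chunk;
>             dst += chunk;
>             nbytes -= chunk;
>
>             /* kernel_neon_end() re-enables preemption, so briefly
>              * dropping the NEON creates a reschedule point whenever
>              * one is pending.
>              */
>             if (need_resched()) {
>                     kernel_neon_end();
>                     kernel_neon_begin();
>             }
>     }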
>
> Changes since v4:
> - rebase onto v4.16-rc3
> - apply the same treatment to the new SHA512, SHA-3 and SM3 code that
>   landed in v4.16-rc1
>
> Changes since v3:
> - incorporate Dave's feedback on the asm macros to push/pop frames and
>   to yield the NEON conditionally
> - make frame_push/pop easier to use, by recording the arguments to
>   frame_push, removing the need to specify them again when calling
>   frame_pop
> - emit local symbol .Lframe_local_offset to allow code using the frame
>   push/pop macros to index the stack more easily
> - use the magic \@ macro invocation counter provided by GAS to generate
>   unique labels in the NEON yield macros, rather than relying on chance
>
> Changes since v2:
> - Drop the logic to yield only after so many blocks - as it turns out,
>   the throughput of the algorithms that are most likely to be affected
>   by the overhead (GHASH and AES-CE) only drops by ~1% (on Cortex-A57),
>   and if that is unacceptable, you are probably not using
>   CONFIG_PREEMPT in the first place.
> - Add yield support to the AES-CCM driver
> - Clean up macros based on feedback from Dave
> - Given that I had to add stack frame logic to many of these functions,
>   factor it out and wrap it in a couple of macros
> - Merge the changes to the core asm driver and glue code of the
>   GHASH/GCM driver. The latter was not correct without the former.
>
> Changes since v1:
> - add CRC-T10DIF test vector (#1)
> - stop using GFP_ATOMIC in scatterwalk API calls, now that they are
>   executed with preemption enabled (#2 - #6)
> - do some preparatory refactoring on the AES block mode code (#7 - #9)
> - add yield patches (#10 - #18)
> - add test patch (#19) - DO NOT MERGE
>
> Cc: Dave Martin
> Cc: Russell King - ARM Linux
> Cc: Sebastian Andrzej Siewior
> Cc: Mark Rutland
> Cc: linux-rt-users@vger.kernel.org
> Cc: Peter Zijlstra
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Steven Rostedt
> Cc: Thomas Gleixner
>
> Ard Biesheuvel (23):
>   crypto: testmgr - add a new test case for CRC-T10DIF
>   crypto: arm64/aes-ce-ccm - move kernel mode neon en/disable into loop
>   crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
>   crypto: arm64/aes-bs - move kernel mode neon en/disable into loop
>   crypto: arm64/chacha20 - move kernel mode neon en/disable into loop
>   crypto: arm64/aes-blk - remove configurable interleave
>   crypto: arm64/aes-blk - add 4 way interleave to CBC encrypt path
>   crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path
>   crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
>   arm64: assembler: add utility macros to push/pop stack frames
>   arm64: assembler: add macros to conditionally yield the NEON under
>     PREEMPT
>   crypto: arm64/sha1-ce - yield NEON after every block of input
>   crypto: arm64/sha2-ce - yield NEON after every block of input
>   crypto: arm64/aes-ccm - yield NEON after every block of input
>   crypto: arm64/aes-blk - yield NEON after every block of input
>   crypto: arm64/aes-bs - yield NEON after every block of input
>   crypto: arm64/aes-ghash - yield NEON after every block of input
>   crypto: arm64/crc32-ce - yield NEON after every block of input
>   crypto: arm64/crct10dif-ce - yield NEON after every block of input
>   crypto: arm64/sha3-ce - yield NEON after every block of input
>   crypto: arm64/sha512-ce - yield NEON after every block of input
>   crypto: arm64/sm3-ce - yield NEON after every block of input
>   DO NOT MERGE
>
>  arch/arm64/crypto/Makefile             |   3 -
>  arch/arm64/crypto/aes-ce-ccm-core.S    | 150 ++++--
>  arch/arm64/crypto/aes-ce-ccm-glue.c    |  47 +-
>  arch/arm64/crypto/aes-ce.S             |  15 +-
>  arch/arm64/crypto/aes-glue.c           |  95 ++--
>  arch/arm64/crypto/aes-modes.S          | 562 +++++++++-----------
>  arch/arm64/crypto/aes-neonbs-core.S    | 305 ++++++-----
>  arch/arm64/crypto/aes-neonbs-glue.c    |  48 +-
>  arch/arm64/crypto/chacha20-neon-glue.c |  12 +-
>  arch/arm64/crypto/crc32-ce-core.S      |  40 +-
>  arch/arm64/crypto/crct10dif-ce-core.S  |  32 +-
>  arch/arm64/crypto/ghash-ce-core.S      | 113 ++--
>  arch/arm64/crypto/ghash-ce-glue.c      |  28 +-
>  arch/arm64/crypto/sha1-ce-core.S       |  42 +-
>  arch/arm64/crypto/sha2-ce-core.S       |  37 +-
>  arch/arm64/crypto/sha256-glue.c        |  36 +-
>  arch/arm64/crypto/sha3-ce-core.S       |  77 ++-
>  arch/arm64/crypto/sha512-ce-core.S     |  27 +-
>  arch/arm64/crypto/sm3-ce-core.S        |  30 +-
>  arch/arm64/include/asm/assembler.h     | 167 ++++++
>  arch/arm64/kernel/asm-offsets.c        |   2 +
>  crypto/testmgr.h                       | 259 +++++++++
>  22 files changed, 1392 insertions(+), 735 deletions(-)
>
> --
> 2.15.1