From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S965467AbcLXC1U (ORCPT ); Fri, 23 Dec 2016 21:27:20 -0500
Received: from mail-ua0-f177.google.com ([209.85.217.177]:36182 "EHLO
        mail-ua0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752989AbcLXC1R (ORCPT ); Fri, 23 Dec 2016 21:27:17 -0500
MIME-Version: 1.0
In-Reply-To: <942b91f25a63b22ec4946378a1fffe78d655cf18.1482545792.git.luto@kernel.org>
References: <942b91f25a63b22ec4946378a1fffe78d655cf18.1482545792.git.luto@kernel.org>
From: Andy Lutomirski
Date: Fri, 23 Dec 2016 18:26:37 -0800
Message-ID:
Subject: Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
To: Andy Lutomirski
Cc: Daniel Borkmann , Netdev , LKML , Linux Crypto Mailing List ,
        "Jason A. Donenfeld" , Hannes Frederic Sowa , Alexei Starovoitov ,
        Eric Dumazet , Eric Biggers , Tom Herbert , "David S. Miller" ,
        Ard Biesheuvel , Herbert Xu
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Dec 23, 2016 at 6:22 PM, Andy Lutomirski wrote:
> There are some pieces of kernel code that want to compute SHA256
> directly without going through the crypto core. Adjust the exported
> API to decouple it from the crypto core.
>
> I suspect this will very slightly speed up the SHA256 shash operations
> as well by reducing the amount of indirection involved.
>

I should also mention: there's a nice potential cleanup that's
possible on top of this. Currently, most of the accelerated SHA256
implementations just swap out the block function.
Another approach to enabling this would be to restructure
sha256_update along the lines of:

    sha256_block_fn_t fn = arch_sha256_block_fn(len);
    sha256_base_do_update(sctx, data, len, fn);

The idea is that arch code can decide whether to use an accelerated
block function based on context (x86, for example, can't always use
xmm regs) and length (on x86, using the accelerated versions for short
digests is very slow due to the state save/restore that happens), and
then the core code can just use whatever it returns. This would allow
a lot of the boilerplate that this patch was forced to modify to be
deleted outright.

--Andy
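
To make the dispatch idea above a bit more concrete, here is a rough userspace sketch, not kernel code. Only sha256_block_fn_t, arch_sha256_block_fn, sha256_base_do_update and sha256_update are names from this mail; the state struct, the simd_usable flag, the ACCEL_MIN_LEN threshold and the two block functions are hypothetical stand-ins. The block functions only count blocks instead of running the real compression function, and the signatures are simplified, so this shows the selection logic and nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA256_BLOCK_SIZE 64

/* Hypothetical, simplified state; not the kernel's struct sha256_state. */
struct sha256_sketch_state {
	uint64_t count;                 /* total bytes fed so far */
	uint8_t buf[SHA256_BLOCK_SIZE]; /* partial block carried between calls */
	unsigned long generic_blocks;   /* blocks seen by the generic stand-in */
	unsigned long accel_blocks;     /* blocks seen by the accelerated stand-in */
};

typedef void (*sha256_block_fn_t)(struct sha256_sketch_state *sctx,
				  const uint8_t *src, size_t blocks);

/* Stand-in: a real implementation would run the SHA-256 compression here. */
static void sha256_generic_block(struct sha256_sketch_state *sctx,
				 const uint8_t *src, size_t blocks)
{
	(void)src;
	sctx->generic_blocks += blocks;
}

/* Stand-in for an arch-specific (e.g. SIMD) block function. */
static void sha256_accel_block(struct sha256_sketch_state *sctx,
			       const uint8_t *src, size_t blocks)
{
	(void)src;
	sctx->accel_blocks += blocks;
}

/* Assumed knobs: can SIMD registers be used in the current context, and
 * below which length does state save/restore cost more than it gains? */
static bool simd_usable = true;
#define ACCEL_MIN_LEN 256

/* Arch hook: choose a block function from context and input length. */
static sha256_block_fn_t arch_sha256_block_fn(size_t len)
{
	if (simd_usable && len >= ACCEL_MIN_LEN)
		return sha256_accel_block;
	return sha256_generic_block;
}

/* Core buffering logic; it just uses whichever block function it is given. */
static void sha256_base_do_update(struct sha256_sketch_state *sctx,
				  const uint8_t *data, size_t len,
				  sha256_block_fn_t fn)
{
	size_t partial = sctx->count % SHA256_BLOCK_SIZE;
	size_t blocks;

	assert(fn != NULL);
	sctx->count += len;

	if (partial + len >= SHA256_BLOCK_SIZE) {
		if (partial) {
			/* Top up the buffered partial block and process it. */
			size_t fill = SHA256_BLOCK_SIZE - partial;

			memcpy(sctx->buf + partial, data, fill);
			data += fill;
			len -= fill;
			fn(sctx, sctx->buf, 1);
		}
		/* Process all remaining whole blocks straight from the input. */
		blocks = len / SHA256_BLOCK_SIZE;
		len %= SHA256_BLOCK_SIZE;
		if (blocks) {
			fn(sctx, data, blocks);
			data += blocks * SHA256_BLOCK_SIZE;
		}
		partial = 0;
	}
	/* Keep any tail bytes for the next call. */
	if (len)
		memcpy(sctx->buf + partial, data, len);
}

/* The restructured update: ask the arch hook once, then call the core. */
static void sha256_update(struct sha256_sketch_state *sctx,
			  const uint8_t *data, size_t len)
{
	sha256_block_fn_t fn = arch_sha256_block_fn(len);

	sha256_base_do_update(sctx, data, len, fn);
}
```

With something shaped like this, an arch would only need to supply arch_sha256_block_fn and its block function; the per-arch update wrappers would no longer be needed.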