From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 16 Dec 2020 12:47:56 -0800
From: Eric Biggers
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Herbert Xu,
 David Sterba, "Jason A . Donenfeld", Paul Crowley
Subject: Re: [PATCH 0/5] crypto: add NEON-optimized BLAKE2b
References: <20201215234708.105527-1-ebiggers@kernel.org>
In-Reply-To: <20201215234708.105527-1-ebiggers@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org

On Tue, Dec 15, 2020 at 03:47:03PM -0800, Eric Biggers wrote:
> This patchset adds a NEON implementation of BLAKE2b for 32-bit ARM.
> Patches 1-4 prepare for it by making some updates to the generic
> implementation, while patch 5 adds the actual NEON implementation.
>
> On Cortex-A7 (which these days is the most common ARM processor that
> doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
> SHA-256, and slightly faster than SHA-1.  It is also almost three times
> as fast as the generic implementation of BLAKE2b:
>
>     Algorithm            Cycles per byte (on 4096-byte messages)
>     ===================  =======================================
>     blake2b-256-neon     14.1
>     sha1-neon            16.4
>     sha1-asm             20.8
>     blake2s-256-generic  26.1
>     sha256-neon          28.9
>     sha256-asm           32.1
>     blake2b-256-generic  39.9
>
> This implementation isn't directly based on any other implementation,
> but it borrows some ideas from previous NEON code I've written as well
> as from chacha-neon-core.S.  At least on Cortex-A7, it is faster than
> the other NEON implementations of BLAKE2b I'm aware of (the
> implementation in the BLAKE2 official repository using intrinsics, and
> Andrew Moon's implementation which can be found in SUPERCOP).
>
> NEON-optimized BLAKE2b is useful because there is interest in using
> BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
> devices that lack the ARMv8 Crypto Extensions) to replace SHA-1.  On
> these devices, the performance cost of upgrading to SHA-256 may be
> unacceptable, whereas BLAKE2b-256 would actually improve performance.
>
> Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
> is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
> BLAKE2b is actually faster than BLAKE2s.  This is because NEON supports
> 64-bit operations, and because BLAKE2s's block size is too small for
> NEON to be helpful for it.  The best I've been able to do with BLAKE2s
> on Cortex-A7 is 19.0 cpb with an optimized scalar implementation.

By the way, if people are interested in having my ARM scalar implementation of
BLAKE2s in the kernel too, I can send a patchset for that too.  It just ended up
being slower than BLAKE2b and SHA-1, so it wasn't as good for the use case
mentioned above.  If it were to be added as "blake2s-256-arm", we'd have:

    Algorithm            Cycles per byte (on 4096-byte messages)
    ===================  =======================================
    blake2b-256-neon     14.1
    sha1-neon            16.4
    blake2s-256-arm      19.0
    sha1-asm             20.8
    blake2s-256-generic  26.1
    sha256-neon          28.9
    sha256-asm           32.1
    blake2b-256-generic  39.9
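
As a rough cross-check of the comparisons made in the quoted cover letter
("over twice as fast as SHA-256", "slightly faster than SHA-1", "almost three
times as fast as the generic implementation"), the ratios can be recomputed
from the cycles-per-byte table above. This is only a minimal sketch in Python:
the dictionary and the speedup() helper are illustrative names, not part of the
patchset, and the figures are simply copied from the table.

```python
# Cycles per byte on Cortex-A7 for 4096-byte messages, copied from the table
# in the message above. Lower is faster.
CYCLES_PER_BYTE = {
    "blake2b-256-neon": 14.1,
    "sha1-neon": 16.4,
    "blake2s-256-arm": 19.0,
    "sha1-asm": 20.8,
    "blake2s-256-generic": 26.1,
    "sha256-neon": 28.9,
    "sha256-asm": 32.1,
    "blake2b-256-generic": 39.9,
}


def speedup(baseline: str, candidate: str = "blake2b-256-neon") -> float:
    """Hypothetical helper: how many times faster `candidate` is than `baseline`.

    Since the metric is cycles per byte, the speedup is baseline / candidate.
    """
    return CYCLES_PER_BYTE[baseline] / CYCLES_PER_BYTE[candidate]


if __name__ == "__main__":
    for name in ("sha256-neon", "sha1-neon",
                 "blake2b-256-generic", "blake2s-256-arm"):
        print(f"{name:<20} {speedup(name):.2f}x")
```

With the numbers as listed, this gives roughly 2.05x against sha256-neon,
1.16x against sha1-neon, and 2.83x against blake2b-256-generic, which is
consistent with the wording in the cover letter.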