From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 329EBC433F5 for ; Tue, 18 Jan 2022 10:27:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:CC:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=uUwTH8X7z4zhVIpa/ZlScnm/HBr4QjUshwaXOHN508A=; b=oX75d339wNYptr 0v21XuXLIEqbQrDENRdZbWuZZeJxYd8AqxDrXpb6tcHbeEB/UV+E/AYQpGYJQa3r1ZDLf5Elyz3Gz hVBIIvL8sqd1n+2LO1ZmZiqzinlQ7IsFAsjXhJ2uAdWN34Qh+inv7kzqo0Gu5wV6OKaE9OS93pJkH q3e1H+VQM86LlgQWHZonJ0W7+KV93Tmcqf51W3oDE93FlXPOq99XvgI++wXt8lC3/MdGyICF+Vir9 8l1oXDkKuT7vdkDm/KwCaJieb1HYiRP1BiEz5YOKli2vu058Ru+eRJcpTDK7AEX+n0k37JIiapnMQ Vdf1OgLdHY/fncrzvttg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1n9lgB-001Aia-D3; Tue, 18 Jan 2022 10:25:31 +0000 Received: from smtpout1.mo529.mail-out.ovh.net ([178.32.125.2]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1n9lep-001ALe-8y for linux-arm-kernel@lists.infradead.org; Tue, 18 Jan 2022 10:24:10 +0000 Received: from mxplan1.mail.ovh.net (unknown [10.109.156.48]) by mo529.mail-out.ovh.net (Postfix) with ESMTPS id 5BB21D86FA3B; Tue, 18 Jan 2022 11:24:03 +0100 (CET) Received: from bracey.fi (37.59.142.101) by DAG4EX1.mxp1.local (172.16.2.7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.18; Tue, 18 Jan 2022 11:24:02 +0100 Authentication-Results: garm.ovh; auth=pass (GARM-101G0049ec73664-5e08-45a5-a937-b75d1194a294, D2C519BAB91300E05C7E54B340BF794D215B5709) smtp.auth=kevin@bracey.fi X-OVh-ClientIp: 82.181.225.135 From: Kevin Bracey To: Herbert Xu CC: , , Ard Biesheuvel , Will Deacon , Catalin Marinas , Kevin Bracey Subject: [PATCH v3 4/4] arm64: accelerate crc32_be Date: Tue, 18 Jan 2022 12:23:51 +0200 Message-ID: <20220118102351.3356105-5-kevin@bracey.fi> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220118102351.3356105-1-kevin@bracey.fi> References: <20220118102351.3356105-1-kevin@bracey.fi> MIME-Version: 1.0 X-Originating-IP: [37.59.142.101] X-ClientProxiedBy: DAG8EX2.mxp1.local (172.16.2.16) To DAG4EX1.mxp1.local (172.16.2.7) X-Ovh-Tracer-GUID: 0cc6860a-4815-45c3-989e-d261c4950712 X-Ovh-Tracer-Id: 10755440337342599273 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrgedvvddrudefgddugecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjpdevjffgvefmvefgnecuuegrihhlohhuthemucehtddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefhvffufffkofgjfhgggfgtihesthekredtredttdenucfhrhhomhepmfgvvhhinhcuuehrrggtvgihuceokhgvvhhinhessghrrggtvgihrdhfiheqnecuggftrfgrthhtvghrnhepjedujeevgeekhfdvlefhvdetueeuudehteejgfeiheehgfeikeeludefleetjeeunecuffhomhgrihhnpegtrhgtfedvrdhssgenucfkpheptddrtddrtddrtddpfeejrdehledrudegvddruddtudenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhhouggvpehsmhhtphhouhhtpdhhvghlohepmhigphhlrghnuddrmhgrihhlrdhovhhhrdhnvghtpdhinhgvtheptddrtddrtddrtddpmhgrihhlfhhrohhmpehkvghvihhnsegsrhgrtggvhidrfhhipdhnsggprhgtphhtthhopedupdhrtghpthhtohepkhgvvhhinhessghrrggtvgihrdhfih X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220118_022407_529444_DA435D4C X-CRM114-Status: GOOD ( 10.17 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org It makes no sense to leave crc32_be using the generic code while we only accelerate the little-endian ops. Even though the big-endian form doesn't fit as smoothly into the arm64, we can speed it up and avoid hitting the D cache. Tested on Cortex-A53. Without acceleration: crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64 crc32: self tests passed, processed 225944 bytes in 192240 nsec crc32c: CRC_LE_BITS = 64 crc32c: self tests passed, processed 112972 bytes in 21360 nsec With acceleration: crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64 crc32: self tests passed, processed 225944 bytes in 53480 nsec crc32c: CRC_LE_BITS = 64 crc32c: self tests passed, processed 112972 bytes in 21480 nsec Signed-off-by: Kevin Bracey Tested-by: Ard Biesheuvel Reviewed-by: Ard Biesheuvel Acked-by: Catalin Marinas --- arch/arm64/lib/crc32.S | 87 +++++++++++++++++++++++++++++++++++------- 1 file changed, 73 insertions(+), 14 deletions(-) diff --git a/arch/arm64/lib/crc32.S b/arch/arm64/lib/crc32.S index 0f9e10ecda23..8340dccff46f 100644 --- a/arch/arm64/lib/crc32.S +++ b/arch/arm64/lib/crc32.S @@ -11,7 +11,44 @@ .arch armv8-a+crc - .macro __crc32, c + .macro byteorder, reg, be + .if \be +CPU_LE( rev \reg, \reg ) + .else +CPU_BE( rev \reg, \reg ) + .endif + .endm + + .macro byteorder16, reg, be + .if \be +CPU_LE( rev16 \reg, \reg ) + .else +CPU_BE( rev16 \reg, \reg ) + .endif + .endm + + .macro bitorder, reg, be + .if \be + rbit \reg, \reg + .endif + .endm + + .macro bitorder16, reg, be + .if \be + rbit \reg, \reg + lsr \reg, \reg, #16 + .endif + .endm + + .macro bitorder8, reg, be + .if \be + rbit \reg, \reg + lsr \reg, \reg, #24 + .endif + .endm + + .macro __crc32, c, be=0 + bitorder w0, \be cmp x2, #16 b.lt 8f // less than 16 bytes @@ -24,10 +61,14 @@ add x8, x8, x1 add x1, x1, x7 ldp x5, x6, [x8] -CPU_BE( rev x3, x3 ) -CPU_BE( rev x4, x4 ) -CPU_BE( rev x5, x5 ) -CPU_BE( rev x6, x6 ) + byteorder x3, \be + byteorder x4, \be + byteorder x5, \be + byteorder x6, \be + bitorder x3, \be + bitorder x4, \be + bitorder x5, \be + bitorder x6, \be tst x7, #8 crc32\c\()x w8, w0, x3 @@ -55,33 +96,43 @@ CPU_BE( rev x6, x6 ) 32: ldp x3, x4, [x1], #32 sub x2, x2, #32 ldp x5, x6, [x1, #-16] -CPU_BE( rev x3, x3 ) -CPU_BE( rev x4, x4 ) -CPU_BE( rev x5, x5 ) -CPU_BE( rev x6, x6 ) + byteorder x3, \be + byteorder x4, \be + byteorder x5, \be + byteorder x6, \be + bitorder x3, \be + bitorder x4, \be + bitorder x5, \be + bitorder x6, \be crc32\c\()x w0, w0, x3 crc32\c\()x w0, w0, x4 crc32\c\()x w0, w0, x5 crc32\c\()x w0, w0, x6 cbnz x2, 32b -0: ret +0: bitorder w0, \be + ret 8: tbz x2, #3, 4f ldr x3, [x1], #8 -CPU_BE( rev x3, x3 ) + byteorder x3, \be + bitorder x3, \be crc32\c\()x w0, w0, x3 4: tbz x2, #2, 2f ldr w3, [x1], #4 -CPU_BE( rev w3, w3 ) + byteorder w3, \be + bitorder w3, \be crc32\c\()w w0, w0, w3 2: tbz x2, #1, 1f ldrh w3, [x1], #2 -CPU_BE( rev16 w3, w3 ) + byteorder16 w3, \be + bitorder16 w3, \be crc32\c\()h w0, w0, w3 1: tbz x2, #0, 0f ldrb w3, [x1] + bitorder8 w3, \be crc32\c\()b w0, w0, w3 -0: ret +0: bitorder w0, \be + ret .endm .align 5 @@ -99,3 +150,11 @@ alternative_if_not ARM64_HAS_CRC32 alternative_else_nop_endif __crc32 c SYM_FUNC_END(__crc32c_le) + + .align 5 +SYM_FUNC_START(crc32_be) +alternative_if_not ARM64_HAS_CRC32 + b crc32_be_base +alternative_else_nop_endif + __crc32 be=1 +SYM_FUNC_END(crc32_be) -- 2.25.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 590ABC433EF for ; Tue, 18 Jan 2022 10:32:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237454AbiARKcn (ORCPT ); Tue, 18 Jan 2022 05:32:43 -0500 Received: from smtpout4.mo529.mail-out.ovh.net ([217.182.185.173]:49797 "EHLO smtpout4.mo529.mail-out.ovh.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237443AbiARKcn (ORCPT ); Tue, 18 Jan 2022 05:32:43 -0500 Received: from mxplan1.mail.ovh.net (unknown [10.109.156.48]) by mo529.mail-out.ovh.net (Postfix) with ESMTPS id 5BB21D86FA3B; Tue, 18 Jan 2022 11:24:03 +0100 (CET) Received: from bracey.fi (37.59.142.101) by DAG4EX1.mxp1.local (172.16.2.7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.18; Tue, 18 Jan 2022 11:24:02 +0100 Authentication-Results: garm.ovh; auth=pass (GARM-101G0049ec73664-5e08-45a5-a937-b75d1194a294, D2C519BAB91300E05C7E54B340BF794D215B5709) smtp.auth=kevin@bracey.fi X-OVh-ClientIp: 82.181.225.135 From: Kevin Bracey To: Herbert Xu CC: , , Ard Biesheuvel , Will Deacon , Catalin Marinas , Kevin Bracey Subject: [PATCH v3 4/4] arm64: accelerate crc32_be Date: Tue, 18 Jan 2022 12:23:51 +0200 Message-ID: <20220118102351.3356105-5-kevin@bracey.fi> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20220118102351.3356105-1-kevin@bracey.fi> References: <20220118102351.3356105-1-kevin@bracey.fi> MIME-Version: 1.0 Content-Transfer-Encoding: 7BIT Content-Type: text/plain; charset=US-ASCII X-Originating-IP: [37.59.142.101] X-ClientProxiedBy: DAG8EX2.mxp1.local (172.16.2.16) To DAG4EX1.mxp1.local (172.16.2.7) X-Ovh-Tracer-GUID: 0cc6860a-4815-45c3-989e-d261c4950712 X-Ovh-Tracer-Id: 10755440337342599273 X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: -100 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrgedvvddrudefgddugecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjpdevjffgvefmvefgnecuuegrihhlohhuthemucehtddtnecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefhvffufffkofgjfhgggfgtihesthekredtredttdenucfhrhhomhepmfgvvhhinhcuuehrrggtvgihuceokhgvvhhinhessghrrggtvgihrdhfiheqnecuggftrfgrthhtvghrnhepjedujeevgeekhfdvlefhvdetueeuudehteejgfeiheehgfeikeeludefleetjeeunecuffhomhgrihhnpegtrhgtfedvrdhssgenucfkpheptddrtddrtddrtddpfeejrdehledrudegvddruddtudenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepmhhouggvpehsmhhtphhouhhtpdhhvghlohepmhigphhlrghnuddrmhgrihhlrdhovhhhrdhnvghtpdhinhgvtheptddrtddrtddrtddpmhgrihhlfhhrohhmpehkvghvihhnsegsrhgrtggvhidrfhhipdhnsggprhgtphhtthhopedupdhrtghpthhtohepkhgvvhhinhessghrrggtvgihrdhfih Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org It makes no sense to leave crc32_be using the generic code while we only accelerate the little-endian ops. Even though the big-endian form doesn't fit as smoothly into the arm64, we can speed it up and avoid hitting the D cache. Tested on Cortex-A53. Without acceleration: crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64 crc32: self tests passed, processed 225944 bytes in 192240 nsec crc32c: CRC_LE_BITS = 64 crc32c: self tests passed, processed 112972 bytes in 21360 nsec With acceleration: crc32: CRC_LE_BITS = 64, CRC_BE BITS = 64 crc32: self tests passed, processed 225944 bytes in 53480 nsec crc32c: CRC_LE_BITS = 64 crc32c: self tests passed, processed 112972 bytes in 21480 nsec Signed-off-by: Kevin Bracey Tested-by: Ard Biesheuvel Reviewed-by: Ard Biesheuvel Acked-by: Catalin Marinas --- arch/arm64/lib/crc32.S | 87 +++++++++++++++++++++++++++++++++++------- 1 file changed, 73 insertions(+), 14 deletions(-) diff --git a/arch/arm64/lib/crc32.S b/arch/arm64/lib/crc32.S index 0f9e10ecda23..8340dccff46f 100644 --- a/arch/arm64/lib/crc32.S +++ b/arch/arm64/lib/crc32.S @@ -11,7 +11,44 @@ .arch armv8-a+crc - .macro __crc32, c + .macro byteorder, reg, be + .if \be +CPU_LE( rev \reg, \reg ) + .else +CPU_BE( rev \reg, \reg ) + .endif + .endm + + .macro byteorder16, reg, be + .if \be +CPU_LE( rev16 \reg, \reg ) + .else +CPU_BE( rev16 \reg, \reg ) + .endif + .endm + + .macro bitorder, reg, be + .if \be + rbit \reg, \reg + .endif + .endm + + .macro bitorder16, reg, be + .if \be + rbit \reg, \reg + lsr \reg, \reg, #16 + .endif + .endm + + .macro bitorder8, reg, be + .if \be + rbit \reg, \reg + lsr \reg, \reg, #24 + .endif + .endm + + .macro __crc32, c, be=0 + bitorder w0, \be cmp x2, #16 b.lt 8f // less than 16 bytes @@ -24,10 +61,14 @@ add x8, x8, x1 add x1, x1, x7 ldp x5, x6, [x8] -CPU_BE( rev x3, x3 ) -CPU_BE( rev x4, x4 ) -CPU_BE( rev x5, x5 ) -CPU_BE( rev x6, x6 ) + byteorder x3, \be + byteorder x4, \be + byteorder x5, \be + byteorder x6, \be + bitorder x3, \be + bitorder x4, \be + bitorder x5, \be + bitorder x6, \be tst x7, #8 crc32\c\()x w8, w0, x3 @@ -55,33 +96,43 @@ CPU_BE( rev x6, x6 ) 32: ldp x3, x4, [x1], #32 sub x2, x2, #32 ldp x5, x6, [x1, #-16] -CPU_BE( rev x3, x3 ) -CPU_BE( rev x4, x4 ) -CPU_BE( rev x5, x5 ) -CPU_BE( rev x6, x6 ) + byteorder x3, \be + byteorder x4, \be + byteorder x5, \be + byteorder x6, \be + bitorder x3, \be + bitorder x4, \be + bitorder x5, \be + bitorder x6, \be crc32\c\()x w0, w0, x3 crc32\c\()x w0, w0, x4 crc32\c\()x w0, w0, x5 crc32\c\()x w0, w0, x6 cbnz x2, 32b -0: ret +0: bitorder w0, \be + ret 8: tbz x2, #3, 4f ldr x3, [x1], #8 -CPU_BE( rev x3, x3 ) + byteorder x3, \be + bitorder x3, \be crc32\c\()x w0, w0, x3 4: tbz x2, #2, 2f ldr w3, [x1], #4 -CPU_BE( rev w3, w3 ) + byteorder w3, \be + bitorder w3, \be crc32\c\()w w0, w0, w3 2: tbz x2, #1, 1f ldrh w3, [x1], #2 -CPU_BE( rev16 w3, w3 ) + byteorder16 w3, \be + bitorder16 w3, \be crc32\c\()h w0, w0, w3 1: tbz x2, #0, 0f ldrb w3, [x1] + bitorder8 w3, \be crc32\c\()b w0, w0, w3 -0: ret +0: bitorder w0, \be + ret .endm .align 5 @@ -99,3 +150,11 @@ alternative_if_not ARM64_HAS_CRC32 alternative_else_nop_endif __crc32 c SYM_FUNC_END(__crc32c_le) + + .align 5 +SYM_FUNC_START(crc32_be) +alternative_if_not ARM64_HAS_CRC32 + b crc32_be_base +alternative_else_nop_endif + __crc32 be=1 +SYM_FUNC_END(crc32_be) -- 2.25.1