From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9906BC433EF for ; Mon, 13 Dec 2021 14:27:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:Message-Id:Date:Subject:Cc :To:From:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:In-Reply-To:References: List-Owner; bh=kb1GOWCjdLfBl61ONt+VXmAKxdsZv0HzSMnP7Jzx230=; b=ZnwrLk9BX6udob 0yt+I7s7VV/BZPJt3RIWdq4Zm4nFM15zkDlwaHQY3/oV9qHQq2Ldc9qBOgMadDj0qSpWpeHtYl9EZ XahaVJm4ki84iQbgsuMGsPQCx5yS6glKnN8A2sJhZj32mx5hchMkK8MXPhmdOtUAs1O/Cak2+b8t1 NWbuMsr0Dh/19aoQ3gzu1SHNYYns7HGxompQmFCEhP9j85Ek49upmZZwirYEdUEFC2U4eiBdVjxmO sQulw5Jgm+GeDipU+UJKa6rkjmw9s+IHaGwTpxj7tqDyIvTbcfxpzjJfQ8bJcKKMG99Z6AZZeXcXQ RJcnKPNLWEB5JDGkquOw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mwmG9-00A1cS-L1; Mon, 13 Dec 2021 14:25:00 +0000 Received: from ams.source.kernel.org ([2604:1380:4601:e00::1]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mwlv3-009tDd-99 for linux-arm-kernel@lists.infradead.org; Mon, 13 Dec 2021 14:03:11 +0000 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id EA394B80F01; Mon, 13 Dec 2021 14:03:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 93438C34602; Mon, 13 Dec 2021 14:03:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1639404186; bh=4N6dE7e9O5bhhgf8NfQT8awNOZc3QWrCTxzSYCnJ9ig=; h=From:To:Cc:Subject:Date:From; b=lTUKhfsbee2F3lE4BuRwTB+T6y3qEPFtcLr9uxS9iOcKLdwv5TERSUn688lhpZM8P 0e9mlrkACFb/Bqh+RETbsXqY37dVB+dAP+7l++xwscJvHQG438WnTHf1lyU5FBnOGe 4PAwyWJaL7joJl1U2TTsbMEQHUyhpsX3AkWTs4mGSZCdQlJwXDxQIKO8mFTLLvNgdt NxowjE8bw/vAevphcdJMDJgeU5o/urEHsUwYw5dflfv+Dk/2g/iz0pYcmAe+hiqabK M5Zn6zwwZTaQtFL02oiQAPfnerwIiDBmlTOnHjsgbMBoQsXzjWW4yApqsJtfq3459m 5NRwgic3tzbfw== From: Ard Biesheuvel To: linux-arm-kernel@lists.infradead.org Cc: catalin.marinas@arm.com, will@kernel.org, mark.rutland@arm.com, Ard Biesheuvel Subject: [PATCH v2] arm64/xor: use EOR3 instructions when available Date: Mon, 13 Dec 2021 15:02:52 +0100 Message-Id: <20211213140252.2856053-1-ardb@kernel.org> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=6449; h=from:subject; bh=4N6dE7e9O5bhhgf8NfQT8awNOZc3QWrCTxzSYCnJ9ig=; b=owEB7QES/pANAwAKAcNPIjmS2Y8kAcsmYgBht1KMLiWqFnkvepZbXQmQeXeA1MglbJM4+ctDZr/q 8AcGbV2JAbMEAAEKAB0WIQT72WJ8QGnJQhU3VynDTyI5ktmPJAUCYbdSjAAKCRDDTyI5ktmPJH2VC/ 0ehvdVlWTCo+Btd8id1DojyGy1XCol65Mfv9/+p6LMnYmnRslgpEUnvg+mE0btTX8hibltRx+RFoSB chKu1/iJZbyt6LrkW//Oj73oQx8ZOAPp1+hYtpi8k6IBm+mXcQUlNkzJPIJwxbukXPx4j17ZuaM7dv wckO8KRdZk9wZK5q/Bd1Vmb48dml9+tUf8pvxGLn4JeRUacTXAOgEgVHRZhkkrah9WXt73f8P0RLyn 9WcgzG0uecNfO4UBt4saP/uW6MuXQGUTqLzWjqQ59tukD5VlIn1+tOe5sDnfkyR4XKnpJkxD3zzGs+ 9K+N9dNpaNWAAVEjQBlFS7XOK9pm44J+UJjptD4b2CEGahKNamtcGoBSwJwPnQ2N6g5I/K3XB9C/cO jIi026JWfng0RGEO7Tu9CzdVoQQstoRqsMFodUB0fiDr+/Um97AUqmGQYd3G3E0zbeKd9CDa57FhD7 EjkgkAvCc5cCi9z0kEHSRRjW9bz3eX9WouekZxXEWogDU= X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20211213_060309_638252_B52069E4 X-CRM114-Status: GOOD ( 17.23 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Use the EOR3 instruction to implement xor_blocks() if the instruction is available, which is the case if the CPU implements the SHA-3 extension. This is about 20% faster on Apple M1 when using the 5-way version. Signed-off-by: Ard Biesheuvel --- v2: decouple from static_call changes incorporate assembler support detection changes proposed by Catalin arch/arm64/Kconfig | 6 + arch/arm64/Makefile | 5 + arch/arm64/lib/xor-neon.c | 147 +++++++++++++++++++- 3 files changed, 157 insertions(+), 1 deletion(-) diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index c4207cf9bb17..63d41ba4e716 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1545,6 +1545,12 @@ endmenu menu "ARMv8.2 architectural features" +config AS_HAS_ARMV8_2 + def_bool $(cc-option,-Wa$(comma)-march=armv8.2-a) + +config AS_HAS_SHA3 + def_bool $(as-instr,.arch armv8.2-a+sha3) + config ARM64_PMEM bool "Enable support for persistent memory" select ARCH_HAS_PMEM_API diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index e8cfc5868aa8..2f1de88651e6 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -58,6 +58,11 @@ stack_protector_prepare: prepare0 include/generated/asm-offsets.h)) endif +ifeq ($(CONFIG_AS_HAS_ARMV8_2), y) +# make sure to pass the newest target architecture to -march. +asm-arch := armv8.2-a +endif + # Ensure that if the compiler supports branch protection we default it # off, this will be overridden if we are using branch protection. branch-prot-flags-y += $(call cc-option,-mbranch-protection=none) diff --git a/arch/arm64/lib/xor-neon.c b/arch/arm64/lib/xor-neon.c index 11bf4f8aca68..5c8688700f63 100644 --- a/arch/arm64/lib/xor-neon.c +++ b/arch/arm64/lib/xor-neon.c @@ -167,7 +167,136 @@ void xor_arm64_neon_5(unsigned long bytes, unsigned long *p1, } while (--lines > 0); } -struct xor_block_template const xor_block_inner_neon = { +static inline uint64x2_t eor3(uint64x2_t p, uint64x2_t q, uint64x2_t r) +{ + uint64x2_t res; + + asm(ARM64_ASM_PREAMBLE ".arch_extension sha3\n" + "eor3 %0.16b, %1.16b, %2.16b, %3.16b" + : "=w"(res) : "w"(p), "w"(q), "w"(r)); + return res; +} + +static void xor_arm64_eor3_3(unsigned long bytes, unsigned long *p1, + unsigned long *p2, unsigned long *p3) +{ + uint64_t *dp1 = (uint64_t *)p1; + uint64_t *dp2 = (uint64_t *)p2; + uint64_t *dp3 = (uint64_t *)p3; + + register uint64x2_t v0, v1, v2, v3; + long lines = bytes / (sizeof(uint64x2_t) * 4); + + do { + /* p1 ^= p2 ^ p3 */ + v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0), + vld1q_u64(dp3 + 0)); + v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2), + vld1q_u64(dp3 + 2)); + v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4), + vld1q_u64(dp3 + 4)); + v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6), + vld1q_u64(dp3 + 6)); + + /* store */ + vst1q_u64(dp1 + 0, v0); + vst1q_u64(dp1 + 2, v1); + vst1q_u64(dp1 + 4, v2); + vst1q_u64(dp1 + 6, v3); + + dp1 += 8; + dp2 += 8; + dp3 += 8; + } while (--lines > 0); +} + +static void xor_arm64_eor3_4(unsigned long bytes, unsigned long *p1, + unsigned long *p2, unsigned long *p3, + unsigned long *p4) +{ + uint64_t *dp1 = (uint64_t *)p1; + uint64_t *dp2 = (uint64_t *)p2; + uint64_t *dp3 = (uint64_t *)p3; + uint64_t *dp4 = (uint64_t *)p4; + + register uint64x2_t v0, v1, v2, v3; + long lines = bytes / (sizeof(uint64x2_t) * 4); + + do { + /* p1 ^= p2 ^ p3 */ + v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0), + vld1q_u64(dp3 + 0)); + v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2), + vld1q_u64(dp3 + 2)); + v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4), + vld1q_u64(dp3 + 4)); + v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6), + vld1q_u64(dp3 + 6)); + + /* p1 ^= p4 */ + v0 = veorq_u64(v0, vld1q_u64(dp4 + 0)); + v1 = veorq_u64(v1, vld1q_u64(dp4 + 2)); + v2 = veorq_u64(v2, vld1q_u64(dp4 + 4)); + v3 = veorq_u64(v3, vld1q_u64(dp4 + 6)); + + /* store */ + vst1q_u64(dp1 + 0, v0); + vst1q_u64(dp1 + 2, v1); + vst1q_u64(dp1 + 4, v2); + vst1q_u64(dp1 + 6, v3); + + dp1 += 8; + dp2 += 8; + dp3 += 8; + dp4 += 8; + } while (--lines > 0); +} + +static void xor_arm64_eor3_5(unsigned long bytes, unsigned long *p1, + unsigned long *p2, unsigned long *p3, + unsigned long *p4, unsigned long *p5) +{ + uint64_t *dp1 = (uint64_t *)p1; + uint64_t *dp2 = (uint64_t *)p2; + uint64_t *dp3 = (uint64_t *)p3; + uint64_t *dp4 = (uint64_t *)p4; + uint64_t *dp5 = (uint64_t *)p5; + + register uint64x2_t v0, v1, v2, v3; + long lines = bytes / (sizeof(uint64x2_t) * 4); + + do { + /* p1 ^= p2 ^ p3 */ + v0 = eor3(vld1q_u64(dp1 + 0), vld1q_u64(dp2 + 0), + vld1q_u64(dp3 + 0)); + v1 = eor3(vld1q_u64(dp1 + 2), vld1q_u64(dp2 + 2), + vld1q_u64(dp3 + 2)); + v2 = eor3(vld1q_u64(dp1 + 4), vld1q_u64(dp2 + 4), + vld1q_u64(dp3 + 4)); + v3 = eor3(vld1q_u64(dp1 + 6), vld1q_u64(dp2 + 6), + vld1q_u64(dp3 + 6)); + + /* p1 ^= p4 ^ p5 */ + v0 = eor3(v0, vld1q_u64(dp4 + 0), vld1q_u64(dp5 + 0)); + v1 = eor3(v1, vld1q_u64(dp4 + 2), vld1q_u64(dp5 + 2)); + v2 = eor3(v2, vld1q_u64(dp4 + 4), vld1q_u64(dp5 + 4)); + v3 = eor3(v3, vld1q_u64(dp4 + 6), vld1q_u64(dp5 + 6)); + + /* store */ + vst1q_u64(dp1 + 0, v0); + vst1q_u64(dp1 + 2, v1); + vst1q_u64(dp1 + 4, v2); + vst1q_u64(dp1 + 6, v3); + + dp1 += 8; + dp2 += 8; + dp3 += 8; + dp4 += 8; + dp5 += 8; + } while (--lines > 0); +} + +struct xor_block_template xor_block_inner_neon __ro_after_init = { .name = "__inner_neon__", .do_2 = xor_arm64_neon_2, .do_3 = xor_arm64_neon_3, @@ -176,6 +305,22 @@ struct xor_block_template const xor_block_inner_neon = { }; EXPORT_SYMBOL(xor_block_inner_neon); +static int __init xor_neon_init(void) +{ + if (IS_ENABLED(CONFIG_AS_HAS_SHA3) && cpu_have_named_feature(SHA3)) { + xor_block_inner_neon.do_3 = xor_arm64_eor3_3; + xor_block_inner_neon.do_4 = xor_arm64_eor3_4; + xor_block_inner_neon.do_5 = xor_arm64_eor3_5; + } + return 0; +} +module_init(xor_neon_init); + +static void __exit xor_neon_exit(void) +{ +} +module_exit(xor_neon_exit); + MODULE_AUTHOR("Jackie Liu "); MODULE_DESCRIPTION("ARMv8 XOR Extensions"); MODULE_LICENSE("GPL"); -- 2.30.2 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel