Date: Wed, 4 Nov 2020 09:57:42 -0800
From: Eric Biggers
To: l00374334
Subject: Re: [PATCH 1/1] arm64: Accelerate Adler32 using arm64 SVE instructions.
Message-ID: <20201104175742.GA846@sol.localdomain>
References: <20201103121506.1533-1-liqiang64@huawei.com> <20201103121506.1533-2-liqiang64@huawei.com>
In-Reply-To: <20201103121506.1533-2-liqiang64@huawei.com>
Cc: alexandre.torgue@st.com, catalin.marinas@arm.com, linux-crypto@vger.kernel.org, mcoquelin.stm32@gmail.com, will@kernel.org, davem@davemloft.net, linux-arm-kernel@lists.infradead.org, herbert@gondor.apana.org.au

On Tue, Nov 03, 2020 at 08:15:06PM +0800, l00374334 wrote:
> From: liqiang
>
> In the libz library, the checksum algorithm adler32 usually occupies
> a relatively high hot spot, and the SVE instruction set can easily
> accelerate it, so that the performance of libz library will be
> significantly improved.
>
> We can divides buf into blocks according to the bit width of SVE,
> and then uses vector registers to perform operations in units of blocks
> to achieve the purpose of acceleration.
>
> On machines that support ARM64 sve instructions, this algorithm is
> about 3~4 times faster than the algorithm implemented in C language
> in libz. The wider the SVE instruction, the better the acceleration effect.
>
> Measured on a Taishan 1951 machine that supports 256bit width SVE,
> below are the results of my measured random data of 1M and 10M:
>
> [root@xxx adler32]# ./benchmark 1000000
> Libz alg: Time used: 608 us, 1644.7 Mb/s.
> SVE alg: Time used: 166 us, 6024.1 Mb/s.
>
> [root@xxx adler32]# ./benchmark 10000000
> Libz alg: Time used: 6484 us, 1542.3 Mb/s.
> SVE alg: Time used: 2034 us, 4916.4 Mb/s.
>
> The blocks can be of any size, so the algorithm can automatically adapt
> to SVE hardware with different bit widths without modifying the code.
>
>
> Signed-off-by: liqiang

Note that this patch does nothing to actually wire up the kernel's copy of libz
(lib/zlib_{deflate,inflate}/) to use this implementation of Adler32. To do so,
libz would either need to be changed to use the shash API, or you'd need to
implement an adler32() function in lib/crypto/ that automatically uses an
accelerated implementation if available, and make libz call it.

In either case, a C implementation would be required too. There can't be
just an architecture-specific implementation.

Also, as others have pointed out, there's probably not much point in having an
SVE implementation of Adler32 when there isn't even a NEON implementation yet.
It's not too hard to implement Adler32 using NEON, and there are already several
permissively-licensed NEON implementations out there that could be used as a
reference, e.g. my implementation using NEON intrinsics here:
https://github.com/ebiggers/libdeflate/blob/v1.6/lib/arm/adler32_impl.h

- Eric