Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
From: Robin Murphy
To: David Laight, 'Will Deacon'
Date: Wed, 15 May 2019 13:39:47 +0100
Cc: Ard Biesheuvel, netdev@vger.kernel.org, ilias.apalodimas@linaro.org, Zhangshaokun, huanglingyan (A), linux-arm-kernel@lists.infradead.org, steve.capper@arm.com

On 15/05/2019 12:13, David Laight wrote:
> From: Robin Murphy
>> Sent: 15 May 2019 11:58
>> To: David Laight; 'Will Deacon'
>> Cc: Zhangshaokun; Ard Biesheuvel; linux-arm-kernel@lists.infradead.org; netdev@vger.kernel.org;
>> ilias.apalodimas@linaro.org; huanglingyan (A); steve.capper@arm.com
>> Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
>>
>> On 15/05/2019 11:15, David Laight wrote:
>>> ...
>>>>> ptr = (u64 *)(buff - offset);
>>>>> shift = offset * 8;
>>>>>
>>>>> /*
>>>>>  * Head: zero out any excess leading bytes. Shifting back by the same
>>>>>  * amount should be at least as fast as any other way of handling the
>>>>>  * odd/even alignment, and means we can ignore it until the very end.
>>>>>  */
>>>>> data = *ptr++;
>>>>> #ifdef __LITTLE_ENDIAN
>>>>> data = (data >> shift) << shift;
>>>>> #else
>>>>> data = (data << shift) >> shift;
>>>>> #endif
>>>
>>> I suspect that
>>> #ifdef __LITTLE_ENDIAN
>>> data &= ~0ull << shift;
>>> #else
>>> data &= ~0ull >> shift;
>>> #endif
>>> is likely to be better.
>>
>> Out of interest, better in which respects? For the A64 ISA at least,
>> that would take 3 instructions plus an additional scratch register, e.g.:
>>
>>   MOV x2, #~0
>>   LSL x2, x2, x1
>>   AND x0, x0, x1

[That should have been "AND x0, x0, x2", obviously...]

>>
>> (alternatively "AND x0, x0, x1 LSL x2" to save 4 bytes of code, but that
>> will typically take as many cycles if not more than just pipelining the
>> two 'simple' ALU instructions)
>>
>> Whereas the original is just two shift instructions in-place.
>>
>>   LSR x0, x0, x1
>>   LSL x0, x0, x1
>>
>> If the operation were repeated, the constant generation could certainly
>> be amortised over multiple subsequent ANDs for a net win, but that isn't
>> the case here.
>
> On a superscalar processor you reduce the register dependency
> chain by one instruction.
> The original code is pretty much a single dependency chain so
> you are likely to be able to generate the mask 'for free'.

Gotcha, although 'free' still means additional I$ and register rename
footprint, vs. (typically) just 1 extra cycle to forward an ALU result.

It's an interesting consideration, but in our case there are almost
certainly far more little in-order cores out in the wild than big OoO
ones, and the double-shift will always be objectively better for those.

Thanks,
Robin.
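For reference, here is a minimal C sketch of the two head-masking forms debated in this thread: the double shift from the patch and the AND mask David suggested, for both byte orders. It assumes 0 <= offset <= 7, so the shift count stays below 64. The helper names are hypothetical and do not come from the patch. The sketch only shows that the two forms produce identical words. Which form is cheaper depends on the core, as discussed above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only (not from the patch).
 * 'offset' is the number of excess leading bytes in the first aligned
 * word; assumed 0 <= offset <= 7 so that shift = offset * 8 < 64.
 */

/* Little-endian: leading bytes are the low-order bits. Double-shift form. */
static uint64_t head_le_double_shift(uint64_t data, unsigned int offset)
{
	unsigned int shift = offset * 8;

	return (data >> shift) << shift;
}

/* Little-endian: same result via an explicit mask and AND. */
static uint64_t head_le_and(uint64_t data, unsigned int offset)
{
	unsigned int shift = offset * 8;

	return data & (~0ULL << shift);
}

/* Big-endian: leading bytes are the high-order bits. Double-shift form. */
static uint64_t head_be_double_shift(uint64_t data, unsigned int offset)
{
	unsigned int shift = offset * 8;

	return (data << shift) >> shift;
}

/* Big-endian: same result via an explicit mask and AND. */
static uint64_t head_be_and(uint64_t data, unsigned int offset)
{
	unsigned int shift = offset * 8;

	return data & (~0ULL >> shift);
}

/* Assert that both forms agree for every legal offset of one sample word. */
static void check_head_masks(uint64_t data)
{
	for (unsigned int offset = 0; offset < 8; offset++) {
		assert(head_le_double_shift(data, offset) ==
		       head_le_and(data, offset));
		assert(head_be_double_shift(data, offset) ==
		       head_be_and(data, offset));
	}
}
```

For example, with offset 3 on the little-endian path, both forms turn 0x8877665544332211 into 0x8877665544000000. On the big-endian path, they turn it into 0x0000005544332211.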