Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
From: Robin Murphy
To: Will Deacon, Zhangshaokun
Cc: Ard Biesheuvel, steve.capper@arm.com, netdev@vger.kernel.org,
 ilias.apalodimas@linaro.org, "huanglingyan (A)",
 linux-arm-kernel@lists.infradead.org
Date: Mon, 15 Apr 2019 19:18:22 +0100
Message-ID: <41b30c72-c1c5-14b2-b2e1-3507d552830d@arm.com>
In-Reply-To: <20190412095243.GA27193@fuggles.cambridge.arm.com>
References: <20190218230842.11448-1-ard.biesheuvel@linaro.org>
 <20190412095243.GA27193@fuggles.cambridge.arm.com>
On 12/04/2019 10:52, Will Deacon wrote:
> On Fri, Apr 12, 2019 at 10:31:16AM +0800, Zhangshaokun wrote:
>> On 2019/2/19 7:08, Ard Biesheuvel wrote:
>>> It turns out that the IP checksumming code is still exercised often,
>>> even though one might expect that modern NICs with checksum offload
>>> have no use for it. However, as Lingyan points out, there are
>>> combinations of features where the network stack may still fall back
>>> to software checksumming, and so it makes sense to provide an
>>> optimized implementation in software as well.
>>>
>>> So provide an implementation of do_csum() in scalar assembler, which,
>>> unlike C, gives direct access to the carry flag, making the code run
>>> substantially faster. The routine uses overlapping 64 byte loads for
>>> all input sizes > 64 bytes, in order to reduce the number of branches
>>> and improve performance on cores with deep pipelines.
>>>
>>> On Cortex-A57, this implementation is on par with Lingyan's NEON
>>> implementation, and roughly 7x as fast as the generic C code.
>>>
>>> Cc: "huanglingyan (A)"
>>> Signed-off-by: Ard Biesheuvel
>>> ---
>>> Test code after the patch.
>>
>> Hi maintainers and Ard,
>>
>> Any update on it?
>
> I'm waiting for Robin to come back with numbers for a C implementation.
>
> Robin -- did you get anywhere with that?

Still not what I would call finished, but where I've got so far (besides
an increasingly elaborate test rig) is as below - it still wants some
unrolling in the middle to really fly (and actual testing on BE), but
the worst-case performance already equals or just beats this asm version
on Cortex-A53 with GCC 7 (by virtue of being alignment-insensitive and
branchless except for the loop).
Unfortunately, the advantage of C code being instrumentable does also
come around to bite me...

Robin.

----->8-----

/* Looks dumb, but generates nice-ish code */
static u64 accumulate(u64 sum, u64 data)
{
	__uint128_t tmp = (__uint128_t)sum + data;
	return tmp + (tmp >> 64);
}

unsigned int do_csum_c(const unsigned char *buff, int len)
{
	unsigned int offset, shift, sum, count;
	u64 data, *ptr;
	u64 sum64 = 0;

	offset = (unsigned long)buff & 0x7;
	/*
	 * This is to all intents and purposes safe, since rounding down cannot
	 * result in a different page or cache line being accessed, and @buff
	 * should absolutely not be pointing to anything read-sensitive.
	 * It does, however, piss off KASAN...
	 */
	ptr = (u64 *)(buff - offset);
	shift = offset * 8;

	/*
	 * Head: zero out any excess leading bytes. Shifting back by the same
	 * amount should be at least as fast as any other way of handling the
	 * odd/even alignment, and means we can ignore it until the very end.
	 */
	data = *ptr++;
#ifdef __LITTLE_ENDIAN
	data = (data >> shift) << shift;
#else
	data = (data << shift) >> shift;
#endif
	count = 8 - offset;

	/* Body: straightforward aligned loads from here on... */
	//TODO: fancy stuff with larger strides and uint128s?
	while (len > count) {
		sum64 = accumulate(sum64, data);
		data = *ptr++;
		count += 8;
	}
	/*
	 * Tail: zero any over-read bytes similarly to the head, again
	 * preserving odd/even alignment.
	 */
	shift = (count - len) * 8;
#ifdef __LITTLE_ENDIAN
	data = (data << shift) >> shift;
#else
	data = (data >> shift) << shift;
#endif
	sum64 = accumulate(sum64, data);

	/* Finally, folding */
	sum64 += (sum64 >> 32) | (sum64 << 32);
	sum = sum64 >> 32;
	sum += (sum >> 16) | (sum << 16);
	if (offset & 1)
		return (u16)swab32(sum);
	return sum >> 16;
}