From: Ard Biesheuvel
Date: Thu, 28 Feb 2019 16:28:30 +0100
Subject: Re: [PATCH] arm64: do_csum: implement accelerated scalar version
To: Robin Murphy
Cc: Steve Capper, Catalin Marinas, Ilias Apalodimas, Will Deacon,
 "huanglingyan (A)", linux-arm-kernel
References: <20190218230842.11448-1-ard.biesheuvel@linaro.org>
 <20190219150848.GA26652@apalos>
 <93697477-4dcc-4ab2-c838-2f487d334c56@arm.com>
In-Reply-To: <93697477-4dcc-4ab2-c838-2f487d334c56@arm.com>

On Thu, 28 Feb 2019 at 16:14, Robin Murphy wrote:
>
> Hi Ard,
>
> On 28/02/2019 14:16, Ard Biesheuvel wrote:
> > (+ Catalin)
> >
> > On Tue, 19 Feb 2019 at 16:08, Ilias Apalodimas wrote:
> >>
> >> On Tue, Feb 19, 2019 at 12:08:42AM +0100, Ard Biesheuvel wrote:
> >>> It turns out that the IP checksumming code is still exercised often,
> >>> even though one might expect that modern NICs with checksum offload
> >>> have no use for it.
> >>> However, as Lingyan points out, there are
> >>> combinations of features where the network stack may still fall back
> >>> to software checksumming, and so it makes sense to provide an
> >>> optimized implementation in software as well.
> >>>
> >>> So provide an implementation of do_csum() in scalar assembler, which,
> >>> unlike C, gives direct access to the carry flag, making the code run
> >>> substantially faster. The routine uses overlapping 64 byte loads for
> >>> all input sizes > 64 bytes, in order to reduce the number of branches
> >>> and improve performance on cores with deep pipelines.
> >>>
> >>> On Cortex-A57, this implementation is on par with Lingyan's NEON
> >>> implementation, and roughly 7x as fast as the generic C code.
> >>>
> >>> Cc: "huanglingyan (A)"
> >>> Signed-off-by: Ard Biesheuvel
> > ...
> >>
> >> Acked-by: Ilias Apalodimas
> >
> > Full patch here
> >
> > https://lore.kernel.org/linux-arm-kernel/20190218230842.11448-1-ard.biesheuvel@linaro.org/
> >
> > This was a follow-up to some discussions about Lingyan's NEON code,
> > CC'ed to netdev@ so people could chime in as to whether we need
> > accelerated checksumming code in the first place.

Thanks for taking a look.

> FWIW ever since we did ip_fast_csum() I've been meaning to see how well
> I can do with a similar tweaked C implementation for this (mostly for
> fun). Since I've recently dug out my RK3328 box for other reasons I'll
> give this a test - that's a weedy little quad-A53 whose GbE hardware
> checksumming is slightly busted and has to be turned off, so the
> do_csum() overhead under heavy network load is comparatively massive.
> (plus it's non-EFI so I should be able to try big-endian easily too)
>

Yes please. I've been meaning to run this on A72 myself, but ever
since my MacchiatoBin self-combusted, I've been relying on AWS for
this, which is a bit finicky.

As for the C implementation, not having access to the carry flag is
pretty limiting, so I wonder how you intend to get around that.

> The asm looks pretty reasonable to me - instinct says there's *possibly*
> some value for out-of-order cores in doing the 8-way accumulations in a
> more pairwise fashion, but I guess either way the carry flag dependency
> is going to dominate, so it may well be moot.

Yes. In fact, I was surprised the speedup is as dramatic as it is
despite this, but I guess they optimize for this rather well at the
uarch level.

> What may be more
> worthwhile is taking the effort to align the source pointer, at least
> for larger inputs, so as to be kinder to little cores - according to its
> optimisation guide, A55 is fairly sensitive to unaligned loads, so I'd
> assume that's true of its older/smaller friends too. I'll see what I can
> measure in practice - until proven otherwise I'd have no great objection
> to merging this patch as-is if the need is real. Improvements can always
> come later :)
>

Good point re alignment, I didn't consider that at all tbh.

I'll let the maintainers decide whether/when to merge this. I don't
feel strongly either way.
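
On the carry flag question: a minimal sketch, in portable C, of one
common way to work around the missing carry flag. It is not taken from
the patch and is not necessarily what Robin has in mind. The names
add64_carry(), fold64(), csum_u64() and csum_ref16() are made up for
illustration. The idea is to recover the carry from unsigned wrap-around
(the result is smaller than the addend if the add wrapped) and feed it
back in, then fold the 64-bit ones' complement sum down to 16 bits.
Loads go through memcpy(), so the sketch assumes nothing about alignment.
The result is in native byte order and is not complemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 64-bit add with end-around carry. */
static inline uint64_t add64_carry(uint64_t sum, uint64_t val)
{
	sum += val;
	/* sum < val holds exactly when the addition wrapped. */
	return sum + (sum < val);
}

/* Hypothetical helper: fold a 64-bit ones' complement sum to 16 bits. */
static inline uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffff) + (sum >> 32);	/* at most 33 bits */
	sum = (sum & 0xffffffff) + (sum >> 32);	/* fits in 32 bits */
	sum = (sum & 0xffff) + (sum >> 16);	/* at most 17 bits */
	sum = (sum & 0xffff) + (sum >> 16);	/* fits in 16 bits */
	return (uint16_t)sum;
}

/* Sketch: sum 8 bytes at a time, emulating the carry flag in C. */
static uint16_t csum_u64(const unsigned char *buf, size_t len)
{
	uint64_t sum = 0, word;

	assert(buf != NULL || len == 0);

	while (len >= sizeof(word)) {
		memcpy(&word, buf, sizeof(word));	/* alignment-safe load */
		sum = add64_carry(sum, word);
		buf += sizeof(word);
		len -= sizeof(word);
	}
	if (len) {
		word = 0;
		memcpy(&word, buf, len);		/* zero-padded tail */
		sum = add64_carry(sum, word);
	}
	return fold64(sum);
}

/* Naive reference for comparison: one 16-bit word at a time. */
static uint16_t csum_ref16(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;
	uint16_t half;

	for (; len >= 2; buf += 2, len -= 2) {
		memcpy(&half, buf, 2);
		sum += half;
		sum = (sum & 0xffff) + (sum >> 16);	/* end-around carry */
	}
	if (len) {
		half = 0;
		memcpy(&half, buf, 1);			/* zero-padded last byte */
		sum += half;
		sum = (sum & 0xffff) + (sum >> 16);
	}
	return (uint16_t)sum;
}
```

For any buffer, csum_u64(buf, len) should match csum_ref16(buf, len).
The compare in add64_carry() sits on the dependency chain of every add,
which is the cost the asm avoids by using the carry flag directly.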