From: Arnd Bergmann
Date: Tue, 18 May 2021 22:51:23 +0200
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
To: Linus Torvalds
Cc: "Jason A. Donenfeld", Eric Biggers, linux-arch, Vineet Gupta, Russell King, Herbert Xu, "David S. Miller", Thomas Bogendoerfer, Linux ARM, Linux Kernel Mailing List, "open list:HARDWARE RANDOM NUMBER GENERATOR CORE", "open list:BROADCOM NVRAM DRIVER", Nobuhiro Iwamatsu
References: <20210514100106.3404011-1-arnd@kernel.org> <20210514100106.3404011-8-arnd@kernel.org>

On Tue, May 18, 2021 at 6:12 PM Linus Torvalds wrote:
> On Tue, May 18, 2021 at 5:42 AM Arnd Bergmann wrote:
>
> Of the other cases, that xor-neon.c case actually makes sense. For
> that file, it literally exists _only_ to get a vectorized version of
> the trivial xor_8regs loop. It's one of the (very very few) cases of
> vectorization we actually want.
> And in that case, we might even want
> to make things easier - and more explicit - for the compiler by making
> the xor_8regs loops use "restrict" pointers.
>
> That neon case actually wants and needs that tree-vectorization to
> DTRT. But maybe it doesn't need the actual _loop_ vectorization? The
> xor_8regs code is literally using hand-unrolled loops already, exactly
> to make it as simple as possible for the compiler (but the lack of
> "restrict" pointers means that it's not all that simple after all, and
> I assume the compiler generates conditionals for the NEON case?

Right, I think there is an ongoing debate over how to best handle this
one in clang, since that does not do any vectorization for this file
unless the pointers are marked "restrict". As far as I remember, there
are some callers that want to do the xor in place though, which means
restrict is wrong.

> lz4 is questionable - yes, upstream lh4 seems to use -O3 (good), but
> it also very much uses unaligned accesses, which is where the gcc bug
> hits. I doubt that it really needs or wants the loop vectorization.

I ran some limited speed tests with the lz4 sources that come with
Ubuntu, using gcc-10.3 on an AMD Zen1 Threadripper with 10GB of
/dev/urandom input. This package patches the sources to use -O2 and no
vectorization, which turns out to be the fastest combination for my
stupid test as well. The results are barely above noise, but it appears
that -O2 -ftree-loop-vectorize makes it slightly slower than just -O2,
while -O3 is even slower than that regardless of
-fno-tree-loop-vectorize/-ftree-loop-vectorize.

I see that Nobuhiro Iwamatsu (Cc'd) changed the Debian lz4 package to
use -O2, but I don't see an explanation for it. I also see that the lz4
sources force -O2 on ppc64 because -O3 causes a 30% slowdown from
vectorization. The kernel version is missing the bit that does this.

> zstd looks very similar to lz4.
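
For readers unfamiliar with the "struct helpers" in the subject line and the unaligned accesses mentioned for lz4, here is a minimal sketch of the packed-struct idiom next to a memcpy-based load. The names una_u32, load_u32_struct and load_u32_memcpy are hypothetical and chosen for illustration; this is not the kernel's header. It assumes GCC or Clang (the packed attribute is a compiler extension) and, like the kernel, is best built with -fno-strict-aliasing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper type: the packed attribute (a GCC/Clang
 * extension) gives the field an alignment of 1, so the compiler
 * must not assume natural alignment when accessing it. */
struct una_u32 {
	uint32_t x;
} __attribute__((packed));

/* Load a native-endian 32-bit value from a possibly misaligned
 * address through the packed struct. */
static inline uint32_t load_u32_struct(const void *p)
{
	const struct una_u32 *ptr = p;

	return ptr->x;
}

/* Portable alternative: copy the bytes into an aligned temporary. */
static inline uint32_t load_u32_memcpy(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

Both helpers should return the same value for any address; which one produces better code depends on the compiler and target.
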
> End result: at a minimum, I'd suggest using
> "-fno-tree-loop-vectorize", although somebody should check that NEON
> case.
>
> And I still think that using O3 for anything halfway complicated
> should be considered odd and need some strong numbers to enable.

Agreed. I think there is a fairly strong case for just using -O2 on lz4
and backporting that to stable. Searching for lz4 bugs with -O3 also
finds several reports, including one that I sent myself:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65709
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69702

I see that user space zstd is built with -O3 in Debian, but the
changelog also lists "Improved : better speed on clang and gcc -O2,
thanks to Eric Biggers", so maybe Eric has some useful ideas on whether
we should just use -O2 for the in-kernel version.

       Arnd
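
On the "restrict" question raised earlier in this message, a minimal sketch of why in-place callers are a problem. It uses a hypothetical three-pointer form (xor_into and xor_into_restrict are made-up names, not the kernel's xor_8regs signature) and only illustrates the aliasing contract, not the actual kernel code.

```c
#include <stddef.h>

/* No restrict: the compiler has to allow for dst overlapping a or b,
 * so calling this in place (dst == a) is well defined. */
void xor_into(unsigned long *dst, const unsigned long *a,
	      const unsigned long *b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = a[i] ^ b[i];
}

/* With restrict: the caller promises that the three ranges do not
 * overlap, which makes the loop easier to vectorize. Passing
 * dst == a here would be undefined behaviour. */
void xor_into_restrict(unsigned long *restrict dst,
		       const unsigned long *restrict a,
		       const unsigned long *restrict b, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = a[i] ^ b[i];
}
```

A caller can use xor_into(buf, buf, other, n) to update buf in place, but the same call on xor_into_restrict would break the promise made by the qualifier, so marking the pointers "restrict" is only safe if every caller passes distinct buffers.
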