From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 48D30C388F7 for ; Tue, 10 Nov 2020 23:54:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 03256207FB for ; Tue, 10 Nov 2020 23:54:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732383AbgKJXyr (ORCPT ); Tue, 10 Nov 2020 18:54:47 -0500 Received: from bhuna.collabora.co.uk ([46.235.227.227]:32774 "EHLO bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732345AbgKJXyq (ORCPT ); Tue, 10 Nov 2020 18:54:46 -0500 Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: aratiu) with ESMTPSA id 0F4511F4565A From: Adrian Ratiu To: Nick Desaulniers Cc: Nathan Chancellor , Arnd Bergmann , Linux ARM , clang-built-linux , Russell King , LKML , Collabora Kernel ML , Ard Biesheuvel Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization In-Reply-To: References: <20201106051436.2384842-1-adrian.ratiu@collabora.com> <20201106051436.2384842-3-adrian.ratiu@collabora.com> <20201106101419.GB3811063@ubuntu-m3-large-x86> <87wnyyvh56.fsf@collabora.com> <871rh2i9xg.fsf@iwork.i-did-not-set--mail-host-address--so-tickle-me> Date: Wed, 11 Nov 2020 01:56:22 +0200 Message-ID: <87sg9ghil5.fsf@collabora.com> MIME-Version: 1.0 Content-Type: text/plain; format=flowed Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 10 Nov 2020, Nick Desaulniers wrote: > On Mon, Nov 9, 2020 at 11:51 AM Adrian Ratiu > wrote: >> >> On Fri, 06 Nov 2020, Nick Desaulniers >> wrote: >> > +#pragma clang loop vectorize(enable) >> > do { >> > p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; p1[1] >> > ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; >> > ``` seems to generate the vectorized code. >> > >> > Why don't we find a way to make those pragma's more toolchain >> > portable, rather than open coding them like I have above >> > rather than this series? >> >> Hi again Nick, >> >> How did you verify the above pragmas generate correct >> vectorized code? Have you tested this specific use case? > > I read the disassembly before and after my suggested use of > pragmas; look for vld/vstr. You can also add > -Rpass-missed=loop-vectorize to CFLAGS_xor-neon.o in > arch/arm/lib/Makefile and rebuild arch/arm/lib/xor-neon.o with > CONFIG_BTRFS enabled. > >> >> I'm asking because overrulling the cost model might not be >> enough, the only thing I can confirm is that the generated code >> is changed, but not that it is correct in any way. The object >> disasm also looks weird, but I don't have enough knowledge to >> start debugging what's happening within LLVM/Clang itself. > > It doesn't "look weird" to me. The loop is versioned based on a > comparison whether the parameters alias or not. There's a > non-vectorized version if the parameters are equal or close > enough to overlap. There's another version of the loop that's > vectorized. If you want just the vectorized version, then you > have to mark the parameters as __restrict qualified, then check > that all callers are ok with that. > Thank you for the explanation, that does make sense now. I'm just a compiler optimization noob, sorry. All your help is much appreciated. >> >> I also get some new warnings with your code [1], besides the >> previously 'vectorization was possible but not beneficial' >> which is still present. It is quite funny because these two >> warnings seem to contradict themselves. :) > > From which compiler? ``` $ clang > -Wpass-failed=transform-warning -c -x c /dev/null warning: > unknown warning option '-Wpass-failed=transform-warning'; did > you mean '-Wprofile-instr-missing'? [-Wunknown-warning-option] > ``` I'm using Clang 10.0.1-1 from the Arch Linux repo. In the LLVM sources that transform-warning appears to be documented under llvm-10.0.1.src/docs/Passes.rst:1227:-transform-warning Here's a build log: http://ix.io/2DIc I always get those warnings with the pragma change you suggested, even on clean builds on latest linux-next. I looked at the Arch PKGBUILD and they don't appear to do anything special other than patching to enable SSP and PIE by default (eg llvm bug 13410). > > The pragma is clang specific, hence my recommendation to wrap it > in an #ifdef __clang__. > Yes, I understand that. :) >> >> At this point I do not trust the compiler and am inclined to do > > Nonsense. > >> like was done for GCC when it was broken: disable the >> optimization and warn users to upgrade after the compiler is >> fixed and confirmed to work. >> >> If you agree I can send a v2 with this and also drop the GCC >> pragma as Arvind and Ard suggested. > > If you resend "this" as in 2/2, I will NACK it. There's nothing > wrong with the cost model; it's saying there's little point in > generating the vectorized version because you're still going to > need a non-vectorized loop version anyways. Claiming there is a > compiler bug here is dubious just because the cost models > between two compilers differ slightly. Ok, so that "remark" from the compiler is safe to ignore. > > Resend the patch removing the warning, remove the GCC pragma, > but if you want to change anything here for Clang, use `#pragma > clang loop vectorize(enable)` wrapped in an `#ifdef __clang__`. > Thanks for making the NACK clear, so the way forward is to either use the pragma if I can figure out the new 'loop not vectorized' warning (which might also be a red herring) or just leave Clang as is. :) >> >> Kind regards, >> Adrian >> >> [1] >> ./include/asm-generic/xor.h:11:1: warning: loop not vectorized: >> the optimizer was unable to perform the requested transformation; >> the transformation might be disabled or specified as part of an >> unsupported transformation ordering >> [-Wpass-failed=transform-warning] xor_8regs_2(unsigned long bytes, >> unsigned long *p1, unsigned long *p2) > > > -- > Thanks, > ~Nick Desaulniers From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DD752C388F7 for ; Tue, 10 Nov 2020 23:55:15 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 721BC207FB for ; Tue, 10 Nov 2020 23:55:15 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="cVJq3Twq" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 721BC207FB Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=collabora.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Type: Content-Transfer-Encoding:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:Message-ID:Date:References:In-Reply-To: Subject:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=poS84oNdu7lRNJf1ufeSK/qyGv0SMHD3bWoQ6LoTILg=; b=cVJq3TwqqJRrBARJbuq7prKjb hyzpaK/JTRoc6KwGKXDAlYcsh9spg95+f7L0tmdFhnTGMqkS34t+jOML3wI3+Iv+qHsReNWWIOcYY AWcAqmukXoqa1mANgzZJwPZHzE9j8j44+NW5ulMAcv3xD/L9EuUsztz2s9UC+sn0CwDlXN6rXxgjp O9lFlRkKIFN02QG/1SGTME+S8VctmULL+VSEKvTj6CeHDiVAPNQnyCKB22vF5axqzboKPz7m9d+CS baqz9k/SAkCN1ZHhptfXAMe+NMmSV1AVr4bQ0zanX6qWBloV7zgKP0jiStfPBybVnGQMOLlhF7uLc 6AUgTkJOA==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kcdTM-000405-It; Tue, 10 Nov 2020 23:54:48 +0000 Received: from bhuna.collabora.co.uk ([46.235.227.227]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kcdTJ-0003za-Nd for linux-arm-kernel@lists.infradead.org; Tue, 10 Nov 2020 23:54:47 +0000 Received: from [127.0.0.1] (localhost [127.0.0.1]) (Authenticated sender: aratiu) with ESMTPSA id 0F4511F4565A From: Adrian Ratiu To: Nick Desaulniers Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization In-Reply-To: References: <20201106051436.2384842-1-adrian.ratiu@collabora.com> <20201106051436.2384842-3-adrian.ratiu@collabora.com> <20201106101419.GB3811063@ubuntu-m3-large-x86> <87wnyyvh56.fsf@collabora.com> <871rh2i9xg.fsf@iwork.i-did-not-set--mail-host-address--so-tickle-me> Date: Wed, 11 Nov 2020 01:56:22 +0200 Message-ID: <87sg9ghil5.fsf@collabora.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201110_185445_955698_455DF9B4 X-CRM114-Status: GOOD ( 35.37 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Arnd Bergmann , LKML , Russell King , clang-built-linux , Nathan Chancellor , Collabora Kernel ML , Ard Biesheuvel , Linux ARM Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="us-ascii"; Format="flowed" Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, 10 Nov 2020, Nick Desaulniers wrote: > On Mon, Nov 9, 2020 at 11:51 AM Adrian Ratiu > wrote: >> >> On Fri, 06 Nov 2020, Nick Desaulniers >> wrote: >> > +#pragma clang loop vectorize(enable) >> > do { >> > p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0]; p1[1] >> > ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1]; >> > ``` seems to generate the vectorized code. >> > >> > Why don't we find a way to make those pragma's more toolchain >> > portable, rather than open coding them like I have above >> > rather than this series? >> >> Hi again Nick, >> >> How did you verify the above pragmas generate correct >> vectorized code? Have you tested this specific use case? > > I read the disassembly before and after my suggested use of > pragmas; look for vld/vstr. You can also add > -Rpass-missed=loop-vectorize to CFLAGS_xor-neon.o in > arch/arm/lib/Makefile and rebuild arch/arm/lib/xor-neon.o with > CONFIG_BTRFS enabled. > >> >> I'm asking because overrulling the cost model might not be >> enough, the only thing I can confirm is that the generated code >> is changed, but not that it is correct in any way. The object >> disasm also looks weird, but I don't have enough knowledge to >> start debugging what's happening within LLVM/Clang itself. > > It doesn't "look weird" to me. The loop is versioned based on a > comparison whether the parameters alias or not. There's a > non-vectorized version if the parameters are equal or close > enough to overlap. There's another version of the loop that's > vectorized. If you want just the vectorized version, then you > have to mark the parameters as __restrict qualified, then check > that all callers are ok with that. > Thank you for the explanation, that does make sense now. I'm just a compiler optimization noob, sorry. All your help is much appreciated. >> >> I also get some new warnings with your code [1], besides the >> previously 'vectorization was possible but not beneficial' >> which is still present. It is quite funny because these two >> warnings seem to contradict themselves. :) > > From which compiler? ``` $ clang > -Wpass-failed=transform-warning -c -x c /dev/null warning: > unknown warning option '-Wpass-failed=transform-warning'; did > you mean '-Wprofile-instr-missing'? [-Wunknown-warning-option] > ``` I'm using Clang 10.0.1-1 from the Arch Linux repo. In the LLVM sources that transform-warning appears to be documented under llvm-10.0.1.src/docs/Passes.rst:1227:-transform-warning Here's a build log: http://ix.io/2DIc I always get those warnings with the pragma change you suggested, even on clean builds on latest linux-next. I looked at the Arch PKGBUILD and they don't appear to do anything special other than patching to enable SSP and PIE by default (eg llvm bug 13410). > > The pragma is clang specific, hence my recommendation to wrap it > in an #ifdef __clang__. > Yes, I understand that. :) >> >> At this point I do not trust the compiler and am inclined to do > > Nonsense. > >> like was done for GCC when it was broken: disable the >> optimization and warn users to upgrade after the compiler is >> fixed and confirmed to work. >> >> If you agree I can send a v2 with this and also drop the GCC >> pragma as Arvind and Ard suggested. > > If you resend "this" as in 2/2, I will NACK it. There's nothing > wrong with the cost model; it's saying there's little point in > generating the vectorized version because you're still going to > need a non-vectorized loop version anyways. Claiming there is a > compiler bug here is dubious just because the cost models > between two compilers differ slightly. Ok, so that "remark" from the compiler is safe to ignore. > > Resend the patch removing the warning, remove the GCC pragma, > but if you want to change anything here for Clang, use `#pragma > clang loop vectorize(enable)` wrapped in an `#ifdef __clang__`. > Thanks for making the NACK clear, so the way forward is to either use the pragma if I can figure out the new 'loop not vectorized' warning (which might also be a red herring) or just leave Clang as is. :) >> >> Kind regards, >> Adrian >> >> [1] >> ./include/asm-generic/xor.h:11:1: warning: loop not vectorized: >> the optimizer was unable to perform the requested transformation; >> the transformation might be disabled or specified as part of an >> unsupported transformation ordering >> [-Wpass-failed=transform-warning] xor_8regs_2(unsigned long bytes, >> unsigned long *p1, unsigned long *p2) > > > -- > Thanks, > ~Nick Desaulniers _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel