From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20201106051436.2384842-1-adrian.ratiu@collabora.com>
 <20201106051436.2384842-3-adrian.ratiu@collabora.com>
 <20201108174014.GA219672@rani.riverdale.lan>
 <20201108180942.GA226037@rani.riverdale.lan>
In-Reply-To: <20201108180942.GA226037@rani.riverdale.lan>
From: Ard Biesheuvel
Date: Sun, 8 Nov 2020 21:14:50 +0100
Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization
To: Arvind Sankar
Cc: Arnd Bergmann, Adrian Ratiu, Nick Desaulniers, Russell King,
 Linux Kernel Mailing List, clang-built-linux, Nathan Chancellor,
 kernel@collabora.com, Linux ARM
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, 8 Nov 2020 at 19:10, Arvind Sankar wrote:
>
> On Sun, Nov 08, 2020 at 12:40:14PM -0500, Arvind Sankar wrote:
> > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote:
> > > Due to a Clang bug [1] neon autoloop vectorization does not happen or
> > > happens badly with no gains and considering previous GCC experiences
> > > which generated unoptimized code which was worse than the default asm
> > > implementation, it is safer to default clang builds to the known good
> > > generic implementation.
> > >
> > > The kernel currently supports a minimum Clang version of v10.0.1, see
> > > commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1").
> > >
> > > When the bug gets eventually fixed, this commit could be reverted or,
> > > if the minimum clang version bump takes a long time, a warning could
> > > be added for users to upgrade their compilers like was done for GCC.
> > >
> > > [1] https://bugs.llvm.org/show_bug.cgi?id=40976
> > >
> > > Signed-off-by: Adrian Ratiu
> > > ---
> > >  arch/arm/include/asm/xor.h | 3 ++-
> > >  arch/arm/lib/Makefile | 3 +++
> > >  arch/arm/lib/xor-neon.c | 4 ++++
> > >  3 files changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> > > index aefddec79286..49937dafaa71 100644
> > > --- a/arch/arm/include/asm/xor.h
> > > +++ b/arch/arm/include/asm/xor.h
> > > @@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = {
> > >  NEON_TEMPLATES; \
> > >  } while (0)
> > >
> > > -#ifdef CONFIG_KERNEL_MODE_NEON
> > > +/* disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 */
> > > +#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG)
> > >
> > >  extern struct xor_block_template const xor_block_neon_inner;
> > >
> > > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> > > index 6d2ba454f25b..53f9e7dd9714 100644
> > > --- a/arch/arm/lib/Makefile
> > > +++ b/arch/arm/lib/Makefile
> > > @@ -43,8 +43,11 @@ endif
> > >  $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
> > >  $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
> > >
> > > +# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976
> > > +ifndef CONFIG_CC_IS_CLANG
> > >  ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
> > >  NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
> > >  CFLAGS_xor-neon.o += $(NEON_FLAGS)
> > >  obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
> > >  endif
> > > +endif
> > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > index e1e76186ec23..84c91c48dfa2 100644
> > > --- a/arch/arm/lib/xor-neon.c
> > > +++ b/arch/arm/lib/xor-neon.c
> > > @@ -18,6 +18,10 @@ MODULE_LICENSE("GPL");
> > >  * Pull in the reference implementations while instructing GCC (through
> > >  * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> > >  * NEON instructions.
> > > +
> > > + * On Clang the loop vectorizer is enabled by default, but due to a bug
> > > + * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke
> > > + * so xor-neon is disabled in favor of the default reg implementations.
> > >  */
> > >  #ifdef CONFIG_CC_IS_GCC
> > >  #pragma GCC optimize "tree-vectorize"
> > > --
> > > 2.29.0
> > >
> >
> > It's actually a bad idea to use #pragma GCC optimize. This is basically
> > the same as tagging all the functions with __attribute__((optimize)),
> > which GCC does not recommend for production use, as it _replaces_
> > optimization options rather than appending to them, and has been
> > observed to result in dropping important compiler flags.
> >
> > There've been a few discussions recently around other such cases:
> > https://lore.kernel.org/lkml/20201028171506.15682-1-ardb@kernel.org/
> > https://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net/
> >
> > For this file, given that it is supposed to use -ftree-vectorize for the
> > whole file anyway, is there any reason it's not just added to CFLAGS via
> > the Makefile? This seems to be the only use of pragma optimize in the
> > kernel.
>
> Eg, this shows that the pragma results in dropping -fno-strict-aliasing.
> https://godbolt.org/z/1nfrKT
>
> The first function does not use vectorization because s and s->a might
> alias.
>

Thanks, Arvind. I wasn't aware of this issue at the time, but I agree
that we should replace the #pragma with a command line option in this case.
And given that we already set CFLAGS_xor-neon.o in the Makefile,
adding it there would have been more straight-forward to begin with.
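
For readers who cannot open the godbolt link, here is a minimal sketch of the kind of test case Arvind describes. It is an illustration only, not the code behind the link: struct buf, xor_cmdline() and xor_pragma() are hypothetical names, and whether either loop is actually vectorized depends on the GCC version and flags in use.

```c
#include <stddef.h>

/* Hypothetical type for illustration; not taken from the kernel or the link. */
struct buf {
	unsigned long *a;
};

/*
 * Compiled with the options given on the command line only. When those
 * include -fno-strict-aliasing, the compiler has to assume that a store
 * to s->a[i] may overwrite s->a itself, so it may reload the pointer on
 * every iteration and decline to vectorize the loop.
 */
void xor_cmdline(struct buf *s, size_t n, unsigned long v)
{
	size_t i;

	for (i = 0; i < n; i++)
		s->a[i] ^= v;
}

/* Clang does not implement this pragma, so only apply it for GCC. */
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize ("tree-vectorize")
#endif

/*
 * Same loop, but under the pragma. Per the thread, the pragma can
 * replace the optimization options instead of appending to them, so
 * -fno-strict-aliasing may no longer apply to this function.
 */
void xor_pragma(struct buf *s, size_t n, unsigned long v)
{
	size_t i;

	for (i = 0; i < n; i++)
		s->a[i] ^= v;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

Both functions compute the same result; the point is only the generated code. Building this with something like "gcc -O2 -fno-strict-aliasing -S" and comparing the two loops in the assembly should show whether the pragma changed the aliasing assumptions on a given compiler.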