From mboxrd@z Thu Jan 1 00:00:00 1970
From: Nick Desaulniers
Date: Wed, 20 Jan 2021 15:15:40 -0800
Subject: Re: [PATCH v4 1/2] arm: lib: xor-neon: remove unnecessary GCC < 4.6 warning
To: Arnd Bergmann, Ard Biesheuvel
Cc: Adrian Ratiu, Arnd Bergmann, Linux ARM, Nathan Chancellor, Russell King, Arvind Sankar, clang-built-linux, Collabora Kernel ML, Linux Kernel Mailing List
References: <20210119131724.308884-1-adrian.ratiu@collabora.com> <20210119131724.308884-2-adrian.ratiu@collabora.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 20, 2021 at 3:09 PM Nick Desaulniers wrote:
>
> On Tue, Jan 19, 2021 at 1:35 PM Arnd Bergmann wrote:
> >
> > On Tue, Jan 19, 2021 at 10:18 PM 'Nick Desaulniers' via Clang Built
> > Linux wrote:
> > >
> > > On Tue, Jan 19, 2021 at 5:17 AM Adrian Ratiu wrote:
> > > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> > > > index b99dd8e1c93f..f9f3601cc2d1 100644
> > > > --- a/arch/arm/lib/xor-neon.c
> > > > +++ b/arch/arm/lib/xor-neon.c
> > > > @@ -14,20 +14,22 @@ MODULE_LICENSE("GPL");
> > > >  #error You should compile this file with '-march=armv7-a -mfloat-abi=softfp -mfpu=neon'
> > > >  #endif
> > > >
> > > > +/*
> > > > + * TODO: Even though -ftree-vectorize is enabled by default in Clang, the
> > > > + * compiler does not produce vectorized code due to its cost model.
> > > > + * See: https://github.com/ClangBuiltLinux/linux/issues/503
> > > > + */
> > > > +#ifdef CONFIG_CC_IS_CLANG
> > > > +#warning Clang does not vectorize code in this file.
> > > > +#endif
> > >
> > > Arnd, remind me again why it's a bug that the compiler's cost model
> > > says it's faster to not produce a vectorized version of these loops?
> > > I stand by my previous comment: https://bugs.llvm.org/show_bug.cgi?id=40976#c8
> >
> > The point is that without vectorizing the code, there is no point in building
> > both the default xor code and a "neon" version that has to save/restore
> > the neon registers but doesn't actually use them.
>
> Doesn't that already happen today with GCC when the pointer arguments
> are overlapping?  The loop is "versioned" and may not actually use the
> NEON registers at all at runtime for such arguments.
> https://godbolt.org/z/q48q8v See also:
> https://bugs.llvm.org/show_bug.cgi?id=40976#c11.  Or am I missing
> something?
>
> So I'm thinking if we extend out this pattern to the rest of the
> functions, we can actually avoid calls to
> kernel_neon_begin()/kernel_neon_end() for cases in which pointers
> would be too close to use the vectorized loop version; meaning for GCC
> this would be an optimization (don't save neon registers when you're
> not going to use them).  I would probably consider moving
> include/asm-generic/xor.h somewhere under arch/arm/
> perhaps...err...something for the other users of .
>
> diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h
> index aefddec79286..abd748d317e8 100644
> --- a/arch/arm/include/asm/xor.h
> +++ b/arch/arm/include/asm/xor.h
> @@ -148,12 +148,12 @@ extern struct xor_block_template const
> xor_block_neon_inner;
>  static void
>  xor_neon_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
>  {
> -	if (in_interrupt()) {
> -		xor_arm4regs_2(bytes, p1, p2);
> -	} else {
> +	if (!in_interrupt() && abs((uintptr_t)p1, (uintptr_t)p2) >= 8) {
>  		kernel_neon_begin();
>  		xor_block_neon_inner.do_2(bytes, p1, p2);
>  		kernel_neon_end();
> +	} else {
> +		xor_arm4regs_2(bytes, p1, p2);
>  	}
>  }
> diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c
> index b99dd8e1c93f..0e8e474c0523 100644
> --- a/arch/arm/lib/xor-neon.c
> +++ b/arch/arm/lib/xor-neon.c
> @@ -14,22 +14,6 @@ MODULE_LICENSE("GPL");
>  #error You should compile this file with '-march=armv7-a
> -mfloat-abi=softfp -mfpu=neon'
>  #endif
>
> -/*
> - * Pull in the reference implementations while instructing GCC (through
> - * -ftree-vectorize) to attempt to exploit implicit parallelism and emit
> - * NEON instructions.
> - */
> -#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)
> -#pragma GCC optimize "tree-vectorize"

Err...we need to keep this but use the -f flag with __restrict for GCC
to vectorize: https://godbolt.org/z/9acnEv
-- 
Thanks,
~Nick Desaulniers
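
Illustrative sketch (not part of the original message): the two ideas in the message above, marking the XOR pointers as non-aliasing and only taking the vectorizable path when the buffers are far enough apart, can be shown in plain userspace C. Every name here (xor_2_restrict, xor_2_plain, ptr_distance, xor_2_dispatch) is hypothetical and is not a kernel API; the kernel_neon_begin()/kernel_neon_end() calls and the in_interrupt() test are left out. The quoted diff compares the pointer distance against a fixed 8 using a two-argument abs(), which reads as shorthand; this sketch assumes the stricter condition that the buffers do not overlap at all, so the restrict promise actually holds.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a two-source XOR routine. The restrict
 * qualifiers promise the compiler that p1 and p2 never alias, which is
 * the property the message says GCC needs in order to vectorize the
 * loop without emitting a runtime overlap check.
 */
static void xor_2_restrict(size_t bytes,
                           unsigned long *restrict p1,
                           const unsigned long *restrict p2)
{
    size_t n = bytes / sizeof(unsigned long);

    for (size_t i = 0; i < n; i++)
        p1[i] ^= p2[i];
}

/* Same loop without the aliasing promise; safe for overlapping buffers. */
static void xor_2_plain(size_t bytes, unsigned long *p1,
                        const unsigned long *p2)
{
    size_t n = bytes / sizeof(unsigned long);

    for (size_t i = 0; i < n; i++)
        p1[i] ^= p2[i];
}

/* Absolute distance in bytes between two addresses. */
static uintptr_t ptr_distance(const void *a, const void *b)
{
    uintptr_t x = (uintptr_t)a;
    uintptr_t y = (uintptr_t)b;

    return x > y ? x - y : y - x;
}

/*
 * Take the restrict path only when the two buffers cannot overlap
 * (an assumption of this sketch; the quoted diff uses a fixed 8).
 * Returns 1 if the restrict path ran, 0 if the plain fallback ran.
 */
static int xor_2_dispatch(size_t bytes, unsigned long *p1,
                          const unsigned long *p2)
{
    if (ptr_distance(p1, p2) >= bytes) {
        xor_2_restrict(bytes, p1, p2);
        return 1;
    }
    xor_2_plain(bytes, p1, p2);
    return 0;
}
```

In a kernel setting the branch that returns 1 is where the NEON save/restore would wrap the call, so a caller with overlapping buffers would skip that cost entirely. Which threshold is correct for the real xor-neon code is not settled in this thread.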