From: Nick Desaulniers
Date: Tue, 10 Nov 2020 13:41:17 -0800
Subject: Re: [PATCH 2/2] arm: lib: xor-neon: disable clang vectorization
To: Adrian Ratiu
Cc: Nathan Chancellor, Arnd Bergmann, Linux ARM, clang-built-linux,
    Russell King, LKML, Collabora Kernel ML, Ard Biesheuvel
References: <20201106051436.2384842-1-adrian.ratiu@collabora.com>
    <20201106051436.2384842-3-adrian.ratiu@collabora.com>
    <20201106101419.GB3811063@ubuntu-m3-large-x86>
    <87wnyyvh56.fsf@collabora.com>
    <871rh2i9xg.fsf@iwork.i-did-not-set--mail-host-address--so-tickle-me>
In-Reply-To: <871rh2i9xg.fsf@iwork.i-did-not-set--mail-host-address--so-tickle-me>
List-ID: linux-kernel@vger.kernel.org

On Mon, Nov 9, 2020 at 11:51 AM Adrian Ratiu wrote:
>
> On Fri, 06 Nov 2020, Nick Desaulniers wrote:
> > +#pragma clang loop vectorize(enable)
> >         do {
> >                 p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
> >                 p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
> > ```
> > seems to generate the vectorized code.
> >
> > Why don't we find a way to make those pragmas more toolchain
> > portable, rather than open coding them like I have above rather
> > than this series?
>
> Hi again Nick,
>
> How did you verify the above pragmas generate correct vectorized
> code? Have you tested this specific use case?

I read the disassembly before and after my suggested use of pragmas;
look for vld/vstr. You can also add -Rpass-missed=loop-vectorize to
CFLAGS_xor-neon.o in arch/arm/lib/Makefile and rebuild
arch/arm/lib/xor-neon.o with CONFIG_BTRFS enabled.

>
> I'm asking because overruling the cost model might not be enough,
> the only thing I can confirm is that the generated code is
> changed, but not that it is correct in any way. The object disasm
> also looks weird, but I don't have enough knowledge to start
> debugging what's happening within LLVM/Clang itself.

It doesn't "look weird" to me. The loop is versioned based on a
comparison whether the parameters alias or not. There's a
non-vectorized version if the parameters are equal or close enough to
overlap. There's another version of the loop that's vectorized. If
you want just the vectorized version, then you have to mark the
parameters as __restrict qualified, then check that all callers are ok
with that.

>
> I also get some new warnings with your code [1], besides the
> previously 'vectorization was possible but not beneficial' which
> is still present. It is quite funny because these two warnings
> seem to contradict themselves. :)

From which compiler?
```
$ clang -Wpass-failed=transform-warning -c -x c /dev/null
warning: unknown warning option '-Wpass-failed=transform-warning'; did
you mean '-Wprofile-instr-missing'? [-Wunknown-warning-option]
```

The pragma is clang specific, hence my recommendation to wrap it in an
#ifdef __clang__.
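
A minimal sketch of the two options discussed above, assuming a
simplified two-operand XOR loop rather than the actual kernel code.
The function names xor_2_versioned and xor_2_restrict are hypothetical
and exist only for this illustration. The first variant guards the
clang-only pragma so other compilers never see it; the second marks
both pointers __restrict, which is only valid if every caller
guarantees the buffers do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the kernel's xor_8regs_2(): XOR p2 into p1,
 * two words per iteration. Without __restrict the compiler may emit
 * both a vectorized loop and a scalar fallback, selected by a runtime
 * overlap check on p1 and p2.
 */
void xor_2_versioned(size_t bytes, unsigned long *p1,
		     const unsigned long *p2)
{
	size_t lines = bytes / (2 * sizeof(unsigned long));

	/* The do-while below assumes at least one full iteration. */
	assert(lines > 0 && bytes % (2 * sizeof(unsigned long)) == 0);

#ifdef __clang__
#pragma clang loop vectorize(enable)
#endif
	do {
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1 += 2;
		p2 += 2;
	} while (--lines > 0);
}

/*
 * Same loop, but the caller promises that p1 and p2 never overlap,
 * so no runtime alias check (and no scalar fallback for aliasing)
 * should be needed.
 */
void xor_2_restrict(size_t bytes, unsigned long *__restrict p1,
		    const unsigned long *__restrict p2)
{
	size_t lines = bytes / (2 * sizeof(unsigned long));

	assert(lines > 0 && bytes % (2 * sizeof(unsigned long)) == 0);

#ifdef __clang__
#pragma clang loop vectorize(enable)
#endif
	do {
		p1[0] ^= p2[0];
		p1[1] ^= p2[1];
		p1 += 2;
		p2 += 2;
	} while (--lines > 0);
}
```

Both variants compute the same result for non-overlapping buffers;
comparing their disassembly is one way to see whether the compiler
kept a second, non-vectorized copy of the loop in the first one.
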

>
> At this point I do not trust the compiler and am inclined to do

Nonsense.

> like was done for GCC when it was broken: disable the optimization
> and warn users to upgrade after the compiler is fixed and
> confirmed to work.
>
> If you agree I can send a v2 with this and also drop the GCC
> pragma as Arvind and Ard suggested.

If you resend "this" as in 2/2, I will NACK it.

There's nothing wrong with the cost model; it's saying there's little
point in generating the vectorized version because you're still going
to need a non-vectorized loop version anyways. Claiming there is a
compiler bug here is dubious just because the cost models between two
compilers differ slightly.

Resend the patch removing the warning, remove the GCC pragma, but if
you want to change anything here for Clang, use `#pragma clang loop
vectorize(enable)` wrapped in an `#ifdef __clang__`.

>
> Kind regards,
> Adrian
>
> [1]
> ./include/asm-generic/xor.h:11:1: warning: loop not vectorized:
> the optimizer was unable to perform the requested transformation;
> the transformation might be disabled or specified as part of an
> unsupported transformation ordering
> [-Wpass-failed=transform-warning] xor_8regs_2(unsigned long bytes,
> unsigned long *p1, unsigned long *p2)

-- 
Thanks,
~Nick Desaulniers