From: Eric Dumazet
Date: Wed, 24 Nov 2021 22:32:21 -0800
Subject: Re: [tip:x86/core 1/1] arch/x86/um/../lib/csum-partial_64.c:98:12: error: implicit declaration of function 'load_unaligned_zeropad'
To: Noah Goldstein
Cc: Johannes Berg, alexanderduyck@fb.com, kbuild-all@lists.01.org, open list, linux-um@lists.infradead.org, lkp@intel.com, peterz@infradead.org, X86 ML

On Wed, Nov 24, 2021 at 9:09 PM Noah Goldstein wrote:
>
> Although I see slightly worse performance with aligned `buff` in
> the branch-free approach. If non-aligned `buff` is that uncommon,
> it might be better to speculate past the work of `ror`.

Yes, there is no clear win here from removing the conditional (same cost
really), although using ror32() removes the from32to16() helper and gets
rid of one folding step.

I will formally submit this change, thanks!
diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1eb8f2d11f7c785be624eba315fe9ca7989fd56d..cf4bd3ef66e56c681b3435d43011ece78438376d 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -11,16 +11,6 @@
 #include
 #include

-static inline unsigned short from32to16(unsigned a)
-{
-	unsigned short b = a >> 16;
-	asm("addw %w2,%w0\n\t"
-	    "adcw $0,%w0\n"
-	    : "=r" (b)
-	    : "0" (b), "r" (a));
-	return b;
-}
-
 /*
  * Do a checksum on an arbitrary memory area.
  * Returns a 32bit checksum.
@@ -41,6 +31,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	if (unlikely(odd)) {
 		if (unlikely(len == 0))
 			return sum;
+		temp64 = ror32((__force u32)sum, 8);
 		temp64 += (*(unsigned char *)buff << 8);
 		len--;
 		buff++;
@@ -129,10 +120,8 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 #endif
 	}
 	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
-	if (unlikely(odd)) {
-		result = from32to16(result);
-		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
-	}
+	if (unlikely(odd))
+		result = ror32(result, 8);
 	return (__force __wsum)result;
 }
 EXPORT_SYMBOL(csum_partial);
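For reference, the property this change leans on is that rotating the 32-bit
accumulator by 8 bits commutes with the final 32->16 fold: the value returned
for odd buffers is no longer pre-folded to 16 bits, but folding ror32(x, 8)
should give the same 16-bit checksum as the old from32to16() + byte-swap pair
(the usual byte-order independence of the one's-complement sum, RFC 1071).
Below is a minimal user-space sketch to spot-check that claim; ror32(), fold16()
and swab16() here are local illustrative re-creations, not the kernel helpers.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Local stand-in for the kernel's ror32(); n must be 1..31 here. */
static uint32_t ror32(uint32_t v, unsigned int n)
{
	return (v >> n) | (v << (32 - n));
}

/* End-around-carry fold of a 32-bit accumulator to 16 bits, mirroring
 * what from32to16() (and csum_fold(), minus the complement) compute. */
static uint16_t fold16(uint32_t x)
{
	x = (x >> 16) + (x & 0xffff);
	x += x >> 16;
	return (uint16_t)x;
}

/* Swap the two bytes of a 16-bit value. */
static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v >> 8) | (v << 8));
}

int main(void)
{
	/* Random accumulator values: rotating before the fold should match
	 * byte-swapping after the fold, i.e. the old odd-buffer epilogue. */
	for (int i = 0; i < 10000000; i++) {
		uint32_t x = ((uint32_t)rand() << 16) ^ (uint32_t)rand();

		if (fold16(ror32(x, 8)) != swab16(fold16(x))) {
			printf("mismatch at %#x\n", x);
			return 1;
		}
	}
	printf("ok\n");
	return 0;
}

Built with a plain `cc test.c`, this should print "ok", matching the
expectation that the two odd-buffer epilogues are checksum-equivalent.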