linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/csum: fix initial seed for odd buffers
@ 2021-11-25 14:18 Eric Dumazet
  2021-11-30 22:16 ` Dave Hansen
  2021-12-01  0:33 ` [tip: x86/core] x86/csum: Fix " tip-bot2 for Eric Dumazet
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2021-11-25 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Eric Dumazet, Eric Dumazet, Noah Goldstein,
	Alexander Duyck

From: Eric Dumazet <edumazet@google.com>

When I folded do_csum() into csum_partial(), I missed that we
had to swap the odd/even bytes of the @sum argument.

This is because this swap happens again at the end of the function:

[A, B, C, D] -> [B, A, D, C]

As far as Internet checksums (RFC 1071) are concerned, we can instead
rotate the whole 32-bit value by 8 (or 24):

-> [D, A, B, C]

Note that I played with the idea of replacing this final swapping:

    result = from32to16(result);
    result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);

With:

    result = ror32(result, 8);

But while the generated code was definitely better for the odd case,
the run-time cost for the more likely even case was not better with gcc.

gcc replaces a well-predicted conditional branch with a cmov
instruction after a ror instruction, which adds a cost that cancels
the cmov gain.

Many thanks to Noah Goldstein for reporting this issue.

Fixes: df4554cebdaa ("x86/csum: Rewrite/optimize csum_partial()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Duyck <alexanderduyck@fb.com>
---
 arch/x86/lib/csum-partial_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1eb8f2d11f7c785be624eba315fe9ca7989fd56d..40b527ba1da1f74b5dbc51ddd97a9ecad22ee5ae 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	if (unlikely(odd)) {
 		if (unlikely(len == 0))
 			return sum;
+		temp64 = ror32((__force u32)sum, 8);
 		temp64 += (*(unsigned char *)buff << 8);
 		len--;
 		buff++;
-- 
2.34.0.rc2.393.gf8c9666880-goog



* Re: [PATCH] x86/csum: fix initial seed for odd buffers
  2021-11-25 14:18 [PATCH] x86/csum: fix initial seed for odd buffers Eric Dumazet
@ 2021-11-30 22:16 ` Dave Hansen
  2021-12-01  0:18   ` Eric Dumazet
  2021-12-01  0:33 ` [tip: x86/core] x86/csum: Fix " tip-bot2 for Eric Dumazet
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Hansen @ 2021-11-30 22:16 UTC (permalink / raw)
  To: Eric Dumazet, Peter Zijlstra
  Cc: linux-kernel, Eric Dumazet, Noah Goldstein, Alexander Duyck

On 11/25/21 6:18 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> When I folded do_csum() into csum_partial(), I missed that we
> had to swap odd/even bytes from @sum argument.
...
> Fixes: df4554cebdaa ("x86/csum: Rewrite/optimize csum_partial()")

Hi Eric,

That Fixes: sha1 doesn't match anything that I can find.  Should it be
Fixes: this commit:

> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/core&id=d31c3c683ee668ba5d87c0730610442fd672525f

I'm happy to fix it up when I apply it.  Just wanted to double-check
what you intended.


* Re: [PATCH] x86/csum: fix initial seed for odd buffers
  2021-11-30 22:16 ` Dave Hansen
@ 2021-12-01  0:18   ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2021-12-01  0:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Eric Dumazet, Peter Zijlstra, linux-kernel, Noah Goldstein,
	Alexander Duyck

On Tue, Nov 30, 2021 at 2:16 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 11/25/21 6:18 AM, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > When I folded do_csum() into csum_partial(), I missed that we
> > had to swap odd/even bytes from @sum argument.
> ...
> > Fixes: df4554cebdaa ("x86/csum: Rewrite/optimize csum_partial()")
>
> Hi Eric,
>
> That Fixes: sha1 doesn't match anything that I can find.  Should it be
> Fixes: this commit:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/core&id=d31c3c683ee668ba5d87c0730610442fd672525f
>
> I'm happy to fix it up when I apply it.  Just wanted to double-check
> what you intended.

I think you are right; the sha1 I gave was from my local tree and I
forgot to rebase. Sorry for the confusion.
Thanks!


* [tip: x86/core] x86/csum: Fix initial seed for odd buffers
  2021-11-25 14:18 [PATCH] x86/csum: fix initial seed for odd buffers Eric Dumazet
  2021-11-30 22:16 ` Dave Hansen
@ 2021-12-01  0:33 ` tip-bot2 for Eric Dumazet
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot2 for Eric Dumazet @ 2021-12-01  0:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Noah Goldstein, Eric Dumazet, Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     2a144bcd661c4f0a503e03f9280e88854ac0bb37
Gitweb:        https://git.kernel.org/tip/2a144bcd661c4f0a503e03f9280e88854ac0bb37
Author:        Eric Dumazet <edumazet@google.com>
AuthorDate:    Thu, 25 Nov 2021 06:18:17 -08:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Tue, 30 Nov 2021 16:26:03 -08:00

x86/csum: Fix initial seed for odd buffers

When I folded do_csum() into csum_partial(), I missed that we
had to swap the odd/even bytes of the @sum argument.

This is because this swap happens again at the end of the function:

[A, B, C, D] -> [B, A, D, C]

As far as Internet checksums (RFC 1071) are concerned, we can instead
rotate the whole 32-bit value by 8 (or 24):

-> [D, A, B, C]

Note that I played with the idea of replacing this final swapping:

    result = from32to16(result);
    result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);

With:

    result = ror32(result, 8);

But while the generated code was definitely better for the odd case,
the run-time cost for the more likely even case was not better with gcc.

gcc replaces a well-predicted conditional branch with a cmov
instruction after a ror instruction, which adds a cost that cancels
the cmov gain.

Many thanks to Noah Goldstein for reporting this issue.

[ dhansen: * spelling: swaping => swapping
	   * updated Fixes commit  ]

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: d31c3c683ee6 ("x86/csum: Rewrite/optimize csum_partial()")
Reported-by: Noah Goldstein <goldstein.w.n@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211125141817.3541501-1-eric.dumazet@gmail.com
---
 arch/x86/lib/csum-partial_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1eb8f2d..40b527b 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	if (unlikely(odd)) {
 		if (unlikely(len == 0))
 			return sum;
+		temp64 = ror32((__force u32)sum, 8);
 		temp64 += (*(unsigned char *)buff << 8);
 		len--;
 		buff++;

