* [PATCH] x86/csum: fix initial seed for odd buffers
@ 2021-11-25 14:18 Eric Dumazet
  2021-11-30 22:16 ` Dave Hansen
  2021-12-01  0:33 ` [tip: x86/core] x86/csum: Fix " tip-bot2 for Eric Dumazet
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2021-11-25 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Eric Dumazet, Eric Dumazet, Noah Goldstein,
	Alexander Duyck

From: Eric Dumazet <edumazet@google.com>

When I folded do_csum() into csum_partial(), I missed that we
had to swap odd/even bytes from @sum argument.

This is because this swap will happen again at the end of the function.

[A, B, C, D] -> [B, A, D, C]

As far as Internet checksums (RFC 1071) are concerned, we can instead
rotate the whole 32-bit value by 8 (or 24):

-> [D, A, B, C]

Note that I played with the idea of replacing this final swapping:

    result = from32to16(result);
    result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);

With:

    result = ror32(result, 8);

But while the generated code was definitely better for the odd case,
the runtime cost for the more likely even case was not better with gcc.

gcc replaces a well-predicted conditional branch with a cmov
instruction after a ror instruction, which adds a cost that cancels
out the cmov gain.

Many thanks to Noah Goldstein for reporting this issue.

Fixes: df4554cebdaa ("x86/csum: Rewrite/optimize csum_partial()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Noah Goldstein <goldstein.w.n@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Duyck <alexanderduyck@fb.com>
---
 arch/x86/lib/csum-partial_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1eb8f2d11f7c785be624eba315fe9ca7989fd56d..40b527ba1da1f74b5dbc51ddd97a9ecad22ee5ae 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	if (unlikely(odd)) {
 		if (unlikely(len == 0))
 			return sum;
+		temp64 = ror32((__force u32)sum, 8);
 		temp64 += (*(unsigned char *)buff << 8);
 		len--;
 		buff++;
-- 
2.34.0.rc2.393.gf8c9666880-goog



* Re: [PATCH] x86/csum: fix initial seed for odd buffers
  2021-11-25 14:18 [PATCH] x86/csum: fix initial seed for odd buffers Eric Dumazet
@ 2021-11-30 22:16 ` Dave Hansen
  2021-12-01  0:18   ` Eric Dumazet
  2021-12-01  0:33 ` [tip: x86/core] x86/csum: Fix " tip-bot2 for Eric Dumazet
  1 sibling, 1 reply; 4+ messages in thread
From: Dave Hansen @ 2021-11-30 22:16 UTC (permalink / raw)
  To: Eric Dumazet, Peter Zijlstra
  Cc: linux-kernel, Eric Dumazet, Noah Goldstein, Alexander Duyck

On 11/25/21 6:18 AM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> When I folded do_csum() into csum_partial(), I missed that we
> had to swap odd/even bytes from @sum argument.
...
> Fixes: df4554cebdaa ("x86/csum: Rewrite/optimize csum_partial()")

Hi Eric,

That Fixes: sha1 doesn't match anything that I can find.  Should it be
Fixes: this commit:

> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/core&id=d31c3c683ee668ba5d87c0730610442fd672525f

I'm happy to fix it up when I apply it.  Just wanted to double-check
what you intended.


* Re: [PATCH] x86/csum: fix initial seed for odd buffers
  2021-11-30 22:16 ` Dave Hansen
@ 2021-12-01  0:18   ` Eric Dumazet
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2021-12-01  0:18 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Eric Dumazet, Peter Zijlstra, linux-kernel, Noah Goldstein,
	Alexander Duyck

On Tue, Nov 30, 2021 at 2:16 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 11/25/21 6:18 AM, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > When I folded do_csum() into csum_partial(), I missed that we
> > had to swap odd/even bytes from @sum argument.
> ...
> > Fixes: df4554cebdaa ("x86/csum: Rewrite/optimize csum_partial()")
>
> Hi Eric,
>
> That Fixes: sha1 doesn't match anything that I can find.  Should it be
> Fixes: this commit:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/core&id=d31c3c683ee668ba5d87c0730610442fd672525f
>
> I'm happy to fix it up when I apply it.  Just wanted to double-check
> what you intended.

I think you are right; the sha1 I gave was from my local tree and I
forgot to rebase. Sorry for the confusion.
Thanks!


* [tip: x86/core] x86/csum: Fix initial seed for odd buffers
  2021-11-25 14:18 [PATCH] x86/csum: fix initial seed for odd buffers Eric Dumazet
  2021-11-30 22:16 ` Dave Hansen
@ 2021-12-01  0:33 ` tip-bot2 for Eric Dumazet
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot2 for Eric Dumazet @ 2021-12-01  0:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel),
	Noah Goldstein, Eric Dumazet, Dave Hansen, x86, linux-kernel

The following commit has been merged into the x86/core branch of tip:

Commit-ID:     2a144bcd661c4f0a503e03f9280e88854ac0bb37
Gitweb:        https://git.kernel.org/tip/2a144bcd661c4f0a503e03f9280e88854ac0bb37
Author:        Eric Dumazet <edumazet@google.com>
AuthorDate:    Thu, 25 Nov 2021 06:18:17 -08:00
Committer:     Dave Hansen <dave.hansen@linux.intel.com>
CommitterDate: Tue, 30 Nov 2021 16:26:03 -08:00

x86/csum: Fix initial seed for odd buffers

When I folded do_csum() into csum_partial(), I missed that we
had to swap odd/even bytes from @sum argument.

This is because this swap will happen again at the end of the function.

[A, B, C, D] -> [B, A, D, C]

As far as Internet checksums (RFC 1071) are concerned, we can instead
rotate the whole 32-bit value by 8 (or 24):

-> [D, A, B, C]

Note that I played with the idea of replacing this final swapping:

    result = from32to16(result);
    result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);

With:

    result = ror32(result, 8);

But while the generated code was definitely better for the odd case,
the runtime cost for the more likely even case was not better with gcc.

gcc replaces a well-predicted conditional branch with a cmov
instruction after a ror instruction, which adds a cost that cancels
out the cmov gain.

Many thanks to Noah Goldstein for reporting this issue.

[ dhansen: * spelling: swaping => swapping
	   * updated Fixes commit  ]

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Fixes: d31c3c683ee6 ("x86/csum: Rewrite/optimize csum_partial()")
Reported-by: Noah Goldstein <goldstein.w.n@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20211125141817.3541501-1-eric.dumazet@gmail.com
---
 arch/x86/lib/csum-partial_64.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 1eb8f2d..40b527b 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -41,6 +41,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 	if (unlikely(odd)) {
 		if (unlikely(len == 0))
 			return sum;
+		temp64 = ror32((__force u32)sum, 8);
 		temp64 += (*(unsigned char *)buff << 8);
 		len--;
 		buff++;


