From: Eric Dumazet <edumazet@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Laight <David.Laight@aculab.com>,
Noah Goldstein <goldstein.w.n@gmail.com>,
kernel test robot <lkp@intel.com>,
"x86@kernel.org" <x86@kernel.org>,
"oe-kbuild-all@lists.linux.dev" <oe-kbuild-all@lists.linux.dev>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"hpa@zytor.com" <hpa@zytor.com>
Subject: Re: x86/csum: Remove unnecessary odd handling
Date: Sat, 6 Jan 2024 11:26:06 +0100 [thread overview]
Message-ID: <CANn89iKjUZjw-9ACNWrEd_H+o79Uwkw9NVuujQ3w=c2pGRFotg@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wjGaH6oA47WkphTweMiy15Zjfuk-aVcXSasMX=aX9rFLQ@mail.gmail.com>
On Sat, Jan 6, 2024 at 1:19 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, 5 Jan 2024 at 15:53, David Laight <David.Laight@aculab.com> wrote:
> >
> > I'd have to fix his benchmark code first :-)
> > You can't use the TSC unless you lock the cpu frequency.
> > The longer the test runs for the faster the cpu will run.
>
> They'll stabilize, it has soem cycle result aging code.
>
> But yes, set the CPU policy to 'performance' or do performance
> counters if you care deeply.
>
> > On a related point, do you remember what the 'killer app'
> > was for doing the checksum in copy_to/from_user?
>
> No. It's a long time ago, and many things have changed since.
>
> It's possible the copy-and-csum it's not worth it any more, simply
> because all modern network cards will do the csum for us, and I think
> loopback sets a flag saying "no need to checksum" too.
>
> But I do have a strong memory of it being a big deal back when. A
> _loong_ time ago.
>
On a related note, at least with clang, I found that csum_ipv6_magic()
is needlessly using temporary on-stack storage,
showing a stall on Cascade Lake unless I am patching add32_with_carry() :
diff --git a/arch/x86/include/asm/checksum_64.h
b/arch/x86/include/asm/checksum_64.h
index 407beebadaf45a748f91a36b78bd1d023449b132..c3d6f47626c70d81f0c2ba401d85050b09a39922
100644
--- a/arch/x86/include/asm/checksum_64.h
+++ b/arch/x86/include/asm/checksum_64.h
@@ -171,7 +171,7 @@ static inline unsigned add32_with_carry(unsigned
a, unsigned b)
asm("addl %2,%0\n\t"
"adcl $0,%0"
: "=r" (a)
- : "0" (a), "rm" (b));
+ : "0" (a), "r" (b));
return a;
}
Before patch :
ffffffff81b9b600 <csum_ipv6_magic>:
ffffffff81b9b600: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
ffffffff81b9b601: R_X86_64_NONE __fentry__-0x4
ffffffff81b9b605: 55 push %rbp
ffffffff81b9b606: 48 89 e5 mov %rsp,%rbp
ffffffff81b9b609: 48 83 ec 04 sub $0x4,%rsp // This
will be removed after patch
ffffffff81b9b60d: 0f ca bswap %edx
ffffffff81b9b60f: 89 c9 mov %ecx,%ecx
ffffffff81b9b611: 48 c1 e1 08 shl $0x8,%rcx
ffffffff81b9b615: 44 89 c0 mov %r8d,%eax
ffffffff81b9b618: 48 01 d0 add %rdx,%rax
ffffffff81b9b61b: 48 01 c8 add %rcx,%rax
ffffffff81b9b61e: 48 03 07 add (%rdi),%rax
ffffffff81b9b621: 48 13 47 08 adc 0x8(%rdi),%rax
ffffffff81b9b625: 48 13 06 adc (%rsi),%rax
ffffffff81b9b628: 48 13 46 08 adc 0x8(%rsi),%rax
ffffffff81b9b62c: 48 83 d0 00 adc $0x0,%rax
ffffffff81b9b630: 48 89 c1 mov %rax,%rcx
ffffffff81b9b633: 48 c1 e9 20 shr $0x20,%rcx
ffffffff81b9b637: 89 4d fc mov %ecx,-0x4(%rbp) //
Store exc on the stack
ffffffff81b9b63a: 03 45 fc add -0x4(%rbp),%eax //
reads the value right away, stalling some Intel cpus.
ffffffff81b9b63d: 83 d0 00 adc $0x0,%eax
ffffffff81b9b640: 89 c1 mov %eax,%ecx // Note that
ecs content was scratch anyway
ffffffff81b9b642: c1 e1 10 shl $0x10,%ecx
ffffffff81b9b645: 25 00 00 ff ff and $0xffff0000,%eax
ffffffff81b9b64a: 01 c8 add %ecx,%eax
ffffffff81b9b64c: 15 ff ff 00 00 adc $0xffff,%eax
ffffffff81b9b651: f7 d0 not %eax
ffffffff81b9b653: c1 e8 10 shr $0x10,%eax
ffffffff81b9b656: 48 83 c4 04 add $0x4,%rsp
ffffffff81b9b65a: 5d pop %rbp
ffffffff81b9b65b: c3 ret
ffffffff81b9b65c: cc int3
ffffffff81b9b65d: 00 00 add %al,(%rax)
ffffffff81b9b65f: cc int3
After patch:
00000000000000a0 <csum_ipv6_magic>:
a0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
a1: R_X86_64_NONE __fentry__-0x4
a5: 55 push %rbp
a6: 48 89 e5 mov %rsp,%rbp
a9: 0f ca bswap %edx
ab: 89 c9 mov %ecx,%ecx
ad: 48 c1 e1 08 shl $0x8,%rcx
b1: 44 89 c0 mov %r8d,%eax
b4: 48 01 d0 add %rdx,%rax
b7: 48 01 c8 add %rcx,%rax
ba: 48 03 07 add (%rdi),%rax
bd: 48 13 47 08 adc 0x8(%rdi),%rax
c1: 48 13 06 adc (%rsi),%rax
c4: 48 13 46 08 adc 0x8(%rsi),%rax
c8: 48 83 d0 00 adc $0x0,%rax
cc: 48 89 c1 mov %rax,%rcx
cf: 48 c1 e9 20 shr $0x20,%rcx
d3: 01 c8 add %ecx,%eax // just better !
d5: 83 d0 00 adc $0x0,%eax
d8: 89 c1 mov %eax,%ecx
da: c1 e1 10 shl $0x10,%ecx
dd: 25 00 00 ff ff and $0xffff0000,%eax
e2: 01 c8 add %ecx,%eax
e4: 15 ff ff 00 00 adc $0xffff,%eax
e9: f7 d0 not %eax
eb: c1 e8 10 shr $0x10,%eax
ee: 5d pop %rbp
ef: c3 ret
f0: cc int3
next prev parent reply other threads:[~2024-01-06 10:26 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20230628020657.957880-1-goldstein.w.n@gmail.com>
2023-06-28 9:12 ` x86/csum: Remove unnecessary odd handling Borislav Petkov
2023-06-28 15:32 ` Noah Goldstein
2023-06-28 17:44 ` Linus Torvalds
2023-06-28 18:34 ` Noah Goldstein
2023-06-28 20:02 ` Linus Torvalds
2023-06-29 14:04 ` David Laight
2023-06-29 14:27 ` David Laight
2023-09-01 22:21 ` Noah Goldstein
2023-09-06 13:49 ` David Laight
2023-09-06 14:38 ` David Laight
2023-09-20 19:20 ` Noah Goldstein
2023-09-20 19:23 ` Noah Goldstein
2023-09-23 3:24 ` kernel test robot
2023-09-23 14:05 ` Noah Goldstein
2023-09-23 21:13 ` David Laight
2023-09-24 14:35 ` Noah Goldstein
2023-12-23 22:18 ` Noah Goldstein
2024-01-04 23:28 ` Noah Goldstein
2024-01-04 23:34 ` Dave Hansen
2024-01-04 23:36 ` Linus Torvalds
2024-01-05 0:33 ` Linus Torvalds
2024-01-05 10:41 ` David Laight
2024-01-05 16:12 ` David Laight
2024-01-05 18:05 ` Linus Torvalds
2024-01-05 23:52 ` David Laight
2024-01-06 0:18 ` Linus Torvalds
2024-01-06 10:26 ` Eric Dumazet [this message]
2024-01-06 19:32 ` Linus Torvalds
2024-01-07 12:11 ` David Laight
2024-01-06 22:08 ` David Laight
2024-01-07 1:09 ` H. Peter Anvin
2024-01-07 11:44 ` David Laight
2023-09-24 14:35 ` Noah Goldstein
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CANn89iKjUZjw-9ACNWrEd_H+o79Uwkw9NVuujQ3w=c2pGRFotg@mail.gmail.com' \
--to=edumazet@google.com \
--cc=David.Laight@aculab.com \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=goldstein.w.n@gmail.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=lkp@intel.com \
--cc=mingo@redhat.com \
--cc=oe-kbuild-all@lists.linux.dev \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.