All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Laight <David.Laight@aculab.com>,
	Noah Goldstein <goldstein.w.n@gmail.com>,
	 kernel test robot <lkp@intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	 "oe-kbuild-all@lists.linux.dev" <oe-kbuild-all@lists.linux.dev>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	 "mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>,
	 "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"hpa@zytor.com" <hpa@zytor.com>
Subject: Re: x86/csum: Remove unnecessary odd handling
Date: Sat, 6 Jan 2024 11:26:06 +0100	[thread overview]
Message-ID: <CANn89iKjUZjw-9ACNWrEd_H+o79Uwkw9NVuujQ3w=c2pGRFotg@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wjGaH6oA47WkphTweMiy15Zjfuk-aVcXSasMX=aX9rFLQ@mail.gmail.com>

On Sat, Jan 6, 2024 at 1:19 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Fri, 5 Jan 2024 at 15:53, David Laight <David.Laight@aculab.com> wrote:
> >
> > I'd have to fix his benchmark code first :-)
> > You can't use the TSC unless you lock the cpu frequency.
> > The longer the test runs for the faster the cpu will run.
>
> They'll stabilize, it has soem cycle result aging code.
>
> But yes, set the CPU policy to 'performance' or do performance
> counters if you care deeply.
>
> > On a related point, do you remember what the 'killer app'
> > was for doing the checksum in copy_to/from_user?
>
> No. It's a long time ago, and many things have changed since.
>
> It's possible the copy-and-csum it's not worth it any more, simply
> because all modern network cards will do the csum for us, and I think
> loopback sets a flag saying "no need to checksum" too.
>
> But I do have a strong memory of it being a big deal back when. A
> _loong_ time ago.
>

On a related note, at least with clang, I found that csum_ipv6_magic()
is needlessly using temporary on-stack storage,
showing a stall on Cascade Lake unless I am patching add32_with_carry() :

diff --git a/arch/x86/include/asm/checksum_64.h
b/arch/x86/include/asm/checksum_64.h
index 407beebadaf45a748f91a36b78bd1d023449b132..c3d6f47626c70d81f0c2ba401d85050b09a39922
100644
--- a/arch/x86/include/asm/checksum_64.h
+++ b/arch/x86/include/asm/checksum_64.h
@@ -171,7 +171,7 @@ static inline unsigned add32_with_carry(unsigned
a, unsigned b)
        asm("addl %2,%0\n\t"
            "adcl $0,%0"
            : "=r" (a)
-           : "0" (a), "rm" (b));
+           : "0" (a), "r" (b));
        return a;
 }

Before patch :

ffffffff81b9b600 <csum_ipv6_magic>:
ffffffff81b9b600: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
ffffffff81b9b601: R_X86_64_NONE __fentry__-0x4
ffffffff81b9b605: 55                    push   %rbp
ffffffff81b9b606: 48 89 e5              mov    %rsp,%rbp
ffffffff81b9b609: 48 83 ec 04          sub    $0x4,%rsp     // This
will be removed after patch
ffffffff81b9b60d: 0f ca                bswap  %edx
ffffffff81b9b60f: 89 c9                mov    %ecx,%ecx
ffffffff81b9b611: 48 c1 e1 08          shl    $0x8,%rcx
ffffffff81b9b615: 44 89 c0              mov    %r8d,%eax
ffffffff81b9b618: 48 01 d0              add    %rdx,%rax
ffffffff81b9b61b: 48 01 c8              add    %rcx,%rax
ffffffff81b9b61e: 48 03 07              add    (%rdi),%rax
ffffffff81b9b621: 48 13 47 08          adc    0x8(%rdi),%rax
ffffffff81b9b625: 48 13 06              adc    (%rsi),%rax
ffffffff81b9b628: 48 13 46 08          adc    0x8(%rsi),%rax
ffffffff81b9b62c: 48 83 d0 00          adc    $0x0,%rax
ffffffff81b9b630: 48 89 c1              mov    %rax,%rcx
ffffffff81b9b633: 48 c1 e9 20          shr    $0x20,%rcx
ffffffff81b9b637: 89 4d fc              mov    %ecx,-0x4(%rbp)  //
Store exc on the stack
ffffffff81b9b63a: 03 45 fc              add    -0x4(%rbp),%eax  //
reads the value right away, stalling some Intel cpus.
ffffffff81b9b63d: 83 d0 00              adc    $0x0,%eax
ffffffff81b9b640: 89 c1                mov    %eax,%ecx   // Note that
ecs content was scratch anyway
ffffffff81b9b642: c1 e1 10              shl    $0x10,%ecx
ffffffff81b9b645: 25 00 00 ff ff        and    $0xffff0000,%eax
ffffffff81b9b64a: 01 c8                add    %ecx,%eax
ffffffff81b9b64c: 15 ff ff 00 00        adc    $0xffff,%eax
ffffffff81b9b651: f7 d0                not    %eax
ffffffff81b9b653: c1 e8 10              shr    $0x10,%eax
ffffffff81b9b656: 48 83 c4 04          add    $0x4,%rsp
ffffffff81b9b65a: 5d                    pop    %rbp
ffffffff81b9b65b: c3                    ret
ffffffff81b9b65c: cc                    int3
ffffffff81b9b65d: 00 00                add    %al,(%rax)
ffffffff81b9b65f: cc                    int3

After patch:

00000000000000a0 <csum_ipv6_magic>:
  a0: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
a1: R_X86_64_NONE __fentry__-0x4
  a5: 55                    push   %rbp
  a6: 48 89 e5              mov    %rsp,%rbp
  a9: 0f ca                bswap  %edx
  ab: 89 c9                mov    %ecx,%ecx
  ad: 48 c1 e1 08          shl    $0x8,%rcx
  b1: 44 89 c0              mov    %r8d,%eax
  b4: 48 01 d0              add    %rdx,%rax
  b7: 48 01 c8              add    %rcx,%rax
  ba: 48 03 07              add    (%rdi),%rax
  bd: 48 13 47 08          adc    0x8(%rdi),%rax
  c1: 48 13 06              adc    (%rsi),%rax
  c4: 48 13 46 08          adc    0x8(%rsi),%rax
  c8: 48 83 d0 00          adc    $0x0,%rax
  cc: 48 89 c1              mov    %rax,%rcx
  cf: 48 c1 e9 20          shr    $0x20,%rcx
  d3: 01 c8                add    %ecx,%eax   // just better !
  d5: 83 d0 00              adc    $0x0,%eax
  d8: 89 c1                mov    %eax,%ecx
  da: c1 e1 10              shl    $0x10,%ecx
  dd: 25 00 00 ff ff        and    $0xffff0000,%eax
  e2: 01 c8                add    %ecx,%eax
  e4: 15 ff ff 00 00        adc    $0xffff,%eax
  e9: f7 d0                not    %eax
  eb: c1 e8 10              shr    $0x10,%eax
  ee: 5d                    pop    %rbp
  ef: c3                    ret
  f0: cc                    int3

  reply	other threads:[~2024-01-06 10:26 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20230628020657.957880-1-goldstein.w.n@gmail.com>
2023-06-28  9:12 ` x86/csum: Remove unnecessary odd handling Borislav Petkov
2023-06-28 15:32   ` Noah Goldstein
2023-06-28 17:44     ` Linus Torvalds
2023-06-28 18:34       ` Noah Goldstein
2023-06-28 20:02         ` Linus Torvalds
2023-06-29 14:04   ` David Laight
2023-06-29 14:27   ` David Laight
2023-09-01 22:21 ` Noah Goldstein
2023-09-06 13:49   ` David Laight
2023-09-06 14:38   ` David Laight
2023-09-20 19:20     ` Noah Goldstein
2023-09-20 19:23 ` Noah Goldstein
2023-09-23  3:24   ` kernel test robot
2023-09-23 14:05     ` Noah Goldstein
2023-09-23 21:13       ` David Laight
2023-09-24 14:35         ` Noah Goldstein
2023-12-23 22:18           ` Noah Goldstein
2024-01-04 23:28             ` Noah Goldstein
2024-01-04 23:34               ` Dave Hansen
2024-01-04 23:36               ` Linus Torvalds
2024-01-05  0:33                 ` Linus Torvalds
2024-01-05 10:41                   ` David Laight
2024-01-05 16:12                     ` David Laight
2024-01-05 18:05                     ` Linus Torvalds
2024-01-05 23:52                       ` David Laight
2024-01-06  0:18                         ` Linus Torvalds
2024-01-06 10:26                           ` Eric Dumazet [this message]
2024-01-06 19:32                             ` Linus Torvalds
2024-01-07 12:11                             ` David Laight
2024-01-06 22:08                       ` David Laight
2024-01-07  1:09                         ` H. Peter Anvin
2024-01-07 11:44                           ` David Laight
2023-09-24 14:35 ` Noah Goldstein

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANn89iKjUZjw-9ACNWrEd_H+o79Uwkw9NVuujQ3w=c2pGRFotg@mail.gmail.com' \
    --to=edumazet@google.com \
    --cc=David.Laight@aculab.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=goldstein.w.n@gmail.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@intel.com \
    --cc=mingo@redhat.com \
    --cc=oe-kbuild-all@lists.linux.dev \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.