From: Eric Dumazet <edumazet@google.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86/uaccess: small optimization in unsafe_copy_to_user()
Date: Fri, 16 Apr 2021 22:57:00 +0200
Message-ID: <CANn89iLDov_F+VWmnx8q=pnM7LGcwu_JfoQ4ftGYygLAno3taQ@mail.gmail.com>
In-Reply-To: <CANn89i+mWh3=36R8Y8Fra0wQY4p82EPDNgZ=O5P7+d8meGxsiA@mail.gmail.com>

On Fri, Apr 16, 2021 at 10:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Fri, Apr 16, 2021 at 9:44 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Fri, Apr 16, 2021 at 12:24:13PM -0700, Eric Dumazet wrote:
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > We have to loop only to copy u64 values.
> > > After this first loop, we copy at most one u32, one u16 and one byte.
> >
> > Does it actually yield a better code?
> >
>
> Yes, my patch gives better code on an actual kernel use case
>
> (net-next tree, look at put_cmsg())
>
>  5ca: 48 89 0f              mov    %rcx,(%rdi)
>  5cd: 89 77 08              mov    %esi,0x8(%rdi)
>  5d0: 89 57 0c              mov    %edx,0xc(%rdi)
>  5d3: 48 83 c7 10          add    $0x10,%rdi
>  5d7: 48 83 c1 f0          add    $0xfffffffffffffff0,%rcx
>  5db: 48 83 f9 07          cmp    $0x7,%rcx
>  5df: 76 40                jbe    621 <put_cmsg+0x111>
>  5e1: 66 66 66 66 66 66 2e data16 data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
>  5e8: 0f 1f 84 00 00 00 00
>  5ef: 00
>  5f0: 49 8b 10              mov    (%r8),%rdx
>  5f3: 48 89 17              mov    %rdx,(%rdi)
>  5f6: 48 83 c7 08          add    $0x8,%rdi
>  5fa: 49 83 c0 08          add    $0x8,%r8
>  5fe: 48 83 c1 f8          add    $0xfffffffffffffff8,%rcx
>  602: 48 83 f9 07          cmp    $0x7,%rcx
>  606: 77 e8                ja     5f0 <put_cmsg+0xe0>
>  608: eb 17                jmp    621 <put_cmsg+0x111>
>  60a: 66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
>  610: 41 8b 10              mov    (%r8),%edx
>  613: 89 17                mov    %edx,(%rdi)
>  615: 48 83 c7 04          add    $0x4,%rdi
>  619: 49 83 c0 04          add    $0x4,%r8
>  61d: 48 83 c1 fc          add    $0xfffffffffffffffc,%rcx
>  621: 48 83 f9 03          cmp    $0x3,%rcx
>  625: 77 e9                ja     610 <put_cmsg+0x100>
>  627: eb 1a                jmp    643 <put_cmsg+0x133>
>  629: 0f 1f 80 00 00 00 00 nopl   0x0(%rax)
>  630: 41 0f b7 10          movzwl (%r8),%edx
>  634: 66 89 17              mov    %dx,(%rdi)
>  637: 48 83 c7 02          add    $0x2,%rdi
>  63b: 49 83 c0 02          add    $0x2,%r8
>  63f: 48 83 c1 fe          add    $0xfffffffffffffffe,%rcx
>  643: 48 83 f9 01          cmp    $0x1,%rcx
>  647: 77 e7                ja     630 <put_cmsg+0x120>
>  649: eb 15                jmp    660 <put_cmsg+0x150>
>  64b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
>  650: 41 0f b6 08          movzbl (%r8),%ecx
>  654: 88 0f                mov    %cl,(%rdi)
>  656: 48 83 c7 01          add    $0x1,%rdi
>  65a: 49 83 c0 01          add    $0x1,%r8
>  65e: 31 c9                xor    %ecx,%ecx
>  660: 48 85 c9              test   %rcx,%rcx
>  663: 75 eb                jne    650 <put_cmsg+0x140>

After the change, the code is now what we would expect (no jmp around):
 5db: 48 83 f9 08          cmp    $0x8,%rcx
 5df: 72 27                jb     608 <put_cmsg+0xf8>
 5e1: 66 66 66 66 66 66 2e data16 data16 data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
 5e8: 0f 1f 84 00 00 00 00
 5ef: 00
 5f0: 49 8b 10              mov    (%r8),%rdx
 5f3: 48 89 17              mov    %rdx,(%rdi)
 5f6: 48 83 c7 08          add    $0x8,%rdi
 5fa: 49 83 c0 08          add    $0x8,%r8
 5fe: 48 83 c1 f8          add    $0xfffffffffffffff8,%rcx
 602: 48 83 f9 08          cmp    $0x8,%rcx
 606: 73 e8                jae    5f0 <put_cmsg+0xe0>
 608: 48 83 f9 04          cmp    $0x4,%rcx
 60c: 72 11                jb     61f <put_cmsg+0x10f>
 60e: 41 8b 10              mov    (%r8),%edx
 611: 89 17                mov    %edx,(%rdi)
 613: 48 83 c7 04          add    $0x4,%rdi
 617: 49 83 c0 04          add    $0x4,%r8
 61b: 48 83 c1 fc          add    $0xfffffffffffffffc,%rcx
 61f: 48 83 f9 02          cmp    $0x2,%rcx
 623: 72 13                jb     638 <put_cmsg+0x128>
 625: 41 0f b7 10          movzwl (%r8),%edx
 629: 66 89 17              mov    %dx,(%rdi)
 62c: 48 83 c7 02          add    $0x2,%rdi
 630: 49 83 c0 02          add    $0x2,%r8
 634: 48 83 c1 fe          add    $0xfffffffffffffffe,%rcx
 638: 48 85 c9              test   %rcx,%rcx
 63b: 74 05                je     642 <put_cmsg+0x132>
 63d: 41 8a 08              mov    (%r8),%cl
 640: 88 0f                mov    %cl,(%rdi)
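
For reference, the shape this disassembly reflects (one loop for the u64
values, then at most one u32, one u16 and one byte) can be sketched in
plain C. This is only a user-space illustration: copy_tail_sketch() is a
hypothetical name, and memcpy() of a fixed size stands in for the kernel's
user-access stores, so it is not the macro from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space sketch, not the kernel's unsafe_copy_to_user().
 * Only the 8-byte step needs a loop; once it finishes, len < 8, so each
 * smaller size can occur at most once and a plain "if" is enough.
 */
static void copy_tail_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	while (len >= 8) {		/* loop: any number of u64 chunks */
		memcpy(d, s, 8);
		d += 8; s += 8; len -= 8;
	}
	if (len >= 4) {			/* at most one u32 */
		memcpy(d, s, 4);
		d += 4; s += 4; len -= 4;
	}
	if (len >= 2) {			/* at most one u16 */
		memcpy(d, s, 2);
		d += 2; s += 2; len -= 2;
	}
	if (len >= 1)			/* at most one byte */
		*d = *s;
}
```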

As I said, it's minor; I am sure you can come up with something much better!

Thanks!

>
> > FWIW, this
> > void bar(unsigned);
> > void foo(unsigned n)
> > {
> >         while (n >= 8) {
> >                 bar(n);
> >                 n -= 8;
> >         }
> >         while (n >= 4) {
> >                 bar(n);
> >                 n -= 4;
> >         }
> >         while (n >= 2) {
> >                 bar(n);
> >                 n -= 2;
> >         }
> >         while (n >= 1) {
> >                 bar(n);
> >                 n -= 1;
> >         }
> > }
> >
> > will compile (with -O2) to
> >         pushq   %rbp
> >         pushq   %rbx
> >         movl    %edi, %ebx
> >         subq    $8, %rsp
> >         cmpl    $7, %edi
> >         jbe     .L2
> >         movl    %edi, %ebp
> > .L3:
> >         movl    %ebp, %edi
> >         subl    $8, %ebp
> >         call    bar@PLT
> >         cmpl    $7, %ebp
> >         ja      .L3
> >         andl    $7, %ebx
> > .L2:
> >         cmpl    $3, %ebx
> >         jbe     .L4
> >         movl    %ebx, %edi
> >         andl    $3, %ebx
> >         call    bar@PLT
> > .L4:
> >         cmpl    $1, %ebx
> >         jbe     .L5
> >         movl    %ebx, %edi
> >         andl    $1, %ebx
> >         call    bar@PLT
> > .L5:
> >         testl   %ebx, %ebx
> >         je      .L1
> >         addq    $8, %rsp
> >         movl    $1, %edi
> >         popq    %rbx
> >         popq    %rbp
> >         jmp     bar@PLT
> > .L1:
> >         addq    $8, %rsp
> >         popq    %rbx
> >         popq    %rbp
> >         ret
> >
> > i.e. loop + if + if + if...
