linux-kernel.vger.kernel.org archive mirror
* [GIT PULL] x86/asm changes for v4.18
@ 2018-06-04 12:21 Ingo Molnar
  2018-06-05  1:58 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Ingo Molnar @ 2018-06-04 12:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest x86-asm-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-asm-for-linus

   # HEAD: 6469a0ee0a06b2ea1f5afbb1d5a3feed017d4c7a x86/io: Define readq()/writeq() to use 64-bit type

Two smaller changes:

 - better support (non-atomic) 64-bit readq()/writeq() variants (Andy Shevchenko)

 - __clear_user() micro-optimization (Alexey Dobriyan)
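
For context on the first item: a non-atomic 64-bit access is typically composed of two 32-bit accesses. The sketch below models that in plain userspace C against an ordinary buffer rather than real MMIO. The names fake_readl(), fake_writel(), sketch_lo_hi_readq() and sketch_lo_hi_writeq() are illustrative stand-ins, not the kernel's helpers, and uint64_t stands in for u64. It is only meant to show why a fixed-width 64-bit type is the natural value type for both the native and the composed variants.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for 32-bit MMIO accessors: they copy to and
 * from ordinary memory so that the sketch runs in userspace. */
static uint32_t fake_readl(const void *addr)
{
	uint32_t v;

	memcpy(&v, addr, sizeof(v));
	return v;
}

static void fake_writel(uint32_t val, void *addr)
{
	memcpy(addr, &val, sizeof(val));
}

/* Non-atomic 64-bit read: low half first, then high half. */
static uint64_t sketch_lo_hi_readq(const void *addr)
{
	const unsigned char *p = addr;
	uint32_t low = fake_readl(p);
	uint32_t high = fake_readl(p + 4);

	return (uint64_t)low | ((uint64_t)high << 32);
}

/* Non-atomic 64-bit write: low half first, then high half. */
static void sketch_lo_hi_writeq(uint64_t val, void *addr)
{
	unsigned char *p = addr;

	fake_writel((uint32_t)val, p);
	fake_writel((uint32_t)(val >> 32), p + 4);
}
```

Because the two halves are separate accesses, a concurrent observer can see a half-updated value, which is what "non-atomic" refers to here.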

 Thanks,

	Ingo

------------------>
Alexey Dobriyan (1):
      x86/asm/64: Micro-optimize __clear_user() - Use immediate constants

Andy Shevchenko (1):
      x86/io: Define readq()/writeq() to use 64-bit type


 arch/x86/include/asm/io.h  | 8 ++++----
 arch/x86/lib/usercopy_64.c | 9 ++++-----
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index f6e5b9375d8c..6de64840dd22 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -94,10 +94,10 @@ build_mmio_write(__writel, "l", unsigned int, "r", )
 
 #ifdef CONFIG_X86_64
 
-build_mmio_read(readq, "q", unsigned long, "=r", :"memory")
-build_mmio_read(__readq, "q", unsigned long, "=r", )
-build_mmio_write(writeq, "q", unsigned long, "r", :"memory")
-build_mmio_write(__writeq, "q", unsigned long, "r", )
+build_mmio_read(readq, "q", u64, "=r", :"memory")
+build_mmio_read(__readq, "q", u64, "=r", )
+build_mmio_write(writeq, "q", u64, "r", :"memory")
+build_mmio_write(__writeq, "q", u64, "r", )
 
 #define readq_relaxed(a)	__readq(a)
 #define writeq_relaxed(v, a)	__writeq(v, a)
diff --git a/arch/x86/lib/usercopy_64.c b/arch/x86/lib/usercopy_64.c
index 75d3776123cc..a624dcc4de10 100644
--- a/arch/x86/lib/usercopy_64.c
+++ b/arch/x86/lib/usercopy_64.c
@@ -23,13 +23,13 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
 	asm volatile(
 		"	testq  %[size8],%[size8]\n"
 		"	jz     4f\n"
-		"0:	movq %[zero],(%[dst])\n"
-		"	addq   %[eight],%[dst]\n"
+		"0:	movq $0,(%[dst])\n"
+		"	addq   $8,%[dst]\n"
 		"	decl %%ecx ; jnz   0b\n"
 		"4:	movq  %[size1],%%rcx\n"
 		"	testl %%ecx,%%ecx\n"
 		"	jz     2f\n"
-		"1:	movb   %b[zero],(%[dst])\n"
+		"1:	movb   $0,(%[dst])\n"
 		"	incq   %[dst]\n"
 		"	decl %%ecx ; jnz  1b\n"
 		"2:\n"
@@ -40,8 +40,7 @@ unsigned long __clear_user(void __user *addr, unsigned long size)
 		_ASM_EXTABLE(0b,3b)
 		_ASM_EXTABLE(1b,2b)
 		: [size8] "=&c"(size), [dst] "=&D" (__d0)
-		: [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr),
-		  [zero] "r" (0UL), [eight] "r" (8UL));
+		: [size1] "r"(size & 7), "[size8]" (size / 8), "[dst]"(addr));
 	clac();
 	return size;
 }
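
As a reading aid for the assembly in the diff above, here is roughly what the loop structure amounts to in portable C: size / 8 eight-byte stores followed by size & 7 single-byte stores. This is only a userspace sketch. clear_sketch() is a made-up name, and it leaves out the user-access bracketing and the _ASM_EXTABLE fixups that let the real function report how many bytes it could not clear.

```c
#include <stdint.h>
#include <string.h>

/* Userspace sketch: it cannot fault, so it always returns 0
 * (zero bytes left uncleared). */
static unsigned long clear_sketch(void *addr, unsigned long size)
{
	unsigned char *dst = addr;
	unsigned long qwords = size / 8;	/* [size8] in the asm */
	unsigned long bytes = size & 7;		/* [size1] in the asm */
	const uint64_t zero = 0;

	while (qwords--) {			/* loop at label 0: */
		memcpy(dst, &zero, sizeof(zero)); /* 8-byte store, alignment-safe */
		dst += 8;
	}
	while (bytes--)				/* loop at label 1: */
		*dst++ = 0;

	return 0;
}
```

For a size of 4068 this performs 508 eight-byte stores and then 4 single-byte stores.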


* Re: [GIT PULL] x86/asm changes for v4.18
  2018-06-04 12:21 [GIT PULL] x86/asm changes for v4.18 Ingo Molnar
@ 2018-06-05  1:58 ` Linus Torvalds
  2018-06-05 15:05   ` x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18") Ingo Molnar
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2018-06-05  1:58 UTC (permalink / raw)
  To: Ingo Molnar, Alexey Dobriyan
  Cc: Linux Kernel Mailing List, Thomas Gleixner, Peter Zijlstra,
	Andrew Morton

On Mon, Jun 4, 2018 at 5:21 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>  - __clear_user() micro-optimization (Alexey Dobriyan)

Was this actually tested?

I think one reason people avoided the constant was that on some
microarchitecture it ended up being a separate uop just for the
constant generation, because it wouldn't fit in a single uop.

I'm pretty sure that used to be the case for P4, for example.

Afaik there have also been issues with decoding instructions that have
both an immediate and a memory offset.

I suspect none of this is an issue on modern cores, but there really
at least historically were cases where

   mov %reg,mem

was better than

   mov $imm,mem

if %reg already had the right value, so it's not at all 100% obvious
that the micro-optimization really _optimizes_ anything.

Any time people do this, they should add numbers.

            Linus


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05  1:58 ` Linus Torvalds
@ 2018-06-05 15:05   ` Ingo Molnar
  2018-06-05 15:47     ` Linus Torvalds
  2018-06-05 17:22     ` Alexey Dobriyan
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2018-06-05 15:05 UTC (permalink / raw)
  To: Linus Torvalds, Alexey Dobriyan
  Cc: Alexey Dobriyan, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andy Lutomirski, Borislav Petkov,
	Josh Poimboeuf, H. Peter Anvin, Denys Vlasenko


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Jun 4, 2018 at 5:21 AM Ingo Molnar <mingo@kernel.org> wrote:
> >
> >  - __clear_user() micro-optimization (Alexey Dobriyan)
> 
> Was this actually tested?

I'm not sure - Alexey?

> I think one reason people avoided the constant was that on some
> microarchitecture it ended up being a separate uop just for the
> constant generation, because it wouldn't fit in a single uop.
> 
> I'm pretty sure that used to be the case for P4, for example.
> 
> Afaik there have also been issues with decoding instructions that have
> both an immediate and a memory offset.
> 
> I suspect none of this is an issue on modern cores, but there really
> at least historically were cases where
> 
>    mov %reg,mem
> 
> was better than
> 
>    mov $imm,mem
> 
> if %reg already had the right value, so it's not at all 100% obvious
> that the micro-optimization really _optimizes_ anything.
> 
> Any time people do this, they should add numbers.

Ok, fair point and agreed - if Alexey sends some measurements to back the change 
I'll keep this, otherwise queue up a revert.

Thanks,

	Ingo


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 15:05   ` x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18") Ingo Molnar
@ 2018-06-05 15:47     ` Linus Torvalds
  2018-06-05 17:22     ` Alexey Dobriyan
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2018-06-05 15:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexey Dobriyan, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andrew Lutomirski,
	Borislav Petkov, Josh Poimboeuf, Peter Anvin, Denys Vlasenko

On Tue, Jun 5, 2018 at 8:05 AM Ingo Molnar <mingo@kernel.org> wrote:
>
> Ok, fair point and agreed - if Alexey sends some measurements to back the change
> I'll keep this, otherwise queue up a revert.

I don't think it needs to be reverted, it's not like it's likely to
hurt on any modern CPUs. The issues I talked about are fairly
historical - barely even 64-bit CPUs - and I'm not sure an extra uop
to carry a constant around even matters in that code sequence.

It was more a generic issue - any micro-optimization should be based
on numbers (and there should be some numbers in the commit message),
not on "this should be faster". Because while intuitively immediates
_should_ be faster than registers, that's simply not always "obviously
true". It _may_ be true. But numbers talk.

                  Linus


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 15:05   ` x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18") Ingo Molnar
  2018-06-05 15:47     ` Linus Torvalds
@ 2018-06-05 17:22     ` Alexey Dobriyan
  2018-06-05 17:32       ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Alexey Dobriyan @ 2018-06-05 17:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andy Lutomirski, Borislav Petkov,
	Josh Poimboeuf, H. Peter Anvin, Denys Vlasenko

On Tue, Jun 05, 2018 at 05:05:14PM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> 
> > On Mon, Jun 4, 2018 at 5:21 AM Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >  - __clear_user() micro-optimization (Alexey Dobriyan)
> > 
> > Was this actually tested?
> 
> I'm not sure - Alexey?
> 
> > I think one reason people avoided the constant was that on some
> > microarchitecture it ended up being a separate uop just for the
> > constant generation, because it wouldn't fit in a single uop.

> Ok, fair point and agreed - if Alexey sends some measurements to back the change 
> I'll keep this, otherwise queue up a revert.

Tested? :^) I had P4 maybe ~15(?) years ago.

The earliest compiler on godbolt.org is 4.1.2, and it generates
"movb [r32], imm8" with "-m32 -O2 -march=pentium4" for a simple
memset-style loop, if that counts for something.

Actually, I think __clear_user() should be rewritten in C with assembly.
Its biggest user is probably the ELF loader, and those partial-page .bss
clears should be noticeable.


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 17:22     ` Alexey Dobriyan
@ 2018-06-05 17:32       ` Linus Torvalds
  2018-06-05 22:41         ` Alexey Dobriyan
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2018-06-05 17:32 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andrew Lutomirski,
	Borislav Petkov, Josh Poimboeuf, Peter Anvin, Denys Vlasenko

On Tue, Jun 5, 2018 at 10:22 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> Tested? :^) I had P4 maybe ~15(?) years ago.

Did you EVEN test it on what you have today?

Do you have any numbers at all, in other words?

Micro-optimizations need numbers. Otherwise they aren't
micro-optimizations, they are just "change code randomly".

                    Linus


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 17:32       ` Linus Torvalds
@ 2018-06-05 22:41         ` Alexey Dobriyan
  2018-06-05 23:01           ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Alexey Dobriyan @ 2018-06-05 22:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andrew Lutomirski,
	Borislav Petkov, Josh Poimboeuf, Peter Anvin, Denys Vlasenko

On Tue, Jun 05, 2018 at 10:32:55AM -0700, Linus Torvalds wrote:
> On Tue, Jun 5, 2018 at 10:22 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > Tested? :^) I had P4 maybe ~15(?) years ago.
> 
> Did you EVEN test it on what you have today?
> 
> Do you have any numbers at all, in other words?
> 
> Micro-optimizations need numbers. Otherwise they aren't
> micro-optimizations, they are just "change code randomly".

On my potato performance increase is 33%, sheesh.
And CPU starts doing 3 instructions per cycle vs 2.

benchmark is "clear_user(p + 4096 - 4068, 4068)"
4068 comes from booting Debian 8 with printk.

f0(4068) (old clear_user)
--------
$ taskset -c 15 perf stat -r 16 ./a.out

 Performance counter stats for './a.out' (16 runs):

       2033.189084      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.41% )
                 2      context-switches          #    0.001 K/sec                    ( +- 11.11% )
                 0      cpu-migrations            #    0.000 K/sec
                46      page-faults               #    0.023 K/sec                    ( +-  0.91% )
     4,268,425,486      cycles                    #    2.099 GHz                      ( +-  0.41% )
     8,672,326,256      instructions              #    2.03  insn per cycle           ( +-  0.00% )
     2,169,900,710      branches                  # 1067.240 M/sec                    ( +-  0.00% )
         4,226,258      branch-misses             #    0.19% of all branches          ( +-  0.01% )

       2.033700109 seconds time elapsed                                          ( +-  0.41% )

f1(4068) (new clear_user)
--------
$ taskset -c 15 perf stat -r 16 ./a.out

 Performance counter stats for './a.out' (16 runs):

       1345.149992      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.01% )
                 2      context-switches          #    0.002 K/sec                    ( +-  8.35% )
                 0      cpu-migrations            #    0.000 K/sec
                46      page-faults               #    0.034 K/sec                    ( +-  0.82% )
     2,823,965,728      cycles                    #    2.099 GHz                      ( +-  0.01% )
     8,661,733,733      instructions              #    3.07  insn per cycle           ( +-  0.00% )
     2,169,437,410      branches                  # 1612.785 M/sec                    ( +-  0.00% )
         4,216,469      branch-misses             #    0.19% of all branches          ( +-  0.01% )

       1.345375114 seconds time elapsed                                          ( +-  0.01% )
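
The headline figures can be re-derived from the two perf runs above. The short program fragment below only redoes that arithmetic on the reported counters (the helper names are made up for illustration): instructions per cycle for each run, and the relative drop in cycles.

```c
#include <stdio.h>

/* Counter values copied from the two perf stat runs above. */
static const double old_cycles = 4268425486.0, old_insns = 8672326256.0;
static const double new_cycles = 2823965728.0, new_insns = 8661733733.0;

/* Instructions retired per cycle. */
static double ipc(double insns, double cycles)
{
	return insns / cycles;
}

/* Percentage by which the cycle count went down. */
static double cycle_reduction_pct(double before, double after)
{
	return 100.0 * (before - after) / before;
}

/* Print the derived figures; call this from main(). */
static void print_summary(void)
{
	printf("old IPC:         %.2f\n", ipc(old_insns, old_cycles));
	printf("new IPC:         %.2f\n", ipc(new_insns, new_cycles));
	printf("cycle reduction: %.1f%%\n",
	       cycle_reduction_pct(old_cycles, new_cycles));
}
```

The instruction count is almost the same in both runs, so the gain shows up as fewer cycles for the same work, i.e. as higher IPC.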

-------------------------------------
CFLAGS = -Wall -fno-strict-aliasing -fno-common -fshort-wchar -std=gnu89 -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1 -falign-loops=1 -mno-80387 -mno-fp-ret-in-387 -mpreferred-stack-boundary=3 -mskip-rax-setup -mtune=generic -mno-red-zone -funit-at-a-time -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0 -fno-stack-protector -fomit-frame-pointer -fno-var-tracking-assignments -g -femit-struct-debug-baseonly -fno-var-tracking -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack


0000000000000780 <f0>:
 780:	mov    rax,rsi
 783:	mov    rcx,rsi
 786:	xor    edx,edx
 788:	and    eax,0x7
 78b:	shr    rcx,0x3
 78f:	mov    esi,0x8
 794:	test   rcx,rcx
 797:	je     7a3 <f0+0x23>
 799:	mov    QWORD PTR [rdi],rdx
 79c:	add    rdi,rsi
 79f:	dec    ecx
 7a1:	jne    799 <f0+0x19>
 7a3:	mov    rcx,rax
 7a6:	test   ecx,ecx
 7a8:	je     7b3 <f0+0x33>
 7aa:	mov    BYTE PTR [rdi],dl
 7ac:	inc    rdi
 7af:	dec    ecx
 7b1:	jne    7aa <f0+0x2a>
 7b3:	mov    rax,rcx
 7b6:	ret    

00000000000007c0 <f1>:
 7c0:	mov    rax,rsi
 7c3:	shr    rsi,0x3
 7c7:	and    eax,0x7
 7ca:	mov    rcx,rsi
 7cd:	test   rcx,rcx
 7d0:	je     7e1 <f1+0x21>
 7d2:	mov    QWORD PTR [rdi],0x0
 7d9:	add    rdi,0x8
 7dd:	dec    ecx
 7df:	jne    7d2 <f1+0x12>
 7e1:	mov    rcx,rax
 7e4:	test   ecx,ecx
 7e6:	je     7f2 <f1+0x32>
 7e8:	mov    BYTE PTR [rdi],0x0
 7eb:	inc    rdi
 7ee:	dec    ecx
 7f0:	jne    7e8 <f1+0x28>
 7f2:	mov    rax,rcx
 7f5:	ret    


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 22:41         ` Alexey Dobriyan
@ 2018-06-05 23:01           ` Linus Torvalds
  2018-06-05 23:04             ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2018-06-05 23:01 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andrew Lutomirski,
	Borislav Petkov, Josh Poimboeuf, Peter Anvin, Denys Vlasenko

On Tue, Jun 5, 2018 at 3:41 PM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> On my potato performance increase is 33%, sheesh.
> And CPU starts doing 3 instructions per cycle vs 2.

Whee. That's a shockingly big difference. On my CPU (i7-6700K) it
makes absolutely no difference whether the values are integers or in
registers.

               Linus


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 23:01           ` Linus Torvalds
@ 2018-06-05 23:04             ` Linus Torvalds
  2018-06-05 23:20               ` Alexey Dobriyan
  0 siblings, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2018-06-05 23:04 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andrew Lutomirski,
	Borislav Petkov, Josh Poimboeuf, Peter Anvin, Denys Vlasenko

On Tue, Jun 5, 2018 at 4:01 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, Jun 5, 2018 at 3:41 PM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> >
> > On my potato performance increase is 33%, sheesh.
> > And CPU starts doing 3 instructions per cycle vs 2.
>
> Whee. That's a shockingly big difference. On my CPU (i7-6700K) it
> makes absolutely no difference whether the values are integers or in
> registers.

In fact, looking at Agner Fog's instruction lists, I don't see any CPU
where it would make a difference, except for the P4 (where the
immediate looks like it's a bad idea because it's an extra uop, but it
might pack fine and not be noticeable).

But maybe I'm missing something subtle. What CPU, out of morbid interest?

               Linus


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 23:04             ` Linus Torvalds
@ 2018-06-05 23:20               ` Alexey Dobriyan
  2018-06-05 23:27                 ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Alexey Dobriyan @ 2018-06-05 23:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andrew Lutomirski,
	Borislav Petkov, Josh Poimboeuf, Peter Anvin, Denys Vlasenko

On Tue, Jun 05, 2018 at 04:04:37PM -0700, Linus Torvalds wrote:
> On Tue, Jun 5, 2018 at 4:01 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > On Tue, Jun 5, 2018 at 3:41 PM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > >
> > > On my potato performance increase is 33%, sheesh.
> > > And CPU starts doing 3 instructions per cycle vs 2.
> >
> > Whee. That's a shockingly big difference. On my CPU (i7-6700K) it
> > makes absolutely no difference whether the values are integers or in
> > registers.
> 
> In fact, looking at Agner Fog's instruction lists, I don't see any CPU
> where it would make a difference, except for the P4 (where the
> immediate looks like it's a bad idea because it's an extra uop, but it
> might pack fine and not be noticeable).
> 
> But maybe I'm missing something subtle. What CPU, out of morbid interest?

This is Broadwell Xeon E5-2620 v4.
Which is somewhat strange indeed because it should be modern enough.


* Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")
  2018-06-05 23:20               ` Alexey Dobriyan
@ 2018-06-05 23:27                 ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2018-06-05 23:27 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton, Andrew Lutomirski,
	Borislav Petkov, Josh Poimboeuf, Peter Anvin, Denys Vlasenko

On Tue, Jun 5, 2018 at 4:20 PM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> This is Broadwell Xeon E5-2620 v4.
> Which is somewhat strange indeed because it should be modern enough.

Yeah, odd.

Here's the benchmark I used:

  #define SIZE 4068

  int main(int argc, char **argv)
  {
    int i;
    unsigned char buffer[SIZE], *p;

    for (i = 0; i < 1000000; i++)
        asm volatile(
            "1: movq %[zero],(%[mem]); addq %[eight],%[mem]; decl %[count]; jne 1b"
            : [mem] "=r" (p)
            : [zero] "i" (0l), [eight] "i" (8l),
             "0" (buffer), [count] "r" (SIZE/8));
  }

where you can change that "i" for [zero] and [eight] to be "r" to get
the register version.

I just timed it, because I'm lazy and perf seemed to be overkill.

It might be some very specific loop buffer issue or something.

Or maybe my benchmark above is broken, I didn't really verify that the
end result was any good (I just did an objdump to verify the asm code
superficially).

                 Linus
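
A variant of this kind of benchmark that may be easier to experiment with is sketched below. It is not the program used for any of the numbers in this thread. The function names are invented, the pointer and the counter are declared as read-write operands because the loop modifies both, a "memory" clobber is added, and a plain memset() fallback is used when not building for x86-64. The qword count must be nonzero.

```c
#include <string.h>

/* Clear `qwords` 8-byte words at `dst` using immediate operands.
 * `qwords` must be nonzero. */
static void clear_imm(unsigned char *dst, unsigned long qwords)
{
#if defined(__x86_64__)
	__asm__ __volatile__(
		"1:	movq $0,(%[dst])\n"
		"	addq $8,%[dst]\n"
		"	decq %[cnt]\n"
		"	jnz 1b\n"
		: [dst] "+r" (dst), [cnt] "+r" (qwords)
		:
		: "memory", "cc");
#else
	memset(dst, 0, qwords * 8);	/* portable fallback, not a benchmark */
#endif
}

/* Same loop, but the constants 0 and 8 are held in registers. */
static void clear_reg(unsigned char *dst, unsigned long qwords)
{
#if defined(__x86_64__)
	__asm__ __volatile__(
		"1:	movq %[zero],(%[dst])\n"
		"	addq %[eight],%[dst]\n"
		"	decq %[cnt]\n"
		"	jnz 1b\n"
		: [dst] "+r" (dst), [cnt] "+r" (qwords)
		: [zero] "r" (0UL), [eight] "r" (8UL)
		: "memory", "cc");
#else
	memset(dst, 0, qwords * 8);	/* portable fallback, not a benchmark */
#endif
}
```

Calling one of these in a loop on a 4068-byte buffer with a count of 4068 / 8, and checking afterwards that the first 4064 bytes are zero, gives both a timing target and a basic correctness check.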
