All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	nathanl@linux.ibm.com, arnd@arndb.de, tglx@linutronix.de,
	vincenzo.frascino@arm.com, luto@kernel.org, x86@kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org
Subject: Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
Date: Mon, 20 Jan 2020 18:08:23 +0100	[thread overview]
Message-ID: <4b0e5941-c37e-3c85-3809-45f33ce35657@c-s.fr> (raw)
In-Reply-To: <20200120151936.GB3191@gate.crashing.org>



Le 20/01/2020 à 16:19, Segher Boessenkool a écrit :
> On Mon, Jan 20, 2020 at 02:56:00PM +0000, Christophe Leroy wrote:
>>> Nice!  Much better.
>>>
>>> It should be tested on more representative hardware, too, but this looks
>>> promising alright :-)
>>
>> mpc832x (e300c2 core) at 333 MHz:
>>
>> Before:
>>
>> gettimeofday:    vdso: 235 nsec/call
>> clock-gettime-realtime:    vdso: 244 nsec/call
>>
>> With the series:
>>
>> gettimeofday:    vdso: 271 nsec/call
>> clock-gettime-realtime:    vdso: 281 nsec/call
> 
> Those are important, and degrade ~15%.  That is acceptable IMO, but do
> you see a way to optimise this (later)?

Not easy I think.

First we have the unavoidable ASM entry function that can't be dropped 
because of the CR[SO] bit the set on error or clear on no error and that 
can't be done in C.

In our ASM VDSO, fixed shifts are used, while in generic C VDSO, shifts 
are generic and read from the VDSO data.

And there is still some funny code generated by GCC (8.1), like:

  620:	7d 29 3c 30 	srw     r9,r9,r7
  624:	21 87 00 20 	subfic  r12,r7,32
  628:	7d 07 3c 31 	srw.    r7,r8,r7
  62c:	7d 08 60 30 	slw     r8,r8,r12
  630:	7d 0b 4b 78 	or      r11,r8,r9
  634:	39 40 00 00 	li      r10,0
  638:	40 82 00 84 	bne     6bc <__c_kernel_clock_gettime+0x114>
  63c:	81 23 00 24 	lwz     r9,36(r3)
  640:	81 05 00 00 	lwz     r8,0(r5)
...
  6bc:	7d 69 5b 78 	mr      r9,r11
  6c0:	7c ea 3b 78 	mr      r10,r7
  6c4:	7d 2b 4b 78 	mr      r11,r9
  6c8:	4b ff ff 74 	b       63c <__c_kernel_clock_gettime+0x94>

This branch to 6bc is totally useless:
- copying r11 into r9 is pointless as r9 is overwritten in 63c
- copying back r9 into r11 is pointless as r11 has not been modified 
inbetween.
- loading r10 with 0 then overwritting r10 with r7 when r7 is not 0 is 
pointless as well, could have directly put the result of srw. in r10.

Christophe

WARNING: multiple messages have this Message-ID (diff)
From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: nathanl@linux.ibm.com, arnd@arndb.de, x86@kernel.org,
	linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org,
	Paul Mackerras <paulus@samba.org>,
	luto@kernel.org, tglx@linutronix.de, vincenzo.frascino@arm.com,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
Date: Mon, 20 Jan 2020 18:08:23 +0100	[thread overview]
Message-ID: <4b0e5941-c37e-3c85-3809-45f33ce35657@c-s.fr> (raw)
In-Reply-To: <20200120151936.GB3191@gate.crashing.org>



Le 20/01/2020 à 16:19, Segher Boessenkool a écrit :
> On Mon, Jan 20, 2020 at 02:56:00PM +0000, Christophe Leroy wrote:
>>> Nice!  Much better.
>>>
>>> It should be tested on more representative hardware, too, but this looks
>>> promising alright :-)
>>
>> mpc832x (e300c2 core) at 333 MHz:
>>
>> Before:
>>
>> gettimeofday:    vdso: 235 nsec/call
>> clock-gettime-realtime:    vdso: 244 nsec/call
>>
>> With the series:
>>
>> gettimeofday:    vdso: 271 nsec/call
>> clock-gettime-realtime:    vdso: 281 nsec/call
> 
> Those are important, and degrade ~15%.  That is acceptable IMO, but do
> you see a way to optimise this (later)?

Not easy I think.

First we have the unavoidable ASM entry function that can't be dropped 
because of the CR[SO] bit the set on error or clear on no error and that 
can't be done in C.

In our ASM VDSO, fixed shifts are used, while in generic C VDSO, shifts 
are generic and read from the VDSO data.

And there is still some funny code generated by GCC (8.1), like:

  620:	7d 29 3c 30 	srw     r9,r9,r7
  624:	21 87 00 20 	subfic  r12,r7,32
  628:	7d 07 3c 31 	srw.    r7,r8,r7
  62c:	7d 08 60 30 	slw     r8,r8,r12
  630:	7d 0b 4b 78 	or      r11,r8,r9
  634:	39 40 00 00 	li      r10,0
  638:	40 82 00 84 	bne     6bc <__c_kernel_clock_gettime+0x114>
  63c:	81 23 00 24 	lwz     r9,36(r3)
  640:	81 05 00 00 	lwz     r8,0(r5)
...
  6bc:	7d 69 5b 78 	mr      r9,r11
  6c0:	7c ea 3b 78 	mr      r10,r7
  6c4:	7d 2b 4b 78 	mr      r11,r9
  6c8:	4b ff ff 74 	b       63c <__c_kernel_clock_gettime+0x94>

This branch to 6bc is totally useless:
- copying r11 into r9 is pointless as r9 is overwritten in 63c
- copying back r9 into r11 is pointless as r11 has not been modified 
inbetween.
- loading r10 with 0 then overwritting r10 with r7 when r7 is not 0 is 
pointless as well, could have directly put the result of srw. in r10.

Christophe

WARNING: multiple messages have this Message-ID (diff)
From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Segher Boessenkool <segher@kernel.crashing.org>
Cc: nathanl@linux.ibm.com, arnd@arndb.de,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	linux-mips@vger.kernel.org, Paul Mackerras <paulus@samba.org>,
	luto@kernel.org, Michael Ellerman <mpe@ellerman.id.au>,
	tglx@linutronix.de, vincenzo.frascino@arm.com,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation.
Date: Mon, 20 Jan 2020 18:08:23 +0100	[thread overview]
Message-ID: <4b0e5941-c37e-3c85-3809-45f33ce35657@c-s.fr> (raw)
In-Reply-To: <20200120151936.GB3191@gate.crashing.org>



Le 20/01/2020 à 16:19, Segher Boessenkool a écrit :
> On Mon, Jan 20, 2020 at 02:56:00PM +0000, Christophe Leroy wrote:
>>> Nice!  Much better.
>>>
>>> It should be tested on more representative hardware, too, but this looks
>>> promising alright :-)
>>
>> mpc832x (e300c2 core) at 333 MHz:
>>
>> Before:
>>
>> gettimeofday:    vdso: 235 nsec/call
>> clock-gettime-realtime:    vdso: 244 nsec/call
>>
>> With the series:
>>
>> gettimeofday:    vdso: 271 nsec/call
>> clock-gettime-realtime:    vdso: 281 nsec/call
> 
> Those are important, and degrade ~15%.  That is acceptable IMO, but do
> you see a way to optimise this (later)?

Not easy I think.

First we have the unavoidable ASM entry function that can't be dropped 
because of the CR[SO] bit the set on error or clear on no error and that 
can't be done in C.

In our ASM VDSO, fixed shifts are used, while in generic C VDSO, shifts 
are generic and read from the VDSO data.

And there is still some funny code generated by GCC (8.1), like:

  620:	7d 29 3c 30 	srw     r9,r9,r7
  624:	21 87 00 20 	subfic  r12,r7,32
  628:	7d 07 3c 31 	srw.    r7,r8,r7
  62c:	7d 08 60 30 	slw     r8,r8,r12
  630:	7d 0b 4b 78 	or      r11,r8,r9
  634:	39 40 00 00 	li      r10,0
  638:	40 82 00 84 	bne     6bc <__c_kernel_clock_gettime+0x114>
  63c:	81 23 00 24 	lwz     r9,36(r3)
  640:	81 05 00 00 	lwz     r8,0(r5)
...
  6bc:	7d 69 5b 78 	mr      r9,r11
  6c0:	7c ea 3b 78 	mr      r10,r7
  6c4:	7d 2b 4b 78 	mr      r11,r9
  6c8:	4b ff ff 74 	b       63c <__c_kernel_clock_gettime+0x94>

This branch to 6bc is totally useless:
- copying r11 into r9 is pointless as r9 is overwritten in 63c
- copying back r9 into r11 is pointless as r11 has not been modified 
inbetween.
- loading r10 with 0 then overwritting r10 with r7 when r7 is not 0 is 
pointless as well, could have directly put the result of srw. in r10.

Christophe

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-01-20 17:08 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-16 17:58 [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation Christophe Leroy
2020-01-16 17:58 ` Christophe Leroy
2020-01-16 17:58 ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 01/11] powerpc/64: Don't provide time functions in compat VDSO32 Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 02/11] powerpc/vdso: Switch VDSO to generic C implementation Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 03/11] lib: vdso: only read hrtimer_res when needed in __cvdso_clock_getres() Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 04/11] powerpc/vdso: simplify __get_datapage() Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 05/11] lib: vdso: allow arches to provide vdso data pointer Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 06/11] powerpc/vdso: provide inline alternative to __get_datapage() Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 07/11] powerpc/vdso: provide vdso data pointer from the ASM caller Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 08/11] lib: vdso: allow fixed clock mode Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 20:13   ` Thomas Gleixner
2020-01-16 20:13     ` Thomas Gleixner
2020-01-16 20:13     ` Thomas Gleixner
2020-01-16 20:19     ` Andy Lutomirski
2020-01-16 20:19       ` Andy Lutomirski
2020-01-16 20:19       ` Andy Lutomirski
2020-01-16 21:07       ` Thomas Gleixner
2020-01-16 21:07         ` Thomas Gleixner
2020-01-16 21:07         ` Thomas Gleixner
2020-01-16 17:58 ` [RFC PATCH v4 09/11] powerpc/vdso: override __arch_vdso_capable() Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 10/11] lib: vdso: Allow arches to override the ns shift operation Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 19:47   ` Andy Lutomirski
2020-01-16 19:47     ` Andy Lutomirski
2020-01-16 19:47     ` Andy Lutomirski
2020-01-16 19:57     ` Thomas Gleixner
2020-01-16 19:57       ` Thomas Gleixner
2020-01-16 19:57       ` Thomas Gleixner
2020-01-16 20:20       ` Andy Lutomirski
2020-01-16 20:20         ` Andy Lutomirski
2020-01-16 20:20         ` Andy Lutomirski
2020-01-29  7:14         ` Thomas Gleixner
2020-01-29  7:14           ` Thomas Gleixner
2020-01-29  7:14           ` Thomas Gleixner
2020-01-29  7:26           ` Christophe Leroy
2020-01-29  7:26             ` Christophe Leroy
2020-01-29  7:26             ` Christophe Leroy
2020-01-16 17:58 ` [RFC PATCH v4 11/11] powerpc/32: provide vdso_shift_ns() Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-16 17:58   ` Christophe Leroy
2020-01-17  8:58 ` [RFC PATCH v4 00/11] powerpc: switch VDSO to C implementation Segher Boessenkool
2020-01-17  8:58   ` Segher Boessenkool
2020-01-17  8:58   ` Segher Boessenkool
2020-01-17  9:26   ` Christophe Leroy
2020-01-17  9:26     ` Christophe Leroy
2020-01-17  9:26     ` Christophe Leroy
2020-01-20 14:56   ` Christophe Leroy
2020-01-20 14:56     ` Christophe Leroy
2020-01-20 14:56     ` Christophe Leroy
2020-01-20 15:19     ` Segher Boessenkool
2020-01-20 15:19       ` Segher Boessenkool
2020-01-20 15:19       ` Segher Boessenkool
2020-01-20 17:08       ` Christophe Leroy [this message]
2020-01-20 17:08         ` Christophe Leroy
2020-01-20 17:08         ` Christophe Leroy
2020-01-20 17:27         ` Segher Boessenkool
2020-01-20 17:27           ` Segher Boessenkool
2020-01-20 17:27           ` Segher Boessenkool

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4b0e5941-c37e-3c85-3809-45f33ce35657@c-s.fr \
    --to=christophe.leroy@c-s.fr \
    --cc=arnd@arndb.de \
    --cc=benh@kernel.crashing.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=nathanl@linux.ibm.com \
    --cc=paulus@samba.org \
    --cc=segher@kernel.crashing.org \
    --cc=tglx@linutronix.de \
    --cc=vincenzo.frascino@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.