From: Christophe Leroy <christophe.leroy@c-s.fr>
To: Segher Boessenkool <segher@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org, Paul Mackerras <paulus@samba.org>,
	luto@kernel.org, vincenzo.frascino@arm.com,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC PATCH] powerpc/32: Switch VDSO to C implementation.
Date: Sun, 27 Oct 2019 10:21:25 +0100
Message-ID: <8e4d0b82-a7a1-b7f1-308e-df871b32d317@c-s.fr>
In-Reply-To: <20191026230609.GY28442@gate.crashing.org>



On 27/10/2019 at 01:06, Segher Boessenkool wrote:
> On Sat, Oct 26, 2019 at 08:48:27PM +0200, Thomas Gleixner wrote:
>> On Sat, 26 Oct 2019, Christophe Leroy wrote:
>> Let's look at the code:
>>
>> __cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
>> {
>>          const struct vdso_data *vd = __arch_get_vdso_data();
>>
>>          if (likely(tv != NULL)) {
>>                  struct __kernel_timespec ts;
>>
>>                  if (do_hres(&vd[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
>>                          return gettimeofday_fallback(tv, tz);
>>
>>                  tv->tv_sec = ts.tv_sec;
>>                  tv->tv_usec = (u32)ts.tv_nsec / NSEC_PER_USEC;
>>
>> IIRC PPC did some magic math tricks to avoid that. Could you just for the
>> fun of it replace this division with
>>
>>         (u32)ts.tv_nsec >> 10;
> 
> On this particular CPU (the 885, right?) a division by 1000 is just 9
> cycles.  On other CPUs it can be more, say 19 cycles like on the 750; not
> cheap at all, but not hugely expensive either, comparatively.
> 
> (A 64/32->32 division is expensive on all 32-bit PowerPC: there is no
> hardware help for it at all, so it's all done in software.)
> 
> Of course the compiler won't do a division by a constant with a division
> instruction at all, so it's somewhat cheaper even, 5 or 6 cycles or so.
> 
>> One thing which might be worth trying as well is to mark all functions in
>> that file as inline. The speedup by the do_hres() inlining was impressive
>> on PPC.
> 
> The hand-optimised asm code will pretty likely win handsomely, whatever
> you do.  Especially on cores like the 885 (no branch prediction, single
> issue, small caches, etc.: every instruction counts).
> 
> Is there any reason to replace this hand-optimised code?  It was written
> for exactly this reason?  These functions are critical and should be as
> fast as possible.
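
On the division by NSEC_PER_USEC discussed above: a minimal sketch of how
a 32-bit division by 1000 can be done with a multiply and a shift instead
of a divide instruction, which is roughly what compilers typically emit
for a division by a constant. The function names are hypothetical and not
from the kernel. The constant 0x10624DD3 is ceil(2^38 / 1000). The second
helper is the ">> 10" experiment, which divides by 1024 and so is only
useful for timing comparisons, not for correct results.

```c
#include <stdint.h>

/*
 * Illustrative only (hypothetical name): divide a 32-bit nanosecond
 * value by 1000 using a scaled reciprocal. 0x10624DD3 = ceil(2^38 / 1000),
 * so the 64-bit product shifted right by 38 gives nsec / 1000.
 */
static inline uint32_t nsec_to_usec_mulshift(uint32_t nsec)
{
	return (uint32_t)(((uint64_t)nsec * 0x10624DD3u) >> 38);
}

/*
 * The shift suggested as an experiment: this divides by 1024, not 1000,
 * so the value differs from a true microsecond count.
 */
static inline uint32_t nsec_shift10(uint32_t nsec)
{
	return nsec >> 10;
}
```

For example, 999999999 ns gives 999999 with the multiply-and-shift form,
while the plain shift gives 976562.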

Well, all this started with COARSE clocks not being supported by the PPC32
VDSO. I first submitted a series with a set of optimisations including
the implementation of COARSE clocks
(https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=126779).

Then after a comment received on patch 4 of the series from Santosh 
Sivaraj asking for a common implementation of it for PPC32 and PPC64, I 
started looking into making the whole VDSO source code common to PPC32 
and PPC64. Most functions are similar. Time functions are also rather 
similar but unfortunately don't use the same registers. They also don't 
cover all possible clocks. And getres() is also buggy, see series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110321

So instead of reworking the existing time functions, I started
investigating whether we could plug powerpc into the generic
implementation. One drawback on PPC is that we need to set up an ASM
trampoline to handle the SO bit, as it can't be handled from C directly,
can it?

How critical are these functions? Although we have a slight degradation
with the C implementation, they are still way faster than the
corresponding syscall.

Another thing I was wondering: is it worth using the 64-bit timebase on
PPC32? As far as I understand, the timebase is there to calculate a
linear date update since the last VDSO datapage update. How often is the
VDSO datapage updated? On the 885 clocked at 132 MHz, the timebase runs at
8.25 MHz, which means it needs more than 8 minutes to loop over 32 bits.
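
To make the arithmetic above concrete, here is a small sketch with
hypothetical helper names, not kernel code. The first function computes
how long a 32-bit counter takes to wrap at a given timebase frequency.
The second shows the kind of 32-bit delta that would be enough, assuming
the datapage is always updated more often than the wrap period.

```c
#include <stdint.h>

/* Whole seconds until a 32-bit counter ticking at tb_hz wraps around. */
static inline uint64_t tb32_wrap_seconds(uint32_t tb_hz)
{
	return (UINT64_C(1) << 32) / tb_hz;
}

/*
 * Ticks elapsed since the last update, using only the low 32 bits of
 * the timebase. Unsigned subtraction wraps modulo 2^32, so the result
 * is correct as long as fewer than 2^32 ticks have passed.
 */
static inline uint32_t tb32_delta(uint32_t now_lo, uint32_t last_lo)
{
	return now_lo - last_lo;
}
```

With tb_hz = 8250000 the wrap period comes out at 520 seconds, i.e. a bit
more than 8 minutes.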

Christophe

