From: Segher Boessenkool <segher@kernel.crashing.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>, linux-kernel@vger.kernel.org,
    Paul Mackerras <paulus@samba.org>, luto@kernel.org,
    vincenzo.frascino@arm.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [RFC PATCH] powerpc/32: Switch VDSO to C implementation.
Date: Sat, 26 Oct 2019 18:06:09 -0500
Message-ID: <20191026230609.GY28442@gate.crashing.org>
In-Reply-To: <alpine.DEB.2.21.1910262026340.10190@nanos.tec.linutronix.de>

On Sat, Oct 26, 2019 at 08:48:27PM +0200, Thomas Gleixner wrote:
> On Sat, 26 Oct 2019, Christophe Leroy wrote:
> Let's look at the code:
>
> __cvdso_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz)
> {
> 	const struct vdso_data *vd = __arch_get_vdso_data();
>
> 	if (likely(tv != NULL)) {
> 		struct __kernel_timespec ts;
>
> 		if (do_hres(&vd[CS_HRES_COARSE], CLOCK_REALTIME, &ts))
> 			return gettimeofday_fallback(tv, tz);
>
> 		tv->tv_sec = ts.tv_sec;
> 		tv->tv_usec = (u32)ts.tv_nsec / NSEC_PER_USEC;
>
> IIRC PPC did some magic math tricks to avoid that. Could you just for the
> fun of it replace this division with
>
> 	(u32)ts.tv_nsec >> 10;

On this particular CPU (the 885, right?) a division by 1000 is just 9
cycles.  On other CPUs it can be more, say 19 cycles like on the 750;
not cheap at all, but not hugely expensive either, comparatively.

(A 64/32->32 division is expensive on all 32-bit PowerPC: there is no
hardware help for it at all, so it's all done in software.)

Of course the compiler won't do a division by a constant with a
division instruction at all, so it's somewhat cheaper even, 5 or 6
cycles or so.

> One thing which might be worth to try as well is to mark all functions in
> that file as inline. The speedup by the do_hres() inlining was impressive
> on PPC.

The hand-optimised asm code will pretty likely win handsomely, whatever
you do.  Especially on cores like the 885 (no branch prediction, single
issue, small caches, etc.: every instruction counts).

Is there any reason to replace this hand-optimised code?  It was written
for exactly this reason?  These functions are critical and should be as
fast as possible.


Segher
Thread overview:
2019-10-21 12:53 [RFC PATCH] powerpc/32: Switch VDSO to C implementation Christophe Leroy
2019-10-21 21:29 Thomas Gleixner
2019-10-22  9:01 Christophe Leroy
2019-10-22 13:56 Christophe Leroy
2019-10-26 13:55 Andy Lutomirski
2019-10-26 15:54 Christophe Leroy
2019-10-26 15:53 Thomas Gleixner
2019-10-26 16:06 Christophe Leroy
2019-10-26 18:48 Thomas Gleixner
2019-10-26 23:06 Segher Boessenkool [this message]
2019-10-27  9:21 Christophe Leroy
2019-10-27 19:07 Segher Boessenkool
2019-12-20 18:24 Christophe Leroy
2020-01-09 14:05 Thomas Gleixner
2020-01-09 15:21 Christophe Leroy
2020-01-10 22:42 Thomas Gleixner
2019-10-24  7:45 kbuild test robot