From: Segher Boessenkool <segher@kernel.crashing.org> To: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: nathanl@linux.ibm.com, linux-arch@vger.kernel.org, vincenzo.frascino@arm.com, arnd@arndb.de, linux-kernel@vger.kernel.org, Paul Mackerras <paulus@samba.org>, luto@kernel.org, tglx@linutronix.de, linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation. Date: Wed, 5 Aug 2020 09:03:07 -0500 [thread overview] Message-ID: <20200805140307.GO6753@gate.crashing.org> (raw) In-Reply-To: <348528c33cd4007f3fee7fe643ef160843d09a6c.1596611196.git.christophe.leroy@csgroup.eu> Hi! On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote: > Provide vdso_shift_ns(), as the generic x >> s gives the following > bad result: > > 18: 35 25 ff e0 addic. r9,r5,-32 > 1c: 41 80 00 10 blt 2c <shift+0x14> > 20: 7c 64 4c 30 srw r4,r3,r9 > 24: 38 60 00 00 li r3,0 > ... > 2c: 54 69 08 3c rlwinm r9,r3,1,0,30 > 30: 21 45 00 1f subfic r10,r5,31 > 34: 7c 84 2c 30 srw r4,r4,r5 > 38: 7d 29 50 30 slw r9,r9,r10 > 3c: 7c 63 2c 30 srw r3,r3,r5 > 40: 7d 24 23 78 or r4,r9,r4 > > In our case the shift is always <= 32. In addition, the upper 32 bits > of the result are likely nul. Lets GCC know it, it also optimises the > following calculations. > > With the patch, we get: > 0: 21 25 00 20 subfic r9,r5,32 > 4: 7c 69 48 30 slw r9,r3,r9 > 8: 7c 84 2c 30 srw r4,r4,r5 > c: 7d 24 23 78 or r4,r9,r4 > 10: 7c 63 2c 30 srw r3,r3,r5 See below. Such code is valid on PowerPC for all shift < 64, and a future version of GCC will do that (it is on various TODO lists, it is bound to happen *some* day ;-), but it won't help you yet of course). > +/* > + * The macros sets two stack frames, one for the caller and one for the callee > + * because there are no requirement for the caller to set a stack frame when > + * calling VDSO so it may have omitted to set one, especially on PPC64 > + */ If the caller follows the ABI, there always is a stack frame. So what is going on? > +/* > + * powerpc specific delta calculation. > + * > + * This variant removes the masking of the subtraction because the > + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX > + * which would result in a pointless operation. The compiler cannot > + * optimize it away as the mask comes from the vdso data and is not compile > + * time constant. > + */ It cannot optimise it because it does not know shift < 32. The code below is incorrect for shift equal to 32, fwiw. > +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult) > +{ > + return (cycles - last) * mult; > +} > +#define vdso_calc_delta vdso_calc_delta > + > +#ifndef __powerpc64__ > +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift) > +{ > + u32 hi = ns >> 32; > + u32 lo = ns; > + > + lo >>= shift; > + lo |= hi << (32 - shift); > + hi >>= shift; > + if (likely(hi == 0)) > + return lo; Removing these two lines shouldn't change generated object code? Or not make it worse, at least. > + return ((u64)hi << 32) | lo; > +} What does the compiler do for just static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift) return ns >> (shift & 31); } ? Segher
WARNING: multiple messages have this Message-ID (diff)
From: Segher Boessenkool <segher@kernel.crashing.org> To: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>, Paul Mackerras <paulus@samba.org>, Michael Ellerman <mpe@ellerman.id.au>, nathanl@linux.ibm.com, anton@ozlabs.org, linux-arch@vger.kernel.org, arnd@arndb.de, linux-kernel@vger.kernel.org, luto@kernel.org, tglx@linutronix.de, vincenzo.frascino@arm.com, linuxppc-dev@lists.ozlabs.org Subject: Re: [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation. Date: Wed, 5 Aug 2020 09:03:07 -0500 [thread overview] Message-ID: <20200805140307.GO6753@gate.crashing.org> (raw) Message-ID: <20200805140307.ZYQS5rbR3gLGITp_jSiCW5MpmcQe4oRRb6tm_U-QHoQ@z> (raw) In-Reply-To: <348528c33cd4007f3fee7fe643ef160843d09a6c.1596611196.git.christophe.leroy@csgroup.eu> Hi! On Wed, Aug 05, 2020 at 07:09:23AM +0000, Christophe Leroy wrote: > Provide vdso_shift_ns(), as the generic x >> s gives the following > bad result: > > 18: 35 25 ff e0 addic. r9,r5,-32 > 1c: 41 80 00 10 blt 2c <shift+0x14> > 20: 7c 64 4c 30 srw r4,r3,r9 > 24: 38 60 00 00 li r3,0 > ... > 2c: 54 69 08 3c rlwinm r9,r3,1,0,30 > 30: 21 45 00 1f subfic r10,r5,31 > 34: 7c 84 2c 30 srw r4,r4,r5 > 38: 7d 29 50 30 slw r9,r9,r10 > 3c: 7c 63 2c 30 srw r3,r3,r5 > 40: 7d 24 23 78 or r4,r9,r4 > > In our case the shift is always <= 32. In addition, the upper 32 bits > of the result are likely nul. Lets GCC know it, it also optimises the > following calculations. > > With the patch, we get: > 0: 21 25 00 20 subfic r9,r5,32 > 4: 7c 69 48 30 slw r9,r3,r9 > 8: 7c 84 2c 30 srw r4,r4,r5 > c: 7d 24 23 78 or r4,r9,r4 > 10: 7c 63 2c 30 srw r3,r3,r5 See below. Such code is valid on PowerPC for all shift < 64, and a future version of GCC will do that (it is on various TODO lists, it is bound to happen *some* day ;-), but it won't help you yet of course). > +/* > + * The macros sets two stack frames, one for the caller and one for the callee > + * because there are no requirement for the caller to set a stack frame when > + * calling VDSO so it may have omitted to set one, especially on PPC64 > + */ If the caller follows the ABI, there always is a stack frame. So what is going on? > +/* > + * powerpc specific delta calculation. > + * > + * This variant removes the masking of the subtraction because the > + * clocksource mask of all VDSO capable clocksources on powerpc is U64_MAX > + * which would result in a pointless operation. The compiler cannot > + * optimize it away as the mask comes from the vdso data and is not compile > + * time constant. > + */ It cannot optimise it because it does not know shift < 32. The code below is incorrect for shift equal to 32, fwiw. > +static __always_inline u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult) > +{ > + return (cycles - last) * mult; > +} > +#define vdso_calc_delta vdso_calc_delta > + > +#ifndef __powerpc64__ > +static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift) > +{ > + u32 hi = ns >> 32; > + u32 lo = ns; > + > + lo >>= shift; > + lo |= hi << (32 - shift); > + hi >>= shift; > + if (likely(hi == 0)) > + return lo; Removing these two lines shouldn't change generated object code? Or not make it worse, at least. > + return ((u64)hi << 32) | lo; > +} What does the compiler do for just static __always_inline u64 vdso_shift_ns(u64 ns, unsigned long shift) return ns >> (shift & 31); } ? Segher
next prev parent reply other threads:[~2020-08-05 14:03 UTC|newest] Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-08-05 7:09 [PATCH v10 0/5] powerpc: switch VDSO to C implementation Christophe Leroy 2020-08-05 7:09 ` [PATCH v10 1/5] powerpc/processor: Move cpu_relax() into asm/vdso/processor.h Christophe Leroy 2020-08-05 7:09 ` [PATCH v10 2/5] powerpc/vdso: Prepare for switching VDSO to generic C implementation Christophe Leroy 2020-08-05 14:03 ` Segher Boessenkool [this message] 2020-08-05 14:03 ` Segher Boessenkool 2020-08-05 16:40 ` Christophe Leroy 2020-08-05 16:40 ` Christophe Leroy 2020-08-05 18:40 ` Segher Boessenkool 2020-08-05 18:40 ` Segher Boessenkool 2020-08-06 5:46 ` Christophe Leroy 2020-08-05 16:51 ` Christophe Leroy 2020-08-05 16:51 ` Christophe Leroy 2020-08-05 20:55 ` Segher Boessenkool 2020-08-05 7:09 ` [PATCH v10 3/5] powerpc/vdso: Save and restore TOC pointer on PPC64 Christophe Leroy 2020-08-05 7:09 ` [PATCH v10 4/5] powerpc/vdso: Switch VDSO to generic C implementation Christophe Leroy 2020-08-05 7:09 ` Christophe Leroy 2020-08-05 7:09 ` [PATCH v10 5/5] powerpc/vdso: Provide __kernel_clock_gettime64() on vdso32 Christophe Leroy 2020-08-05 7:09 ` Christophe Leroy
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200805140307.GO6753@gate.crashing.org \ --to=segher@kernel.crashing.org \ --cc=arnd@arndb.de \ --cc=christophe.leroy@csgroup.eu \ --cc=linux-arch@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linuxppc-dev@lists.ozlabs.org \ --cc=luto@kernel.org \ --cc=nathanl@linux.ibm.com \ --cc=paulus@samba.org \ --cc=tglx@linutronix.de \ --cc=vincenzo.frascino@arm.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).