* [RFC][PATCH 0/3] gcc work-around and math128 @ 2012-04-24 16:10 Peter Zijlstra 2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra ` (4 more replies) 0 siblings, 5 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw) To: linux-kernel, linux-arch; +Cc: Linus Torvalds, Andrew Morton, Juri Lelli Hi all, The SCHED_DEADLINE review resulted in the following three patches; The first is a cleanup of various copies of the same GCC loop optimization work-around. I don't think this patch is too controversial, at worst I've picked a wrong name, but I wanted to get it out there in case people know more sites. The second two implement a few u128 operations so we can do 128bit math.. I know a few people will die a little inside, but having nanosecond granularity time accounting leads to very big numbers very quickly and when you need to multiply them 64bit really isn't that much. So far its just mult, add and cmp for u128, no divisions quite yet. I realize that on 32bit archs mult_u128 will result in like 8 multiplication instructions that's still nowhere near the most expensive thing ever. Also, I suck at gcc inline asm, so patch 3/3 is very likely completely wrong, in any case none of the patches has even been near a compiler and this really is just an RFC to probe if we want to go here at all. ^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds 2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra @ 2012-04-24 16:10 ` Peter Zijlstra 2012-04-24 19:39 ` Linus Torvalds 2012-04-25 0:14 ` Stephen Rothwell 2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra ` (3 subsequent siblings) 4 siblings, 2 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw) To: linux-kernel, linux-arch Cc: Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner, Peter Zijlstra [-- Attachment #1: kernel-vs-gcc-while.patch --] [-- Type: text/plain, Size: 3383 bytes --] We've grown various copies of a particular gcc work-around, consolidate them into one and add a larger comment. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- include/linux/compiler.h | 12 ++++++++++++ include/linux/math64.h | 4 +--- kernel/sched/core.c | 8 ++------ kernel/sched/fair.c | 8 ++------ kernel/time.c | 11 ++++------- 5 files changed, 21 insertions(+), 22 deletions(-) --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -310,4 +310,16 @@ void ftrace_likely_update(struct ftrace_ */ #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x)) +/* + * Avoid gcc loop optimization by clobbering a variable, forcing a reload + * and invalidating the optimization. + * + * The optimization in question transforms various loops into divisions/modulo + * operations, this is a problem when either the resulting operation generates + * unimplemented libgcc functions (u64 divisions for example) or the loop is + * known not to contain a lot of iterations and the division is in fact more + * expensive. 
+ */ +#define __gcc_dont_optimize_loop(var) asm("" "+rm" (var)) + #endif /* __LINUX_COMPILER_H */ --- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -105,9 +105,7 @@ __iter_div_u64_rem(u64 dividend, u32 div u32 ret = 0; while (dividend >= divisor) { - /* The following asm() prevents the compiler from - optimising this loop into a modulo operation. */ - asm("" : "+rm"(dividend)); + __gcc_dont_optimize_loop(dividend); dividend -= divisor; ret++; --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -628,12 +628,8 @@ void sched_avg_update(struct rq *rq) s64 period = sched_avg_period(); while ((s64)(rq->clock - rq->age_stamp) > period) { - /* - * Inline assembly required to prevent the compiler - * optimising this loop into a divmod call. - * See __iter_div_u64_rem() for another example of this. - */ - asm("" : "+rm" (rq->age_stamp)); + __gcc_dont_optimize_loop(rq->age_stamp); + rq->age_stamp += period; rq->rt_avg /= 2; } --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -853,12 +853,8 @@ static void update_cfs_load(struct cfs_r update_cfs_rq_load_contribution(cfs_rq, global_update); while (cfs_rq->load_period > period) { - /* - * Inline assembly required to prevent the compiler - * optimising this loop into a divmod call. - * See __iter_div_u64_rem() for another example of this. - */ - asm("" : "+rm" (cfs_rq->load_period)); + __gcc_dont_optimize_loop(cfs_rq->load_period); + cfs_rq->load_period /= 2; cfs_rq->load_avg /= 2; } --- a/kernel/time.c +++ b/kernel/time.c @@ -349,17 +349,14 @@ EXPORT_SYMBOL(mktime); void set_normalized_timespec(struct timespec *ts, time_t sec, s64 nsec) { while (nsec >= NSEC_PER_SEC) { - /* - * The following asm() prevents the compiler from - * optimising this loop into a modulo operation. 
See - * also __iter_div_u64_rem() in include/linux/time.h - */ - asm("" : "+rm"(nsec)); + __gcc_dont_optimize_loop(nsec); + nsec -= NSEC_PER_SEC; ++sec; } while (nsec < 0) { - asm("" : "+rm"(nsec)); + __gcc_dont_optimize_loop(nsec); + nsec += NSEC_PER_SEC; --sec; } ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds 2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra @ 2012-04-24 19:39 ` Linus Torvalds 2012-04-25 0:14 ` Stephen Rothwell 1 sibling, 0 replies; 34+ messages in thread From: Linus Torvalds @ 2012-04-24 19:39 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, Apr 24, 2012 at 9:10 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > +#define __gcc_dont_optimize_loop(var) asm("" "+rm" (var)) This seems to be missing a colon. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds 2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra 2012-04-24 19:39 ` Linus Torvalds @ 2012-04-25 0:14 ` Stephen Rothwell 1 sibling, 0 replies; 34+ messages in thread From: Stephen Rothwell @ 2012-04-25 0:14 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner [-- Attachment #1: Type: text/plain, Size: 852 bytes --] Hi Peter, On Tue, 24 Apr 2012 18:10:40 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > > We've grown various copies of a particular gcc work-around, > consolidate them into one and add a larger comment. > > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > include/linux/compiler.h | 12 ++++++++++++ > include/linux/math64.h | 4 +--- > kernel/sched/core.c | 8 ++------ > kernel/sched/fair.c | 8 ++------ > kernel/time.c | 11 ++++------- You should, of course, include linux/compiler.h into those last 4 files ... -- Cheers, Stephen Rothwell sfr@canb.auug.org.au [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra 2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra @ 2012-04-24 16:10 ` Peter Zijlstra 2012-04-24 19:37 ` Linus Torvalds 2012-04-25 0:23 ` Stephen Rothwell 2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra ` (2 subsequent siblings) 4 siblings, 2 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw) To: linux-kernel, linux-arch Cc: Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner, Peter Zijlstra [-- Attachment #1: math128.patch --] [-- Type: text/plain, Size: 7794 bytes --] Grow rudimentary u128 support without relying on gcc/libgcc. Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- arch/alpha/include/asm/math128.h | 4 ++ arch/arm/include/asm/math128.h | 4 ++ arch/avr32/include/asm/math128.h | 4 ++ arch/blackfin/include/asm/math128.h | 4 ++ arch/c6x/include/asm/math128.h | 4 ++ arch/cris/include/asm/math128.h | 4 ++ arch/frv/include/asm/math128.h | 4 ++ arch/h8300/include/asm/math128.h | 4 ++ arch/hexagon/include/asm/math128.h | 4 ++ arch/ia64/include/asm/math128.h | 4 ++ arch/m32r/include/asm/math128.h | 4 ++ arch/m68k/include/asm/math128.h | 4 ++ arch/microblaze/include/asm/math128.h | 4 ++ arch/mips/include/asm/math128.h | 4 ++ arch/mn10300/include/asm/math128.h | 4 ++ arch/openrisc/include/asm/math128.h | 4 ++ arch/parisc/include/asm/math128.h | 4 ++ arch/powerpc/include/asm/math128.h | 4 ++ arch/s390/include/asm/math128.h | 4 ++ arch/score/include/asm/math128.h | 4 ++ arch/sh/include/asm/math128.h | 4 ++ arch/sparc/include/asm/math128.h | 4 ++ arch/tile/include/asm/math128.h | 4 ++ 
arch/um/include/asm/math128.h | 4 ++ arch/unicore32/include/asm/math128.h | 4 ++ arch/x86/include/asm/math128.h | 4 ++ arch/xtensa/include/asm/math128.h | 4 ++ include/linux/math128.h | 47 ++++++++++++++++++++++++++++++++++ lib/Makefile | 2 - lib/math128.c | 42 ++++++++++++++++++++++++++++++ 30 files changed, 198 insertions(+), 1 deletion(-) --- /dev/null +++ b/arch/alpha/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/arm/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/avr32/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/blackfin/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/c6x/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/cris/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/frv/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/h8300/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/hexagon/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/ia64/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/m32r/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ 
b/arch/m68k/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/microblaze/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/mips/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/mn10300/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/openrisc/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/parisc/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/powerpc/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/s390/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/score/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/sh/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/sparc/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/tile/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/um/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/unicore32/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef 
_ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/x86/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/arch/xtensa/include/asm/math128.h @@ -0,0 +1,4 @@ +#ifndef _ASM_MATH128_H +#define _ASM_MATH128_H + +#endif /* _ASM_MATH128_H */ --- /dev/null +++ b/include/linux/math128.h @@ -0,0 +1,47 @@ +#ifndef _LINUX_MATH128_H +#define _LINUX_MATH128_H + +#include <linux/types.h> + +typedef struct { + u64 hi, lo; +} u128; + +#include <asm/math128.h> + +#ifndef add_u128 +static inline u128 add_u128(u128 a, u128 b) +{ + u128 res; + + res.hi = a.hi + b.hi; + res.lo = a.lo + b.lo; + + if (res.lo < a.lo || res.lo < b.lo) + res.hi++; + + return res; +} +#endif /* add_u128 */ + +#ifndef mult_u128 +extern u128 mult_u128(u64 a, u64 b); +#endif + +#ifndef cmp_u128 +static inline int cmp_u128(u128 a, u128 b) +{ + if (a.hi > b.hi) + return 1; + if (a.hi < b.hi) + return -1; + if (a.lo > b.lo) + return 1; + if (a.lo < b.lo) + return -1; + + return 0; +} +#endif /* cmp_u128 */ + +#endif /* _LINUX_MATH128_H */ --- a/lib/Makefile +++ b/lib/Makefile @@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmd idr.o int_sqrt.o extable.o prio_tree.o \ sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \ proportions.o prio_heap.o ratelimit.o show_mem.o \ - is_single_threaded.o plist.o decompress.o + is_single_threaded.o plist.o decompress.o math128.o lib-$(CONFIG_MMU) += ioremap.o lib-$(CONFIG_SMP) += cpumask.o --- /dev/null +++ b/lib/math128.c @@ -0,0 +1,42 @@ +#include <linux/math128.h> + +#ifndef mult_u128 +/* + * a * b = (ah * 2^32 + al) * (bh * 2^32 + bl) = + * ah*bh * 2^64 + (ah*bl + bh*al) * 2^32 + al*bl + */ +u128 mult_u128(u64 a, u64 b) +{ + u128 res; + u64 ah, al; + u64 bh, bl; + u128 t1, t2, t3, t4; + + ah = a >> 32; + al = a & ((1ULL << 32) - 1); + + bh = b >> 32; + bl = b & ((1ULL << 32) - 1); + + t1.lo = 0; + t1.hi = ah * bh; + + t2.lo = ah * 
bl; + t2.hi = t2.lo >> 32; + t2.lo <<= 32; + + t3.lo = al * bh; + t3.hi = t3.lo >> 32; + t3.lo <<= 32; + + t4.lo = al * bl; + t4.hi = 0; + + res = add_u128(t1, t2); + res = add_u128(res, t3); + res = add_u128(res, t4); + + return res; +} +#endif /* mult_u128 */ + ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra @ 2012-04-24 19:37 ` Linus Torvalds 2012-04-24 19:43 ` Peter Zijlstra ` (2 more replies) 2012-04-25 0:23 ` Stephen Rothwell 1 sibling, 3 replies; 34+ messages in thread From: Linus Torvalds @ 2012-04-24 19:37 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, Apr 24, 2012 at 9:10 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > Grow rudimentary u128 support without relying on gcc/libgcc. > > +#ifndef add_u128 > +static inline u128 add_u128(u128 a, u128 b) > +{ > + u128 res; > + > + res.hi = a.hi + b.hi; > + res.lo = a.lo + b.lo; > + > + if (res.lo < a.lo || res.lo < b.lo) > + res.hi++; This is wrong. Or at least stupid. Just do one of the comparisons, not both. If overflow occurs, the result will be smaller than *either* of the added numbers, so comparing both is just silly and confused. So just pick one. Also, it might be worth looking at code generation, to see if it's better to just do a.hi += b.hi; a.low += b.low; if (a.low < b.low) a.hi++; return a; because that might make it clear that there are fewer actual values live at any particular time. But gcc may not care. Try it. Also, for the multiply, please make sure gcc knows to do a "32x32->64" multiplication, rather than thinking it needs to do full 64x64 multiplies.. I'm not sure gcc understands that as you wrote it. You are probably better off actually using 32-bit values, and then an explicit cast, ie u32 a32_0 = .. low 32 bits of a .. u32 b32_0 = .. low 32 bits of b .. u64 res64_0 = (u64) a32_0 * (u64) b32_0; but if gcc understands it from the shifts and masks, I guess it doesn't matter. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 19:37 ` Linus Torvalds @ 2012-04-24 19:43 ` Peter Zijlstra 2012-04-24 20:12 ` Måns Rullgård 2012-04-24 21:54 ` Peter Zijlstra 2 siblings, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 19:43 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, 2012-04-24 at 12:37 -0700, Linus Torvalds wrote: > Or at least stupid. > I'll take stupid.. this bit-fiddling isn't something I've done lots of. I'll go stare at gcc output now.. ;-) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 19:37 ` Linus Torvalds 2012-04-24 19:43 ` Peter Zijlstra @ 2012-04-24 20:12 ` Måns Rullgård 2012-04-24 21:54 ` Peter Zijlstra 2 siblings, 0 replies; 34+ messages in thread From: Måns Rullgård @ 2012-04-24 20:12 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Zijlstra, linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner Linus Torvalds <torvalds@linux-foundation.org> writes: > On Tue, Apr 24, 2012 at 9:10 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: >> Grow rudimentary u128 support without relying on gcc/libgcc. >> >> +#ifndef add_u128 >> +static inline u128 add_u128(u128 a, u128 b) >> +{ >> + u128 res; >> + >> + res.hi = a.hi + b.hi; >> + res.lo = a.lo + b.lo; >> + >> + if (res.lo < a.lo || res.lo < b.lo) >> + res.hi++; > > This is wrong. Or at least stupid. > > Just do one of the comparisons, not both. If overflow occurs, the > result will be smaller than *either* of the added numbers, so > comparing both is just silly and confused. > > So just pick one. > > Also, it might be worth looking at code generation, to see if it's > better to just do > > a.hi += b.hi; > a.low += b.low; > if (a.low < b.low) > a.hi++; > return a; I have no idea if it makes a difference, but that if statement can be written as a.hi += a.low < b.low. Just an observation. > because that might make it clear that there are fewer actual values > live at any particular time. But gcc may not care. Try it. > > Also, for the multiply, please make sure gcc knows to do a "32x32->64" > multiplication, rather than thinking it needs to do full 64x64 > multiplies.. On ARM it does the right thing at least since 4.3, which is the oldest ARM compiler I have at hand. -- Måns Rullgård mans@mansr.com ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 19:37 ` Linus Torvalds 2012-04-24 19:43 ` Peter Zijlstra 2012-04-24 20:12 ` Måns Rullgård @ 2012-04-24 21:54 ` Peter Zijlstra 2012-04-25 0:09 ` H. Peter Anvin 2012-04-25 1:46 ` Linus Torvalds 2 siblings, 2 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 21:54 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, 2012-04-24 at 12:37 -0700, Linus Torvalds wrote: > Also, it might be worth looking at code generation, to see if it's > better to just do > > a.hi += b.hi; > a.low += b.low; > if (a.low < b.low) > a.hi++; > return a; > > because that might make it clear that there are fewer actual values > live at any particular time. But gcc may not care. Try it. It does indeed generate tons better code. FWIW, Mans' suggestion of: a.hi += a.lo < b.lo; horribly confuses gcc. > Also, for the multiply, please make sure gcc knows to do a "32x32->64" > multiplication, rather than thinking it needs to do full 64x64 > multiplies.. > > I'm not sure gcc understands that as you wrote it. It does indeed grok it (as Mans also confirmed for ARM), however: > You are probably > better off actually using 32-bit values, and then an explicit cast, ie > > u32 a32_0 = .. low 32 bits of a .. > u32 b32_0 = .. low 32 bits of b .. > u64 res64_0 = (u64) a32_0 * (u64) b32_0; > > but if gcc understands it from the shifts and masks, I guess it doesn't matter. that does generate slightly better code in that it avoids some masks on 64bit: @@ -7,12 +7,11 @@ .LFB38: .cfi_startproc movq %rdi, %r8 - movq %rdi, %rdx movq %rsi, %rcx + mov %edi, %edx shrq $32, %r8 - andl $4294967295, %edx shrq $32, %rcx - andl $4294967295, %esi + mov %esi, %esi movq %rcx, %rax imulq %rdx, %rcx imulq %rsi, %rdx ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 21:54 ` Peter Zijlstra @ 2012-04-25 0:09 ` H. Peter Anvin 2012-04-25 1:46 ` Linus Torvalds 1 sibling, 0 replies; 34+ messages in thread From: H. Peter Anvin @ 2012-04-25 0:09 UTC (permalink / raw) To: Peter Zijlstra Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On 04/24/2012 02:54 PM, Peter Zijlstra wrote: > - andl $4294967295, %edx > shrq $32, %rcx > - andl $4294967295, %esi > + mov %esi, %esi If you can *ever* get gcc to generate those andl instructions on x86, then please file a gcc bug report: the constant is 0xffffffff and those are plain zero-extension instructions which is much better done with mov. -hpa ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 21:54 ` Peter Zijlstra 2012-04-25 0:09 ` H. Peter Anvin @ 2012-04-25 1:46 ` Linus Torvalds 2012-04-25 8:35 ` Peter Zijlstra 2012-04-25 10:13 ` Peter Zijlstra 1 sibling, 2 replies; 34+ messages in thread From: Linus Torvalds @ 2012-04-25 1:46 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, Apr 24, 2012 at 2:54 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > > that does generate slightly better code in that it avoids some masks on > 64bit: > > @@ -7,12 +7,11 @@ > .LFB38: > .cfi_startproc > movq %rdi, %r8 > - movq %rdi, %rdx > movq %rsi, %rcx > + mov %edi, %edx > shrq $32, %r8 > - andl $4294967295, %edx > shrq $32, %rcx > - andl $4294967295, %esi > + mov %esi, %esi Oh christ. What insane version of gcc is that? Can you please make a gcc bug-report? Because a compiler that generates an instruction sequence like movq %rdi,%rsi andl $4294967295, %esi is just so fricking stupid that it's outright buggy. That's just crazy. It's demented. It's an "and" with all bits set. But yeah, I do think that in general using a cast to 32-bit instead of a mask to 32-bit is easier for the compiler. Although that still is a particularly stupid code sequence to use. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-25 1:46 ` Linus Torvalds @ 2012-04-25 8:35 ` Peter Zijlstra 2012-04-25 10:13 ` Peter Zijlstra 1 sibling, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-25 8:35 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, 2012-04-24 at 18:46 -0700, Linus Torvalds wrote: > What insane version of gcc is that? Can you please make a gcc bug-report? gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1 I'll go and build me a gcc.git 4.6 and 4.7 branch compiler to see if its still there. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-25 1:46 ` Linus Torvalds 2012-04-25 8:35 ` Peter Zijlstra @ 2012-04-25 10:13 ` Peter Zijlstra 1 sibling, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-25 10:13 UTC (permalink / raw) To: Linus Torvalds Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, 2012-04-24 at 18:46 -0700, Linus Torvalds wrote: > Can you please make a gcc bug-report? > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53110 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra 2012-04-24 19:37 ` Linus Torvalds @ 2012-04-25 0:23 ` Stephen Rothwell 2012-04-25 8:11 ` Peter Zijlstra 1 sibling, 1 reply; 34+ messages in thread From: Stephen Rothwell @ 2012-04-25 0:23 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner [-- Attachment #1: Type: text/plain, Size: 2319 bytes --] Hi Peter, On Tue, 24 Apr 2012 18:10:41 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > > Grow rudimentary u128 support without relying on gcc/libgcc. > > Cc: Ingo Molnar <mingo@kernel.org> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > arch/alpha/include/asm/math128.h | 4 ++ > arch/arm/include/asm/math128.h | 4 ++ > arch/avr32/include/asm/math128.h | 4 ++ > arch/blackfin/include/asm/math128.h | 4 ++ > arch/c6x/include/asm/math128.h | 4 ++ > arch/cris/include/asm/math128.h | 4 ++ > arch/frv/include/asm/math128.h | 4 ++ > arch/h8300/include/asm/math128.h | 4 ++ > arch/hexagon/include/asm/math128.h | 4 ++ > arch/ia64/include/asm/math128.h | 4 ++ > arch/m32r/include/asm/math128.h | 4 ++ > arch/m68k/include/asm/math128.h | 4 ++ > arch/microblaze/include/asm/math128.h | 4 ++ > arch/mips/include/asm/math128.h | 4 ++ > arch/mn10300/include/asm/math128.h | 4 ++ > arch/openrisc/include/asm/math128.h | 4 ++ > arch/parisc/include/asm/math128.h | 4 ++ > arch/powerpc/include/asm/math128.h | 4 ++ > arch/s390/include/asm/math128.h | 4 ++ > arch/score/include/asm/math128.h | 4 ++ > arch/sh/include/asm/math128.h | 4 ++ > arch/sparc/include/asm/math128.h | 4 ++ > arch/tile/include/asm/math128.h | 4 ++ > arch/um/include/asm/math128.h | 4 ++ > arch/unicore32/include/asm/math128.h | 4 ++ > 
arch/x86/include/asm/math128.h | 4 ++ > arch/xtensa/include/asm/math128.h | 4 ++ > include/linux/math128.h | 47 ++++++++++++++++++++++++++++++++++ Isn't this what asm-generic is for? You put the generic version in asm-generic/math128.h and then add "generic-y += math128.h" to arch/*/include/asm/Kbuild for each arch that wants the generic version. Then just include asm/math128.h in linux/math128.h. Any arch that wants to optimise these can then just provide its own math128.h. -- Cheers, Stephen Rothwell sfr@canb.auug.org.au [-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
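Stephen's scheme in concrete terms, with illustrative paths: the generic C fallbacks would live in a single `include/asm-generic/math128.h`, and each arch opts in via one Kbuild line instead of carrying a stub header.

```make
# arch/<arch>/include/asm/Kbuild: pick up the asm-generic fallback.
# An arch with optimised versions ships its own asm/math128.h instead.
generic-y += math128.h
```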
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-25 0:23 ` Stephen Rothwell @ 2012-04-25 8:11 ` Peter Zijlstra 2012-04-25 11:23 ` Geert Uytterhoeven 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2012-04-25 8:11 UTC (permalink / raw) To: Stephen Rothwell Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Wed, 2012-04-25 at 10:23 +1000, Stephen Rothwell wrote: > Isn't this what asm-generic is for? You put the generic version in > asm-generic/math128.h and then add "generic-y += math128.h" to > arch/*/include/asm/Kbuild for each arch that wants the generic version. > Then just include asm/math128.h in linux/math128.h. Any arch that wants > to optimise these can then just provide its own math128.h. Ah, right. I didn't know that Kbuild trick. /me does: ls arch/*/include/asm/Kbuild | while read file; do quilt add $file; echo "generic-y += math128.h" >> $file; done Thanks! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-25 8:11 ` Peter Zijlstra @ 2012-04-25 11:23 ` Geert Uytterhoeven 2012-04-25 11:58 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Geert Uytterhoeven @ 2012-04-25 11:23 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Wed, Apr 25, 2012 at 10:11, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > ls arch/*/include/asm/Kbuild | while read file; do quilt add $file; > echo "generic-y += math128.h" >> $file; done That may fail if your "ls" is an alias. Better: "for file in arch/*/include/asm/Kbuild;do ..; done". Still using quilt? I didn't regret ditching it in favor of git (yummy, git rebase ;-). Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-25 11:23 ` Geert Uytterhoeven @ 2012-04-25 11:58 ` Peter Zijlstra 2012-04-25 14:35 ` Geert Uytterhoeven 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2012-04-25 11:58 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Wed, 2012-04-25 at 13:23 +0200, Geert Uytterhoeven wrote: > On Wed, Apr 25, 2012 at 10:11, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > > ls arch/*/include/asm/Kbuild | while read file; do quilt add $file; > > echo "generic-y += math128.h" >> $file; done > > That may fail if your "ls" is an alias. > Better: "for file in arch/*/include/asm/Kbuild;do ..; done". Agreed, still ls worked and the patch is send.. > Still using quilt? I didn't regret ditching it in favor of git (yummy, > git rebase ;-). Yeah, I can't seem to deal with git, I edit my patches by hand way too often, also quilt is a _LOT_ faster pushing/popping patches than git is (and yes I use an SSD). Not using git also means I don't have to deal with the trainwreck that is git-merge,.. /me runs like hell. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-25 11:58 ` Peter Zijlstra @ 2012-04-25 14:35 ` Geert Uytterhoeven 2012-04-25 15:09 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Geert Uytterhoeven @ 2012-04-25 14:35 UTC (permalink / raw) To: Peter Zijlstra Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Wed, Apr 25, 2012 at 13:58, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: >> Still using quilt? I didn't regret ditching it in favor of git (yummy, >> git rebase ;-). > > Yeah, I can't seem to deal with git, I edit my patches by hand way too > often, also quilt is a _LOT_ faster pushing/popping patches than git is > (and yes I use an SSD). > > Not using git also means I don't have to deal with the trainwreck that > is git-merge,.. /me runs like hell. But you have to deal with the quilt-oh-no-the-patch-no-longer-applies trainwreck... Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 2012-04-25 14:35 ` Geert Uytterhoeven @ 2012-04-25 15:09 ` Peter Zijlstra 0 siblings, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-25 15:09 UTC (permalink / raw) To: Geert Uytterhoeven Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Wed, 2012-04-25 at 16:35 +0200, Geert Uytterhoeven wrote: > On Wed, Apr 25, 2012 at 13:58, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > >> Still using quilt? I didn't regret ditching it in favor of git (yummy, > >> git rebase ;-). > > > > Yeah, I can't seem to deal with git, I edit my patches by hand way too > > often, also quilt is a _LOT_ faster pushing/popping patches than git is > > (and yes I use an SSD). > > > > Not using git also means I don't have to deal with the trainwreck that > > is git-merge,.. /me runs like hell. > > But you have to deal with the quilt-oh-no-the-patch-no-longer-applies > trainwreck... Yeah, but I find it much easier to deal with .rej files than with the stuff left behind by git. ^ permalink raw reply [flat|nested] 34+ messages in thread
* [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra 2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra 2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra @ 2012-04-24 16:10 ` Peter Zijlstra 2012-04-24 16:34 ` H. Peter Anvin 2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin 2012-04-24 21:15 ` Andy Lutomirski 4 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw) To: linux-kernel, linux-arch Cc: Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra [-- Attachment #1: math128-x86_64.patch --] [-- Type: text/plain, Size: 1020 bytes --] Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- arch/x86/include/asm/math128.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) --- a/arch/x86/include/asm/math128.h +++ b/arch/x86/include/asm/math128.h @@ -1,4 +1,33 @@ #ifndef _ASM_MATH128_H #define _ASM_MATH128_H +#ifdef CONFIG_X86_64 + +static inline u128 mult_u128(u64 a, u64 b) +{ + u128 res; + + asm("mulq %2" + : "=a" (res.lo), "=d" (res.hi) + : "rm" (b), "0" (a)); + + return res; +} +#define mult_u128 mult_u128 + +static inline u128 add_u128(u128 a, u128 b) +{ + u128 res; + + asm("addq %2,%0;\n" + "adcq %3,%1;\n" + : "=rm" (res.lo), "=rm" (res.hi) + : "r" (b.lo), "r" (b.hi), "0" (a.lo), "1" (a.hi)); + + return res; +} +#define add_u128 add_u128 + +#endif /* CONFIG_X86_64 */ + #endif /* _ASM_MATH128_H */ ^ permalink raw reply [flat|nested] 34+ messages in thread
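[Editor's note: for context on what the asm above short-circuits, the generic C fallbacks might look like the sketch below. This is my own illustration built from the thread's lo/hi struct, not the actual patch 2/3: a 64x64->128 multiply assembled from four 32x32->64 partial products, and an add with manual carry propagation.]

```c
#include <stdint.h>

/* Two-word 128-bit type, lo/hi layout as in the patch. */
typedef struct { uint64_t lo, hi; } u128;

/* 64x64 -> 128 via four 32x32 -> 64 partial products. */
static u128 mult_u128(uint64_t a, uint64_t b)
{
	uint64_t al = a & 0xffffffffu, ah = a >> 32;
	uint64_t bl = b & 0xffffffffu, bh = b >> 32;
	uint64_t ll = al * bl, lh = al * bh;
	uint64_t hl = ah * bl, hh = ah * bh;
	/* mid collects everything landing in bits 32..95 of the result;
	 * it cannot overflow 64 bits (three terms each below 2^32). */
	uint64_t mid = (ll >> 32) + (lh & 0xffffffffu) + (hl & 0xffffffffu);
	u128 res;

	res.lo = (mid << 32) | (ll & 0xffffffffu);
	res.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
	return res;
}

static u128 add_u128(u128 a, u128 b)
{
	u128 res;

	res.lo = a.lo + b.lo;
	res.hi = a.hi + b.hi + (res.lo < a.lo);	/* carry out of .lo */
	return res;
}
```

On a 32-bit target each 64-bit partial product itself decomposes into 32-bit multiplies, which is where the "like 8 multiplication instructions" estimate from the cover letter comes from.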
* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra @ 2012-04-24 16:34 ` H. Peter Anvin 2012-04-24 16:36 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: H. Peter Anvin @ 2012-04-24 16:34 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On 04/24/2012 09:10 AM, Peter Zijlstra wrote: Are you sure that gcc won't simply generate this code already if given "unsigned __int128"? -hpa ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 16:34 ` H. Peter Anvin @ 2012-04-24 16:36 ` Peter Zijlstra 2012-04-24 17:17 ` H. Peter Anvin 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 16:36 UTC (permalink / raw) To: H. Peter Anvin Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, 2012-04-24 at 09:34 -0700, H. Peter Anvin wrote: > On 04/24/2012 09:10 AM, Peter Zijlstra wrote: > > Are you sure that gcc won't simply generate this code already if given > "unsigned __int128"? It might, but I've been told some GCC versions had issues, also lots of archs fall back to libgcc functions. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 16:36 ` Peter Zijlstra @ 2012-04-24 17:17 ` H. Peter Anvin 2012-04-24 17:19 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: H. Peter Anvin @ 2012-04-24 17:17 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On 04/24/2012 09:36 AM, Peter Zijlstra wrote: > On Tue, 2012-04-24 at 09:34 -0700, H. Peter Anvin wrote: >> On 04/24/2012 09:10 AM, Peter Zijlstra wrote: >> >> Are you sure that gcc won't simply generate this code already if given >> "unsigned __int128"? > > It might, but I've been told some GCC versions had issues, also lots of > archs fall back to libgcc functions. > Yes, but this is specific to x86-64. If gcc can generate the code it can also schedule the code, which it can't for assembly... -hpa ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 17:17 ` H. Peter Anvin @ 2012-04-24 17:19 ` Peter Zijlstra 2012-04-24 17:20 ` H. Peter Anvin 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 17:19 UTC (permalink / raw) To: H. Peter Anvin Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, 2012-04-24 at 10:17 -0700, H. Peter Anvin wrote: > On 04/24/2012 09:36 AM, Peter Zijlstra wrote: > > On Tue, 2012-04-24 at 09:34 -0700, H. Peter Anvin wrote: > >> On 04/24/2012 09:10 AM, Peter Zijlstra wrote: > >> > >> Are you sure that gcc won't simply generate this code already if given > >> "unsigned __int128"? > > > > It might, but I've been told some GCC versions had issues, also lots of > > archs fall back to libgcc functions. > > > > Yes, but this is specific to x86-64. If gcc can generate the code it > can also schedule the code, which it can't for assembly... Ok, but are you sure unsigned __int128 works for all gcc versions we support? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 17:19 ` Peter Zijlstra @ 2012-04-24 17:20 ` H. Peter Anvin 2012-04-24 22:00 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: H. Peter Anvin @ 2012-04-24 17:20 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On 04/24/2012 10:19 AM, Peter Zijlstra wrote: >> >> Yes, but this is specific to x86-64. If gcc can generate the code it >> can also schedule the code, which it can't for assembly... > > Ok, but are you sure unsigned __int128 works for all gcc versions we > support? > I have been told __int128 works for all versions of gcc which support x86-64, but I haven't verified it myself, no. -hpa ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 17:20 ` H. Peter Anvin @ 2012-04-24 22:00 ` Peter Zijlstra 2012-04-24 22:03 ` H. Peter Anvin 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 22:00 UTC (permalink / raw) To: H. Peter Anvin Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On Tue, 2012-04-24 at 10:20 -0700, H. Peter Anvin wrote: > I have been told __int128 works for all versions of gcc which support > x86-64, but I haven't verified it myself, no. The gcc version I have seems to grok it: u128 mult_u64_u128(u64 a, u64 b) { unsigned __int128 t = a; t *= b; return U128_INIT(t >> 64, t); } #define mult_u64_u128 mult_u64_u128 generates: .p2align 4,,15 .globl mult_u64_u128 .type mult_u64_u128, @function mult_u64_u128: .LFB38: .cfi_startproc movq %rsi, %rax pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 mulq %rdi popq %rbp .cfi_def_cfa_offset 8 movq %rax, %rdi movq %rdx, %rax movq %rdi, %rdx ret .cfi_endproc .LFE38: .size mult_u64_u128, .-mult_u64_u128 I haven't been able to make add_u128 generate sane code using __int128, the conversion between my struct and the __int128 layout makes a horrid mess. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm 2012-04-24 22:00 ` Peter Zijlstra @ 2012-04-24 22:03 ` H. Peter Anvin 0 siblings, 0 replies; 34+ messages in thread From: H. Peter Anvin @ 2012-04-24 22:03 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner On 04/24/2012 03:00 PM, Peter Zijlstra wrote: > > I haven't been able to make add_u128 generate sane code using __int128, > the conversion between my struct and the __int128 layout makes a horrid > mess. > Use a union, or define your type simply as __int128 if you're on a suitable platform? -hpa ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 0/3] gcc work-around and math128 2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra ` (2 preceding siblings ...) 2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra @ 2012-04-24 17:22 ` H. Peter Anvin 2012-04-24 17:27 ` H. Peter Anvin 2012-04-24 21:15 ` Andy Lutomirski 4 siblings, 1 reply; 34+ messages in thread From: H. Peter Anvin @ 2012-04-24 17:22 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli On 04/24/2012 09:10 AM, Peter Zijlstra wrote: > > So far its just mult, add and cmp for u128, no divisions quite yet. I realize > that on 32bit archs mult_u128 will result in like 8 multiplication > instructions that's still nowhere near the most expensive thing ever. > Please add shifts, at the very least. -hpa ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 0/3] gcc work-around and math128 2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin @ 2012-04-24 17:27 ` H. Peter Anvin 0 siblings, 0 replies; 34+ messages in thread From: H. Peter Anvin @ 2012-04-24 17:27 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli On 04/24/2012 10:22 AM, H. Peter Anvin wrote: > On 04/24/2012 09:10 AM, Peter Zijlstra wrote: >> >> So far its just mult, add and cmp for u128, no divisions quite yet. I realize >> that on 32bit archs mult_u128 will result in like 8 multiplication >> instructions that's still nowhere near the most expensive thing ever. >> > > Please add shifts, at the very least. > Oh, and a constant constructor. -hpa ^ permalink raw reply [flat|nested] 34+ messages in thread
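[Editor's note: the shift helpers and constant constructor hpa asks for might look like the sketch below. The names and the (hi, lo) argument order of U128_INIT are my guesses, chosen to match the U128_INIT(t >> 64, t) usage later in the thread; the shifts assume 0 < n < 64 so the complementary 64 - n shift stays well-defined.]

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* Constant constructor, (hi, lo) order as used elsewhere in the thread. */
#define U128_INIT(_hi, _lo) ((u128){ .lo = (_lo), .hi = (_hi) })

/* Left shift by n, restricted to 0 < n < 64: shifting a 64-bit value
 * by 64 - n must itself be a valid (nonzero, sub-64) shift count. */
static u128 shl_u128(u128 a, unsigned int n)
{
	u128 res;

	res.hi = (a.hi << n) | (a.lo >> (64 - n));
	res.lo = a.lo << n;
	return res;
}

static u128 shr_u128(u128 a, unsigned int n)
{
	u128 res;

	res.lo = (a.lo >> n) | (a.hi << (64 - n));
	res.hi = a.hi >> n;
	return res;
}
```

A full version would also need the n == 0 and n >= 64 cases, which is exactly the kind of branching that makes generic 128-bit shifts less pleasant than they first appear.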
* Re: [RFC][PATCH 0/3] gcc work-around and math128 2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra ` (3 preceding siblings ...) 2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin @ 2012-04-24 21:15 ` Andy Lutomirski 2012-04-24 21:18 ` Linus Torvalds 2012-04-24 21:32 ` Peter Zijlstra 4 siblings, 2 replies; 34+ messages in thread From: Andy Lutomirski @ 2012-04-24 21:15 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli On 04/24/2012 09:10 AM, Peter Zijlstra wrote: > Hi all, > > The SCHED_DEADLINE review resulted in the following three patches; > > The first is a cleanup of various copies of the same GCC loop optimization > work-around. I don't think this patch is too controversial, at worst I've > picked a wrong name, but I wanted to get it out there in case people > know more sites. > > The second two implement a few u128 operations so we can do 128bit math.. I > know a few people will die a little inside, but having nanosecond granularity > time accounting leads to very big numbers very quickly and when you need to > multiply them 64bit really isn't that much. I played with some of this stuff awhile ago, and for timekeeping, it seemed like a 64x32->96 bit multiply followed by a right shift was enough, and that operation is a lot faster on 32-bit architectures than a full 64x64->128 multiply. Something like: uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift) { return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift ); } or (untested, but compilable 32-bit gcc) uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift) { uint64_t part1 = ((a & 0xFFFFFFFFULL) * mult) >> shift; uint64_t part2 = ((a >> 32) * mult) << (32 - shift); return part1 + part2; } --Andy ^ permalink raw reply [flat|nested] 34+ messages in thread
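[Editor's note: Andy's two variants can be cross-checked against each other with a small harness like the one below (mine, not from the thread). It assumes gcc/clang on a 64-bit target for __uint128_t, and shifts in 1..32 with results fitting 64 bits, the regime where the split version is exact.]

```c
#include <stdint.h>

/* Reference: full 128-bit product, then shift (Andy's first variant). */
static uint64_t mul_64_32_shift_128(uint64_t a, uint32_t mult, uint32_t shift)
{
	return (uint64_t)(((__uint128_t)a * (__uint128_t)mult) >> shift);
}

/* Split version (Andy's second variant). For 1 <= shift <= 32 this is
 * exact whenever the true result fits in 64 bits: the high partial
 * product contributes nothing below bit 32, so shifting the two halves
 * separately loses no carries. */
static uint64_t mul_64_32_shift_split(uint64_t a, uint32_t mult, uint32_t shift)
{
	uint64_t part1 = ((a & 0xFFFFFFFFULL) * mult) >> shift;
	uint64_t part2 = ((a >> 32) * mult) << (32 - shift);
	return part1 + part2;
}
```

Comparing the two over a spread of operands is a cheap way to build confidence in the split form before trusting it on a 32-bit architecture, where the __uint128_t reference is unavailable.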
* Re: [RFC][PATCH 0/3] gcc work-around and math128 2012-04-24 21:15 ` Andy Lutomirski @ 2012-04-24 21:18 ` Linus Torvalds 2012-04-24 21:32 ` Peter Zijlstra 1 sibling, 0 replies; 34+ messages in thread From: Linus Torvalds @ 2012-04-24 21:18 UTC (permalink / raw) To: Andy Lutomirski Cc: Peter Zijlstra, linux-kernel, linux-arch, Andrew Morton, Juri Lelli On Tue, Apr 24, 2012 at 2:15 PM, Andy Lutomirski <luto@amacapital.net> wrote: > > I played with some of this stuff awhile ago, and for timekeeping, it > seemed like a 64x32->96 bit multiply followed by a right shift was > enough, and that operation is a lot faster on 32-bit architectures than > a full 64x64->128 multiply. Ack. That may sound like odd numbers, but 64x32->96 sounds sane. And I think it avoids a multiply even on 64-bit, no? Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 0/3] gcc work-around and math128 2012-04-24 21:15 ` Andy Lutomirski 2012-04-24 21:18 ` Linus Torvalds @ 2012-04-24 21:32 ` Peter Zijlstra 2012-04-24 21:35 ` Andy Lutomirski 1 sibling, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 21:32 UTC (permalink / raw) To: Andy Lutomirski Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli On Tue, 2012-04-24 at 14:15 -0700, Andy Lutomirski wrote: > > The second two implement a few u128 operations so we can do 128bit math.. I > > know a few people will die a little inside, but having nanosecond granularity > > time accounting leads to very big numbers very quickly and when you need to > > multiply them 64bit really isn't that much. > > I played with some of this stuff awhile ago, and for timekeeping, it > seemed like a 64x32->96 bit multiply followed by a right shift was > enough, and that operation is a lot faster on 32-bit architectures than > a full 64x64->128 multiply. The SCHED_DEADLINE use case is not that, it multiplies two time intervals. Basically it needs to evaluate if a task activation still fits in the old period or if it needs to shift the deadline and start a new period. It needs to do: runtime / (deadline - t) < budget / period which transforms into: runtime * period < budget * (deadline - t) hence the 64x64->128 mult and 128 compare. > Something like: > > uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift) > { > return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift ); > } That looks a lot like what we grew mult_frac() for, it does: /* * Multiplies an integer by a fraction, while avoiding unnecessary * overflow or loss of precision. 
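[Editor's note: cross-multiplying runtime / (deadline - t) < budget / period by the positive quantity (deadline - t) * period gives runtime * period < budget * (deadline - t), which is the 128-bit compare being described. Coded with gcc's __int128 for brevity, the check might look like the sketch below; all names are hypothetical illustrations, not SCHED_DEADLINE's actual code.]

```c
#include <stdint.h>

/* Returns nonzero when the remaining runtime still fits the current
 * period, i.e. runtime / (deadline - t) < budget / period, evaluated
 * without division as runtime * period < budget * (deadline - t).
 * The products of two ns-granularity intervals can exceed 64 bits,
 * hence the widening to 128. All parameter names are made up. */
static int dl_still_fits(uint64_t runtime, uint64_t deadline, uint64_t t,
			 uint64_t budget, uint64_t period)
{
	__uint128_t left  = (__uint128_t)runtime * period;
	__uint128_t right = (__uint128_t)budget * (deadline - t);

	return left < right;
}
```

For example, 1 ms of runtime against 10 ms to the deadline is a density of 0.1, which fits a reservation of 2 ms per 10 ms period (density 0.2), while 5 ms of runtime does not.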
* Re: [RFC][PATCH 0/3] gcc work-around and math128 2012-04-24 21:32 ` Peter Zijlstra @ 2012-04-24 21:35 ` Andy Lutomirski 2012-04-24 21:51 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Andy Lutomirski @ 2012-04-24 21:35 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli On Tue, Apr 24, 2012 at 2:32 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote: > On Tue, 2012-04-24 at 14:15 -0700, Andy Lutomirski wrote: >> > The second two implement a few u128 operations so we can do 128bit math.. I >> > know a few people will die a little inside, but having nanosecond granularity >> > time accounting leads to very big numbers very quickly and when you need to >> > multiply them 64bit really isn't that much. >> >> I played with some of this stuff awhile ago, and for timekeeping, it >> seemed like a 64x32->96 bit multiply followed by a right shift was >> enough, and that operation is a lot faster on 32-bit architectures than >> a full 64x64->128 multiply. > > The SCHED_DEADLINE use case is not that, it multiplies two time > intervals. Basically it needs to evaluate if a task activation still > fits in the old period or if it needs to shift the deadline and start a > new period. > > It needs to do: runtime / (deadline - t) < budget / period > which transforms into: runtime * period < budget * (deadline - t) > > hence the 64x64->128 mult and 128 compare. Fair enough. > >> Something like: >> >> uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift) >> { >> return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift ); >> } > > That looks a lot like what we grew mult_frac() for, it does: > > /* > * Multiplies an integer by a fraction, while avoiding unnecessary > * overflow or loss of precision. 
> */ > #define mult_frac(x, numer, denom)( \ > { \ > typeof(x) quot = (x) / (denom); \ > typeof(x) rem = (x) % (denom); \ > (quot * (numer)) + ((rem * (numer)) / (denom)); \ > } \ > ) > > > and is used in __cycles_2_ns() and friends. Yeesh. That looks way slower, and IIRC __cycles_2_ns overflows every few seconds on modern machines. gcc 4.6 generates this code: mul_64_32_shift: pushq %rbp movq %rsp, %rbp movl %edx, %ecx movl %esi, %eax mulq %rdi movq %rdx, %rsi shrq %cl, %rsi shrdq %cl, %rdx, %rax testb $64, %cl cmovneq %rsi, %rax popq %rbp ret which is a bit dumb if you can make assumptions about the shift. See http://gcc.gnu.org/PR46514. Some use cases might be able to guarantee that the shift is less than 32 bits, in which case hand-written assembly would be a few cycles faster. --Andy ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [RFC][PATCH 0/3] gcc work-around and math128 2012-04-24 21:35 ` Andy Lutomirski @ 2012-04-24 21:51 ` Peter Zijlstra 0 siblings, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2012-04-24 21:51 UTC (permalink / raw) To: Andy Lutomirski Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli On Tue, 2012-04-24 at 14:35 -0700, Andy Lutomirski wrote: > Yeesh. That looks way slower. Indeed, pulled it through gcc and saw what was generated with mult_frac :/ ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2012-04-25 15:10 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra 2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra 2012-04-24 19:39 ` Linus Torvalds 2012-04-25 0:14 ` Stephen Rothwell 2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra 2012-04-24 19:37 ` Linus Torvalds 2012-04-24 19:43 ` Peter Zijlstra 2012-04-24 20:12 ` Måns Rullgård 2012-04-24 21:54 ` Peter Zijlstra 2012-04-25 0:09 ` H. Peter Anvin 2012-04-25 1:46 ` Linus Torvalds 2012-04-25 8:35 ` Peter Zijlstra 2012-04-25 10:13 ` Peter Zijlstra 2012-04-25 0:23 ` Stephen Rothwell 2012-04-25 8:11 ` Peter Zijlstra 2012-04-25 11:23 ` Geert Uytterhoeven 2012-04-25 11:58 ` Peter Zijlstra 2012-04-25 14:35 ` Geert Uytterhoeven 2012-04-25 15:09 ` Peter Zijlstra 2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra 2012-04-24 16:34 ` H. Peter Anvin 2012-04-24 16:36 ` Peter Zijlstra 2012-04-24 17:17 ` H. Peter Anvin 2012-04-24 17:19 ` Peter Zijlstra 2012-04-24 17:20 ` H. Peter Anvin 2012-04-24 22:00 ` Peter Zijlstra 2012-04-24 22:03 ` H. Peter Anvin 2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin 2012-04-24 17:27 ` H. Peter Anvin 2012-04-24 21:15 ` Andy Lutomirski 2012-04-24 21:18 ` Linus Torvalds 2012-04-24 21:32 ` Peter Zijlstra 2012-04-24 21:35 ` Andy Lutomirski 2012-04-24 21:51 ` Peter Zijlstra