* [RFC][PATCH 0/3] gcc work-around and math128
@ 2012-04-24 16:10 Peter Zijlstra
  2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra
                   ` (4 more replies)
  0 siblings, 5 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw)
  To: linux-kernel, linux-arch; +Cc: Linus Torvalds, Andrew Morton, Juri Lelli

Hi all,

The SCHED_DEADLINE review resulted in the following three patches:

The first is a cleanup of various copies of the same GCC loop optimization
work-around. I don't think this patch is too controversial; at worst I've
picked a wrong name, but I wanted to get it out there in case people
know of more sites.

The second two implement a few u128 operations so we can do 128bit math.. I
know a few people will die a little inside, but having nanosecond granularity
time accounting leads to very big numbers very quickly and when you need to
multiply them 64bit really isn't that much.

So far it's just mult, add and cmp for u128, no divisions quite yet. I realize
that on 32bit archs mult_u128 will result in something like 8 multiplication
instructions, but that's still nowhere near the most expensive thing ever.

Also, I suck at gcc inline asm, so patch 3/3 is very likely completely wrong;
in any case none of the patches has even been near a compiler, and this really
is just an RFC to probe whether we want to go here at all.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds
  2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra
@ 2012-04-24 16:10 ` Peter Zijlstra
  2012-04-24 19:39   ` Linus Torvalds
  2012-04-25  0:14   ` Stephen Rothwell
  2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw)
  To: linux-kernel, linux-arch
  Cc: Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra

[-- Attachment #1: kernel-vs-gcc-while.patch --]
[-- Type: text/plain, Size: 3383 bytes --]

We've grown various copies of a particular gcc work-around;
consolidate them into one and add a larger comment.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/compiler.h |   12 ++++++++++++
 include/linux/math64.h   |    4 +---
 kernel/sched/core.c      |    8 ++------
 kernel/sched/fair.c      |    8 ++------
 kernel/time.c            |   11 ++++-------
 5 files changed, 21 insertions(+), 22 deletions(-)

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -310,4 +310,16 @@ void ftrace_likely_update(struct ftrace_
  */
 #define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
 
+/*
+ * Avoid gcc loop optimization by clobbering a variable, forcing a reload
+ * and invalidating the optimization.
+ *
+ * The optimization in question transforms various loops into divisions/modulo
+ * operations; this is a problem when either the resulting operation generates
+ * calls to unimplemented libgcc functions (u64 divisions, for example) or the
+ * loop is known not to contain many iterations and the division is in fact
+ * more expensive.
+ */
+#define __gcc_dont_optimize_loop(var) asm("" "+rm" (var))
+
 #endif /* __LINUX_COMPILER_H */
--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -105,9 +105,7 @@ __iter_div_u64_rem(u64 dividend, u32 div
 	u32 ret = 0;
 
 	while (dividend >= divisor) {
-		/* The following asm() prevents the compiler from
-		   optimising this loop into a modulo operation.  */
-		asm("" : "+rm"(dividend));
+		__gcc_dont_optimize_loop(dividend);
 
 		dividend -= divisor;
 		ret++;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -628,12 +628,8 @@ void sched_avg_update(struct rq *rq)
 	s64 period = sched_avg_period();
 
 	while ((s64)(rq->clock - rq->age_stamp) > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (rq->age_stamp));
+		__gcc_dont_optimize_loop(rq->age_stamp);
+
 		rq->age_stamp += period;
 		rq->rt_avg /= 2;
 	}
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -853,12 +853,8 @@ static void update_cfs_load(struct cfs_r
 		update_cfs_rq_load_contribution(cfs_rq, global_update);
 
 	while (cfs_rq->load_period > period) {
-		/*
-		 * Inline assembly required to prevent the compiler
-		 * optimising this loop into a divmod call.
-		 * See __iter_div_u64_rem() for another example of this.
-		 */
-		asm("" : "+rm" (cfs_rq->load_period));
+		__gcc_dont_optimize_loop(cfs_rq->load_period);
+
 		cfs_rq->load_period /= 2;
 		cfs_rq->load_avg /= 2;
 	}
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -349,17 +349,14 @@ EXPORT_SYMBOL(mktime);
 void set_normalized_timespec(struct timespec *ts, time_t sec, s64 nsec)
 {
 	while (nsec >= NSEC_PER_SEC) {
-		/*
-		 * The following asm() prevents the compiler from
-		 * optimising this loop into a modulo operation. See
-		 * also __iter_div_u64_rem() in include/linux/time.h
-		 */
-		asm("" : "+rm"(nsec));
+		__gcc_dont_optimize_loop(nsec);
+
 		nsec -= NSEC_PER_SEC;
 		++sec;
 	}
 	while (nsec < 0) {
-		asm("" : "+rm"(nsec));
+		__gcc_dont_optimize_loop(nsec);
+
 		nsec += NSEC_PER_SEC;
 		--sec;
 	}



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra
  2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra
@ 2012-04-24 16:10 ` Peter Zijlstra
  2012-04-24 19:37   ` Linus Torvalds
  2012-04-25  0:23   ` Stephen Rothwell
  2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw)
  To: linux-kernel, linux-arch
  Cc: Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner, Peter Zijlstra

[-- Attachment #1: math128.patch --]
[-- Type: text/plain, Size: 7794 bytes --]

Grow rudimentary u128 support without relying on gcc/libgcc.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/alpha/include/asm/math128.h      |    4 ++
 arch/arm/include/asm/math128.h        |    4 ++
 arch/avr32/include/asm/math128.h      |    4 ++
 arch/blackfin/include/asm/math128.h   |    4 ++
 arch/c6x/include/asm/math128.h        |    4 ++
 arch/cris/include/asm/math128.h       |    4 ++
 arch/frv/include/asm/math128.h        |    4 ++
 arch/h8300/include/asm/math128.h      |    4 ++
 arch/hexagon/include/asm/math128.h    |    4 ++
 arch/ia64/include/asm/math128.h       |    4 ++
 arch/m32r/include/asm/math128.h       |    4 ++
 arch/m68k/include/asm/math128.h       |    4 ++
 arch/microblaze/include/asm/math128.h |    4 ++
 arch/mips/include/asm/math128.h       |    4 ++
 arch/mn10300/include/asm/math128.h    |    4 ++
 arch/openrisc/include/asm/math128.h   |    4 ++
 arch/parisc/include/asm/math128.h     |    4 ++
 arch/powerpc/include/asm/math128.h    |    4 ++
 arch/s390/include/asm/math128.h       |    4 ++
 arch/score/include/asm/math128.h      |    4 ++
 arch/sh/include/asm/math128.h         |    4 ++
 arch/sparc/include/asm/math128.h      |    4 ++
 arch/tile/include/asm/math128.h       |    4 ++
 arch/um/include/asm/math128.h         |    4 ++
 arch/unicore32/include/asm/math128.h  |    4 ++
 arch/x86/include/asm/math128.h        |    4 ++
 arch/xtensa/include/asm/math128.h     |    4 ++
 include/linux/math128.h               |   47 ++++++++++++++++++++++++++++++++++
 lib/Makefile                          |    2 -
 lib/math128.c                         |   42 ++++++++++++++++++++++++++++++
 30 files changed, 198 insertions(+), 1 deletion(-)

--- /dev/null
+++ b/arch/alpha/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/arm/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/avr32/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/blackfin/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/c6x/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/cris/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/frv/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/h8300/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/hexagon/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/ia64/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/m32r/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/m68k/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/microblaze/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/mips/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/mn10300/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/openrisc/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/parisc/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/powerpc/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/s390/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/score/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/sh/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/sparc/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/tile/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/um/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/unicore32/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/x86/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/arch/xtensa/include/asm/math128.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_MATH128_H
+#define _ASM_MATH128_H
+
+#endif /* _ASM_MATH128_H */
--- /dev/null
+++ b/include/linux/math128.h
@@ -0,0 +1,47 @@
+#ifndef _LINUX_MATH128_H
+#define _LINUX_MATH128_H
+
+#include <linux/types.h>
+
+typedef struct {
+	u64 hi, lo;
+} u128;
+
+#include <asm/math128.h>
+
+#ifndef add_u128
+static inline u128 add_u128(u128 a, u128 b)
+{
+	u128 res;
+
+	res.hi = a.hi + b.hi;
+	res.lo = a.lo + b.lo;
+
+	if (res.lo < a.lo || res.lo < b.lo)
+		res.hi++;
+
+	return res;
+}
+#endif /* add_u128 */
+
+#ifndef mult_u128
+extern u128 mult_u128(u64 a, u64 b);
+#endif
+
+#ifndef cmp_u128
+static inline int cmp_u128(u128 a, u128 b)
+{
+	if (a.hi > b.hi)
+		return 1;
+	if (a.hi < b.hi)
+		return -1;
+	if (a.lo > b.lo)
+		return 1;
+	if (a.lo < b.lo)
+		return -1;
+
+	return 0;
+}
+#endif /* cmp_u128 */
+
+#endif /* _LINUX_MATH128_H */
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmd
 	 idr.o int_sqrt.o extable.o prio_tree.o \
 	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o prio_heap.o ratelimit.o show_mem.o \
-	 is_single_threaded.o plist.o decompress.o
+	 is_single_threaded.o plist.o decompress.o math128.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
--- /dev/null
+++ b/lib/math128.c
@@ -0,0 +1,42 @@
+#include <linux/math128.h>
+
+#ifndef mult_u128
+/*
+ * a * b = (ah * 2^32 + al) * (bh * 2^32 + bl) =
+ *   ah*bh * 2^64 + (ah*bl + bh*al) * 2^32 + al*bl
+ */
+u128 mult_u128(u64 a, u64 b)
+{
+	u128 res;
+	u64 ah, al;
+	u64 bh, bl;
+	u128 t1, t2, t3, t4;
+
+	ah = a >> 32;
+	al = a & ((1ULL << 32) - 1);
+
+	bh = b >> 32;
+	bl = b & ((1ULL << 32) - 1);
+
+	t1.lo = 0;
+	t1.hi = ah * bh;
+
+	t2.lo = ah * bl;
+	t2.hi = t2.lo >> 32;
+	t2.lo <<= 32;
+
+	t3.lo = al * bh;
+	t3.hi = t3.lo >> 32;
+	t3.lo <<= 32;
+
+	t4.lo = al * bl;
+	t4.hi = 0;
+
+	res = add_u128(t1, t2);
+	res = add_u128(res, t3);
+	res = add_u128(res, t4);
+
+	return res;
+}
+#endif /* mult_u128 */
+



^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra
  2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra
  2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra
@ 2012-04-24 16:10 ` Peter Zijlstra
  2012-04-24 16:34   ` H. Peter Anvin
  2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin
  2012-04-24 21:15 ` Andy Lutomirski
  4 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 16:10 UTC (permalink / raw)
  To: linux-kernel, linux-arch
  Cc: Linus Torvalds, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Peter Zijlstra

[-- Attachment #1: math128-x86_64.patch --]
[-- Type: text/plain, Size: 1020 bytes --]



Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/include/asm/math128.h |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

--- a/arch/x86/include/asm/math128.h
+++ b/arch/x86/include/asm/math128.h
@@ -1,4 +1,33 @@
 #ifndef _ASM_MATH128_H
 #define _ASM_MATH128_H
 
+#ifdef CONFIG_X86_64
+
+static inline u128 mult_u128(u64 a, u64 b)
+{
+	u128 res;
+
+	asm("mulq %2"
+		: "=a" (res.lo), "=d" (res.hi)
+		:  "rm" (b), "0" (a));
+
+	return res;
+}
+#define mult_u128 mult_u128
+
+static inline u128 add_u128(u128 a, u128 b)
+{
+	u128 res;
+
+	asm("addq %2,%0;\n"
+	    "adcq %3,%1;\n"
+		: "=rm" (res.lo), "=rm" (res.hi)
+		: "r" (b.lo), "r" (b.hi), "0" (a.lo), "1" (a.hi));
+
+	return res;
+}
+#define add_u128 add_u128
+
+#endif /* CONFIG_X86_64 */
+
 #endif /* _ASM_MATH128_H */



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra
@ 2012-04-24 16:34   ` H. Peter Anvin
  2012-04-24 16:36     ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2012-04-24 16:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On 04/24/2012 09:10 AM, Peter Zijlstra wrote:

Are you sure that gcc won't simply generate this code already if given
"unsigned __int128"?

	-hpa

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 16:34   ` H. Peter Anvin
@ 2012-04-24 16:36     ` Peter Zijlstra
  2012-04-24 17:17       ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 16:36 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On Tue, 2012-04-24 at 09:34 -0700, H. Peter Anvin wrote:
> On 04/24/2012 09:10 AM, Peter Zijlstra wrote:
> 
> Are you sure that gcc won't simply generate this code already if given
> "unsigned __int128"?

It might, but I've been told some GCC versions had issues, also lots of
archs fall back to libgcc functions.



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 16:36     ` Peter Zijlstra
@ 2012-04-24 17:17       ` H. Peter Anvin
  2012-04-24 17:19         ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2012-04-24 17:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On 04/24/2012 09:36 AM, Peter Zijlstra wrote:
> On Tue, 2012-04-24 at 09:34 -0700, H. Peter Anvin wrote:
>> On 04/24/2012 09:10 AM, Peter Zijlstra wrote:
>>
>> Are you sure that gcc won't simply generate this code already if given
>> "unsigned __int128"?
> 
> It might, but I've been told some GCC versions had issues, also lots of
> archs fall back to libgcc functions.
> 

Yes, but this is specific to x86-64.  If gcc can generate the code it
can also schedule the code, which it can't for assembly...

	-hpa



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 17:17       ` H. Peter Anvin
@ 2012-04-24 17:19         ` Peter Zijlstra
  2012-04-24 17:20           ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 17:19 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On Tue, 2012-04-24 at 10:17 -0700, H. Peter Anvin wrote:
> On 04/24/2012 09:36 AM, Peter Zijlstra wrote:
> > On Tue, 2012-04-24 at 09:34 -0700, H. Peter Anvin wrote:
> >> On 04/24/2012 09:10 AM, Peter Zijlstra wrote:
> >>
> >> Are you sure that gcc won't simply generate this code already if given
> >> "unsigned __int128"?
> > 
> > It might, but I've been told some GCC versions had issues, also lots of
> > archs fall back to libgcc functions.
> > 
> 
> Yes, but this is specific to x86-64.  If gcc can generate the code it
> can also schedule the code, which it can't for assembly...

Ok, but are you sure unsigned __int128 works for all gcc versions we
support?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 17:19         ` Peter Zijlstra
@ 2012-04-24 17:20           ` H. Peter Anvin
  2012-04-24 22:00             ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2012-04-24 17:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On 04/24/2012 10:19 AM, Peter Zijlstra wrote:
>>
>> Yes, but this is specific to x86-64.  If gcc can generate the code it
>> can also schedule the code, which it can't for assembly...
> 
> Ok, but are you sure unsigned __int128 works for all gcc versions we
> support?
> 

I have been told __int128 works for all versions of gcc which support
x86-64, but I haven't verified it myself, no.

	-hpa

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] gcc work-around and math128
  2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra
                   ` (2 preceding siblings ...)
  2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra
@ 2012-04-24 17:22 ` H. Peter Anvin
  2012-04-24 17:27   ` H. Peter Anvin
  2012-04-24 21:15 ` Andy Lutomirski
  4 siblings, 1 reply; 34+ messages in thread
From: H. Peter Anvin @ 2012-04-24 17:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli

On 04/24/2012 09:10 AM, Peter Zijlstra wrote:
> 
> So far it's just mult, add and cmp for u128, no divisions quite yet. I realize
> that on 32bit archs mult_u128 will result in something like 8 multiplication
> instructions, but that's still nowhere near the most expensive thing ever.
> 

Please add shifts, at the very least.

	-hpa


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] gcc work-around and math128
  2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin
@ 2012-04-24 17:27   ` H. Peter Anvin
  0 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2012-04-24 17:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli

On 04/24/2012 10:22 AM, H. Peter Anvin wrote:
> On 04/24/2012 09:10 AM, Peter Zijlstra wrote:
>>
>> So far it's just mult, add and cmp for u128, no divisions quite yet. I realize
>> that on 32bit archs mult_u128 will result in something like 8 multiplication
>> instructions, but that's still nowhere near the most expensive thing ever.
>>
> 
> Please add shifts, at the very least.
> 

Oh, and a constant constructor.

	-hpa


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra
@ 2012-04-24 19:37   ` Linus Torvalds
  2012-04-24 19:43     ` Peter Zijlstra
                       ` (2 more replies)
  2012-04-25  0:23   ` Stephen Rothwell
  1 sibling, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-04-24 19:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner

On Tue, Apr 24, 2012 at 9:10 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> Grow rudimentary u128 support without relying on gcc/libgcc.
>
> +#ifndef add_u128
> +static inline u128 add_u128(u128 a, u128 b)
> +{
> +       u128 res;
> +
> +       res.hi = a.hi + b.hi;
> +       res.lo = a.lo + b.lo;
> +
> +       if (res.lo < a.lo || res.lo < b.lo)
> +               res.hi++;

This is wrong. Or at least stupid.

Just do one of the comparisons, not both. If overflow occurs, the
result will be smaller than *either* of the added numbers, so
comparing both is just silly and confused.

So just pick one.

Also, it might be worth looking at code generation, to see if it's
better to just do

   a.hi += b.hi;
   a.low += b.low;
   if (a.low < b.low)
      a.hi++;
   return a;

because that might make it clear that there are fewer actual values
live at any particular time. But gcc may not care. Try it.

Also, for the multiply, please make sure gcc knows to do a "32x32->64"
multiplication, rather than thinking it needs to do full 64x64
multiplies..

I'm not sure gcc understands that as you wrote it. You are probably
better off actually using 32-bit values, and then an explicit cast, ie

  u32 a32_0 = .. low 32 bits of a ..
  u32 b32_0 = .. low 32 bits of b ..
  u64 res64_0 = (u64) a32_0 * (u64) b32_0;

but if gcc understands it from the shifts and masks, I guess it doesn't matter.
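
As a minimal sketch of that cast-based variant (illustrative only, not the
posted patch), the inner products of mult_u128() could be written as:

	u32 a0 = a, a1 = a >> 32;	/* low and high halves of a */
	u32 b0 = b, b1 = b >> 32;	/* low and high halves of b */
	u64 p00 = (u64)a0 * b0;		/* -> bits   0..63  of the result */
	u64 p01 = (u64)a0 * b1;		/* -> bits  32..95  of the result */
	u64 p10 = (u64)a1 * b0;		/* -> bits  32..95  of the result */
	u64 p11 = (u64)a1 * b1;		/* -> bits  64..127 of the result */

which makes the four 32x32->64 multiplications explicit; the partial products
are then shifted and summed exactly as in lib/math128.c.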

        Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds
  2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra
@ 2012-04-24 19:39   ` Linus Torvalds
  2012-04-25  0:14   ` Stephen Rothwell
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-04-24 19:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner

On Tue, Apr 24, 2012 at 9:10 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> +#define __gcc_dont_optimize_loop(var) asm("" "+rm" (var))

This seems to be missing a colon.
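
The corrected form would presumably read:

	#define __gcc_dont_optimize_loop(var)	asm("" : "+rm" (var))

matching the open-coded asm("" : "+rm"(...)) statements the patch replaces.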

               Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 19:37   ` Linus Torvalds
@ 2012-04-24 19:43     ` Peter Zijlstra
  2012-04-24 20:12     ` Måns Rullgård
  2012-04-24 21:54     ` Peter Zijlstra
  2 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 19:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner

On Tue, 2012-04-24 at 12:37 -0700, Linus Torvalds wrote:
> Or at least stupid.
> 
I'll take stupid.. this bit-fiddling isn't something I've done lots of.

I'll go stare at gcc output now.. ;-)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 19:37   ` Linus Torvalds
  2012-04-24 19:43     ` Peter Zijlstra
@ 2012-04-24 20:12     ` Måns Rullgård
  2012-04-24 21:54     ` Peter Zijlstra
  2 siblings, 0 replies; 34+ messages in thread
From: Måns Rullgård @ 2012-04-24 20:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Zijlstra, linux-kernel, linux-arch, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Tue, Apr 24, 2012 at 9:10 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> Grow rudimentary u128 support without relying on gcc/libgcc.
>>
>> +#ifndef add_u128
>> +static inline u128 add_u128(u128 a, u128 b)
>> +{
>> +       u128 res;
>> +
>> +       res.hi = a.hi + b.hi;
>> +       res.lo = a.lo + b.lo;
>> +
>> +       if (res.lo < a.lo || res.lo < b.lo)
>> +               res.hi++;
>
> This is wrong. Or at least stupid.
>
> Just do one of the comparisons, not both. If overflow occurs, the
> result will be smaller than *either* of the added numbers, so
> comparing both is just silly and confused.
>
> So just pick one.
>
> Also, it might be worth looking at code generation, to see if it's
> better to just do
>
>    a.hi += b.hi;
>    a.low += b.low;
>    if (a.low < b.low)
>       a.hi++;
>    return a;

I have no idea if it makes a difference, but that if statement can be
written as a.hi += a.low < b.low.  Just an observation.

> because that might make it clear that there are fewer actual values
> live at any particular time. But gcc may not care. Try it.
>
> Also, for the multiply, please make sure gcc knows to do a "32x32->64"
> multiplication, rather than thinking it needs to do full 64x64
> multiplies..

On ARM it does the right thing at least since 4.3, which is the oldest
ARM compiler I have at hand.

-- 
Måns Rullgård
mans@mansr.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] gcc work-around and math128
  2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra
                   ` (3 preceding siblings ...)
  2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin
@ 2012-04-24 21:15 ` Andy Lutomirski
  2012-04-24 21:18   ` Linus Torvalds
  2012-04-24 21:32   ` Peter Zijlstra
  4 siblings, 2 replies; 34+ messages in thread
From: Andy Lutomirski @ 2012-04-24 21:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli

On 04/24/2012 09:10 AM, Peter Zijlstra wrote:
> Hi all,
> 
> The SCHED_DEADLINE review resulted in the following three patches:
> 
> The first is a cleanup of various copies of the same GCC loop optimization
> work-around. I don't think this patch is too controversial; at worst I've
> picked a wrong name, but I wanted to get it out there in case people
> know of more sites.
> 
> The second two implement a few u128 operations so we can do 128bit math.. I
> know a few people will die a little inside, but having nanosecond granularity
> time accounting leads to very big numbers very quickly and when you need to
> multiply them 64bit really isn't that much.

I played with some of this stuff awhile ago, and for timekeeping, it
seemed like a 64x32->96 bit multiply followed by a right shift was
enough, and that operation is a lot faster on 32-bit architectures than
a full 64x64->128 multiply.  Something like:

uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift)
{
  return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift );
}

or (untested, but compilable 32-bit gcc)

uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift)
{
  uint64_t part1 = ((a & 0xFFFFFFFFULL) * mult) >> shift;
  uint64_t part2 = ((a >> 32) * mult) << (32 - shift);
  return part1 + part2;
}

--Andy

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] gcc work-around and math128
  2012-04-24 21:15 ` Andy Lutomirski
@ 2012-04-24 21:18   ` Linus Torvalds
  2012-04-24 21:32   ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-04-24 21:18 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, linux-kernel, linux-arch, Andrew Morton, Juri Lelli

On Tue, Apr 24, 2012 at 2:15 PM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> I played with some of this stuff awhile ago, and for timekeeping, it
> seemed like a 64x32->96 bit multiply followed by a right shift was
> enough, and that operation is a lot faster on 32-bit architectures than
> a full 64x64->128 multiply.

Ack. That may sound like odd numbers, but 64x32->96 sounds sane. And I
think it avoids a multiply even on 64-bit, no?

              Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] gcc work-around and math128
  2012-04-24 21:15 ` Andy Lutomirski
  2012-04-24 21:18   ` Linus Torvalds
@ 2012-04-24 21:32   ` Peter Zijlstra
  2012-04-24 21:35     ` Andy Lutomirski
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 21:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli

On Tue, 2012-04-24 at 14:15 -0700, Andy Lutomirski wrote:
> > The second two implement a few u128 operations so we can do 128bit math.. I
> > know a few people will die a little inside, but having nanosecond granularity
> > time accounting leads to very big numbers very quickly and when you need to
> > multiply them 64bit really isn't that much.
> 
> I played with some of this stuff awhile ago, and for timekeeping, it
> seemed like a 64x32->96 bit multiply followed by a right shift was
> enough, and that operation is a lot faster on 32-bit architectures than
> a full 64x64->128 multiply. 

The SCHED_DEADLINE use case is not that, it multiplies two time
intervals. Basically it needs to evaluate if a task activation still
fits in the old period or if it needs to shift the deadline and start a
new period.

It needs to do: runtime / (deadline - t) < budget / period
which transforms into: runtime * period < budget * (deadline - t)

hence the 64x64->128 mult and 128 compare.
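
Illustratively, with the helpers from patch 2/3 that check could look
something like this (function and variable names made up for the example):

	/* true if runtime / (deadline - t) < budget / period */
	static inline int dl_still_fits(u64 runtime, u64 deadline, u64 t,
					u64 budget, u64 period)
	{
		u128 left  = mult_u128(runtime, period);
		u128 right = mult_u128(budget, deadline - t);

		return cmp_u128(left, right) < 0;
	}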

> Something like:
> 
> uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift)
> {
>   return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift );
> }

That looks a lot like what we grew mult_frac() for, it does:

/*              
 * Multiplies an integer by a fraction, while avoiding unnecessary
 * overflow or loss of precision.
 */
#define mult_frac(x, numer, denom)(                     \
{                                                       \
        typeof(x) quot = (x) / (denom);                 \
        typeof(x) rem  = (x) % (denom);                 \
        (quot * (numer)) + ((rem * (numer)) / (denom)); \
}                                                       \
)


and is used in __cycles_2_ns() and friends.
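
Purely as a usage illustration (variable names hypothetical): converting TSC
cycles to nanoseconds given a frequency in kHz would be

	u64 ns = mult_frac(cycles, 1000000ULL, (u64)tsc_khz);

since ns = cycles * 10^6 / kHz, with mult_frac() keeping the intermediate
products from overflowing 64 bits.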

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] gcc work-around and math128
  2012-04-24 21:32   ` Peter Zijlstra
@ 2012-04-24 21:35     ` Andy Lutomirski
  2012-04-24 21:51       ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Lutomirski @ 2012-04-24 21:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli

On Tue, Apr 24, 2012 at 2:32 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Tue, 2012-04-24 at 14:15 -0700, Andy Lutomirski wrote:
>> > The second two implement a few u128 operations so we can do 128bit math.. I
>> > know a few people will die a little inside, but having nanosecond granularity
>> > time accounting leads to very big numbers very quickly and when you need to
>> > multiply them 64bit really isn't that much.
>>
>> I played with some of this stuff awhile ago, and for timekeeping, it
>> seemed like a 64x32->96 bit multiply followed by a right shift was
>> enough, and that operation is a lot faster on 32-bit architectures than
>> a full 64x64->128 multiply.
>
> The SCHED_DEADLINE use case is not that, it multiplies two time
> intervals. Basically it needs to evaluate if a task activation still
> fits in the old period or if it needs to shift the deadline and start a
> new period.
>
> It needs to do: runtime / (deadline - t) < budget / period
> which transforms into: runtime * period < budget * (deadline - t)
>
> hence the 64x64->128 mult and 128 compare.

Fair enough.

>
>> Something like:
>>
>> uint64_t mul_64_32_shift(uint64_t a, uint32_t mult, uint32_t shift)
>> {
>>   return (uint64_t)( ((__uint128_t)a * (__uint128_t)mult) >> shift );
>> }
>
> That looks a lot like what we grew mult_frac() for, it does:
>
> /*
>  * Multiplies an integer by a fraction, while avoiding unnecessary
>  * overflow or loss of precision.
>  */
> #define mult_frac(x, numer, denom)(                     \
> {                                                       \
>        typeof(x) quot = (x) / (denom);                 \
>        typeof(x) rem  = (x) % (denom);                 \
>        (quot * (numer)) + ((rem * (numer)) / (denom)); \
> }                                                       \
> )
>
>
> and is used in __cycles_2_ns() and friends.

Yeesh.  That looks way slower, and IIRC __cycles_2_ns overflows every
few seconds on modern machines.

gcc 4.6 generates this code:

mul_64_32_shift:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    %edx, %ecx
        movl    %esi, %eax
        mulq    %rdi
        movq    %rdx, %rsi
        shrq    %cl, %rsi
        shrdq   %cl, %rdx, %rax
        testb   $64, %cl
        cmovneq %rsi, %rax
        popq    %rbp
        ret

which is a bit dumb if you can make assumptions about the shift.  See
http://gcc.gnu.org/PR46514.  Some use cases might be able to guarantee
that the shift is less than 32 bits, in which case hand-written
assembly would be a few cycles faster.

--Andy

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 0/3] gcc work-around and math128
  2012-04-24 21:35     ` Andy Lutomirski
@ 2012-04-24 21:51       ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 21:51 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton, Juri Lelli

On Tue, 2012-04-24 at 14:35 -0700, Andy Lutomirski wrote:
> Yeesh.  That looks way slower.

Indeed, pulled it through gcc and saw what was generated with
mult_frac :/

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 19:37   ` Linus Torvalds
  2012-04-24 19:43     ` Peter Zijlstra
  2012-04-24 20:12     ` Måns Rullgård
@ 2012-04-24 21:54     ` Peter Zijlstra
  2012-04-25  0:09       ` H. Peter Anvin
  2012-04-25  1:46       ` Linus Torvalds
  2 siblings, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 21:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner

On Tue, 2012-04-24 at 12:37 -0700, Linus Torvalds wrote:
> Also, it might be worth looking at code generation, to see if it's
> better to just do
> 
>    a.hi += b.hi;
>    a.low += b.low;
>    if (a.low < b.low)
>       a.hi++;
>    return a;
> 
> because that might make it clear that there are fewer actual values
> live at any particular time. But gcc may not care. Try it.

It does indeed generate tons better code. FWIW, Mans' suggestion of: 
  a.hi += a.lo < b.lo;
horribly confuses gcc.
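
Spelled out against the u128 type from patch 2/3, Linus' form would read
(sketch only):

	static inline u128 add_u128(u128 a, u128 b)
	{
		a.hi += b.hi;
		a.lo += b.lo;
		if (a.lo < b.lo)
			a.hi++;

		return a;
	}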

> Also, for the multiply, please make sure gcc knows to do a "32x32->64"
> multiplication, rather than thinking it needs to do full 64x64
> multiplies..
> 
> I'm not sure gcc understands that as you wrote it.

It does indeed grok it (as Mans also confirmed for ARM), however:

>  You are probably
> better off actually using 32-bit values, and then an explicit cast, ie
> 
>   u32 a32_0 = .. low 32 bits of a ..
>   u32 b32_0 = .. low 32 bits of b ..
>   u64 res64_0 = (u64) a32_0 * (u64) b32_0;
> 
> but if gcc understands it from the shifts and masks, I guess it doesn't matter.

that does generate slightly better code in that it avoids some masks on
64bit:

@@ -7,12 +7,11 @@
 .LFB38:
        .cfi_startproc
        movq    %rdi, %r8
-       movq    %rdi, %rdx
        movq    %rsi, %rcx
+       mov     %edi, %edx
        shrq    $32, %r8
-       andl    $4294967295, %edx
        shrq    $32, %rcx
-       andl    $4294967295, %esi
+       mov     %esi, %esi
        movq    %rcx, %rax
        imulq   %rdx, %rcx
        imulq   %rsi, %rdx




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 17:20           ` H. Peter Anvin
@ 2012-04-24 22:00             ` Peter Zijlstra
  2012-04-24 22:03               ` H. Peter Anvin
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-24 22:00 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On Tue, 2012-04-24 at 10:20 -0700, H. Peter Anvin wrote:
> I have been told __int128 works for all versions of gcc which support
> x86-64, but I haven't verified it myself, no. 

The gcc version I have seems to grok it:

u128 mult_u64_u128(u64 a, u64 b)                                                             
{                                                                                            
        unsigned __int128 t = a;                                                             
        t *= b;                                                                              
        return U128_INIT(t >> 64, t);                                                        
}
#define mult_u64_u128 mult_u64_u128

generates:

        .p2align 4,,15
        .globl  mult_u64_u128
        .type   mult_u64_u128, @function
mult_u64_u128:
.LFB38:
        .cfi_startproc
        movq    %rsi, %rax
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        mulq    %rdi
        popq    %rbp
        .cfi_def_cfa_offset 8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rdi, %rdx
        ret
        .cfi_endproc
.LFE38:
        .size   mult_u64_u128, .-mult_u64_u128

I haven't been able to make add_u128 generate sane code using __int128,
the conversion between my struct and the __int128 layout makes a horrid
mess.
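
(U128_INIT() above is presumably the constant constructor hpa asked for
earlier; a plausible definition, hypothetical and not part of the posted
patches, would be:

	#define U128_INIT(_hi, _lo)	((u128){ .hi = (_hi), .lo = (_lo) })

so that U128_INIT(t >> 64, t) builds the result from the high and low halves
of the __int128.)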



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm
  2012-04-24 22:00             ` Peter Zijlstra
@ 2012-04-24 22:03               ` H. Peter Anvin
  0 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2012-04-24 22:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On 04/24/2012 03:00 PM, Peter Zijlstra wrote:
> 
> I haven't been able to make add_u128 generate sane code using __int128,
> the conversion between my struct and the __int128 layout makes a horrid
> mess.
> 

Use a union, or define your type simply as __int128 if you're on a
suitable platform?
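
A sketch of the union approach (hypothetical, not something that was posted):

	typedef union {
		unsigned __int128 full;
		struct {
			u64 lo, hi;	/* matches __int128 layout on little-endian */
		};
	} u128;

	static inline u128 add_u128(u128 a, u128 b)
	{
		a.full += b.full;
		return a;
	}

Note the member order differs from the generic struct in patch 2/3 (which has
hi first), and big-endian 64-bit targets would need the members swapped.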

	-hpa


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 21:54     ` Peter Zijlstra
@ 2012-04-25  0:09       ` H. Peter Anvin
  2012-04-25  1:46       ` Linus Torvalds
  1 sibling, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2012-04-25  0:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, linux-kernel, linux-arch, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On 04/24/2012 02:54 PM, Peter Zijlstra wrote:
> -       andl    $4294967295, %edx
>         shrq    $32, %rcx
> -       andl    $4294967295, %esi
> +       mov     %esi, %esi

If you can *ever* get gcc to generate those andl instructions on x86,
then please file a gcc bug report: the constant is 0xffffffff and those
are plain zero-extension instructions, which are much better done with mov.

	-hpa


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds
  2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra
  2012-04-24 19:39   ` Linus Torvalds
@ 2012-04-25  0:14   ` Stephen Rothwell
  1 sibling, 0 replies; 34+ messages in thread
From: Stephen Rothwell @ 2012-04-25  0:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

Hi Peter,

On Tue, 24 Apr 2012 18:10:40 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> We've grown various copies of a particular gcc work-around;
> consolidate them into one and add a larger comment.
> 
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  include/linux/compiler.h |   12 ++++++++++++
>  include/linux/math64.h   |    4 +---
>  kernel/sched/core.c      |    8 ++------
>  kernel/sched/fair.c      |    8 ++------
>  kernel/time.c            |   11 ++++-------

You should, of course, include linux/compiler.h into those last 4 files ...

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra
  2012-04-24 19:37   ` Linus Torvalds
@ 2012-04-25  0:23   ` Stephen Rothwell
  2012-04-25  8:11     ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Stephen Rothwell @ 2012-04-25  0:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2319 bytes --]

Hi Peter,

On Tue, 24 Apr 2012 18:10:41 +0200 Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> Grow rudimentary u128 support without relying on gcc/libgcc.
> 
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/alpha/include/asm/math128.h      |    4 ++
>  arch/arm/include/asm/math128.h        |    4 ++
>  arch/avr32/include/asm/math128.h      |    4 ++
>  arch/blackfin/include/asm/math128.h   |    4 ++
>  arch/c6x/include/asm/math128.h        |    4 ++
>  arch/cris/include/asm/math128.h       |    4 ++
>  arch/frv/include/asm/math128.h        |    4 ++
>  arch/h8300/include/asm/math128.h      |    4 ++
>  arch/hexagon/include/asm/math128.h    |    4 ++
>  arch/ia64/include/asm/math128.h       |    4 ++
>  arch/m32r/include/asm/math128.h       |    4 ++
>  arch/m68k/include/asm/math128.h       |    4 ++
>  arch/microblaze/include/asm/math128.h |    4 ++
>  arch/mips/include/asm/math128.h       |    4 ++
>  arch/mn10300/include/asm/math128.h    |    4 ++
>  arch/openrisc/include/asm/math128.h   |    4 ++
>  arch/parisc/include/asm/math128.h     |    4 ++
>  arch/powerpc/include/asm/math128.h    |    4 ++
>  arch/s390/include/asm/math128.h       |    4 ++
>  arch/score/include/asm/math128.h      |    4 ++
>  arch/sh/include/asm/math128.h         |    4 ++
>  arch/sparc/include/asm/math128.h      |    4 ++
>  arch/tile/include/asm/math128.h       |    4 ++
>  arch/um/include/asm/math128.h         |    4 ++
>  arch/unicore32/include/asm/math128.h  |    4 ++
>  arch/x86/include/asm/math128.h        |    4 ++
>  arch/xtensa/include/asm/math128.h     |    4 ++
>  include/linux/math128.h               |   47 ++++++++++++++++++++++++++++++++++

Isn't this what asm-generic is for?  You put the generic version in
asm-generic/math128.h and then add "generic-y += math128.h" to
arch/*/include/asm/Kbuild for each arch that wants the generic version.
Then just include asm/math128.h in linux/math128.h.  Any arch that wants
to optimise these can then just provide its own math128.h.
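
Concretely, that would amount to moving the fallback definitions into
include/asm-generic/math128.h and then, for every architecture that wants the
generic version, a one-line Kbuild entry (sketched here, not actual patch
contents):

	# arch/<arch>/include/asm/Kbuild
	generic-y += math128.h

while linux/math128.h keeps its plain #include <asm/math128.h>, and only
architectures with optimised helpers carry a real asm/math128.h.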
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-24 21:54     ` Peter Zijlstra
  2012-04-25  0:09       ` H. Peter Anvin
@ 2012-04-25  1:46       ` Linus Torvalds
  2012-04-25  8:35         ` Peter Zijlstra
  2012-04-25 10:13         ` Peter Zijlstra
  1 sibling, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2012-04-25  1:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner

On Tue, Apr 24, 2012 at 2:54 PM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>
> that does generate slightly better code in that it avoids some masks on
> 64bit:
>
> @@ -7,12 +7,11 @@
>  .LFB38:
>        .cfi_startproc
>        movq    %rdi, %r8
> -       movq    %rdi, %rdx
>        movq    %rsi, %rcx
> +       mov     %edi, %edx
>        shrq    $32, %r8
> -       andl    $4294967295, %edx
>        shrq    $32, %rcx
> -       andl    $4294967295, %esi
> +       mov     %esi, %esi

Oh christ.

What insane version of gcc is that? Can you please make a gcc bug-report?

Because a compiler that generates an instruction sequence like

    movq %rdi,%rsi
    andl $4294967295, %esi

is just so fricking stupid that it's outright buggy. That's just
crazy. It's demented. It's an "and" with all bits set.

But yeah, I do think that in general using a cast to 32-bit instead of
a mask to 32-bit is easier for the compiler. Although that still is a
particularly stupid code sequence to use.

                     Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-25  0:23   ` Stephen Rothwell
@ 2012-04-25  8:11     ` Peter Zijlstra
  2012-04-25 11:23       ` Geert Uytterhoeven
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-25  8:11 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: linux-kernel, linux-arch, Linus Torvalds, Andrew Morton,
	Juri Lelli, Ingo Molnar, Thomas Gleixner

On Wed, 2012-04-25 at 10:23 +1000, Stephen Rothwell wrote:
> Isn't this what asm-generic is for?  You put the generic version in
> asm-generic/math128.h and then add "generic-y += math128.h" to
> arch/*/include/asm/Kbuild for each arch that wants the generic version.
> Then just include asm/math128.h in linux/math128.h.  Any arch that wants
> to optimise these can then just provide its own math128.h. 

Ah, right. I didn't know that Kbuild trick. /me does:

  ls arch/*/include/asm/Kbuild | while read file; do quilt add $file;
echo "generic-y += math128.h" >> $file; done

Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-25  1:46       ` Linus Torvalds
@ 2012-04-25  8:35         ` Peter Zijlstra
  2012-04-25 10:13         ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-25  8:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner

On Tue, 2012-04-24 at 18:46 -0700, Linus Torvalds wrote:
> What insane version of gcc is that? Can you please make a gcc bug-report?

gcc (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1


I'll go and build me a gcc.git 4.6 and 4.7 branch compiler to see if it's
still there.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-25  1:46       ` Linus Torvalds
  2012-04-25  8:35         ` Peter Zijlstra
@ 2012-04-25 10:13         ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-25 10:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-arch, Andrew Morton, Juri Lelli, Ingo Molnar,
	Thomas Gleixner

On Tue, 2012-04-24 at 18:46 -0700, Linus Torvalds wrote:
> Can you please make a gcc bug-report?
> 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53110

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-25  8:11     ` Peter Zijlstra
@ 2012-04-25 11:23       ` Geert Uytterhoeven
  2012-04-25 11:58         ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Geert Uytterhoeven @ 2012-04-25 11:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds,
	Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner

On Wed, Apr 25, 2012 at 10:11, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>  ls arch/*/include/asm/Kbuild | while read file; do quilt add $file;
> echo "generic-y += math128.h" >> $file; done

That may fail if your "ls" is an alias.
Better: "for file in arch/*/include/asm/Kbuild;do ..; done".

Still using quilt? I didn't regret ditching it in favor of git (yummy,
git rebase ;-).

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-25 11:23       ` Geert Uytterhoeven
@ 2012-04-25 11:58         ` Peter Zijlstra
  2012-04-25 14:35           ` Geert Uytterhoeven
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-25 11:58 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds,
	Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner

On Wed, 2012-04-25 at 13:23 +0200, Geert Uytterhoeven wrote:
> On Wed, Apr 25, 2012 at 10:11, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >  ls arch/*/include/asm/Kbuild | while read file; do quilt add $file;
> > echo "generic-y += math128.h" >> $file; done
> 
> That may fail if your "ls" is an alias.
> Better: "for file in arch/*/include/asm/Kbuild;do ..; done".

Agreed; still, ls worked and the patch is sent.

> Still using quilt? I didn't regret ditching it in favor of git (yummy,
> git rebase ;-).

Yeah, I can't seem to deal with git, I edit my patches by hand way too
often, also quilt is a _LOT_ faster pushing/popping patches than git is
(and yes I use an SSD).

Not using git also means I don't have to deal with the trainwreck that
is git-merge,.. /me runs like hell.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-25 11:58         ` Peter Zijlstra
@ 2012-04-25 14:35           ` Geert Uytterhoeven
  2012-04-25 15:09             ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Geert Uytterhoeven @ 2012-04-25 14:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds,
	Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner

On Wed, Apr 25, 2012 at 13:58, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
>> Still using quilt? I didn't regret ditching it in favor of git (yummy,
>> git rebase ;-).
>
> Yeah, I can't seem to deal with git, I edit my patches by hand way too
> often, also quilt is a _LOT_ faster pushing/popping patches than git is
> (and yes I use an SSD).
>
> Not using git also means I don't have to deal with the trainwreck that
> is git-merge,.. /me runs like hell.

But you have to deal with the quilt-oh-no-the-patch-no-longer-applies
trainwreck...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128
  2012-04-25 14:35           ` Geert Uytterhoeven
@ 2012-04-25 15:09             ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2012-04-25 15:09 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Stephen Rothwell, linux-kernel, linux-arch, Linus Torvalds,
	Andrew Morton, Juri Lelli, Ingo Molnar, Thomas Gleixner

On Wed, 2012-04-25 at 16:35 +0200, Geert Uytterhoeven wrote:
> On Wed, Apr 25, 2012 at 13:58, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> >> Still using quilt? I didn't regret ditching it in favor of git (yummy,
> >> git rebase ;-).
> >
> > Yeah, I can't seem to deal with git, I edit my patches by hand way too
> > often, also quilt is a _LOT_ faster pushing/popping patches than git is
> > (and yes I use an SSD).
> >
> > Not using git also means I don't have to deal with the trainwreck that
> > is git-merge,.. /me runs like hell.
> 
> But you have to deal with the quilt-oh-no-the-patch-no-longer-applies
> trainwreck...

Yeah, but I find it much easier to deal with .rej files than with the
stuff left behind by git.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread

Thread overview: 34+ messages
2012-04-24 16:10 [RFC][PATCH 0/3] gcc work-around and math128 Peter Zijlstra
2012-04-24 16:10 ` [RFC][PATCH 1/3] kernel,sched,time: Clean up gcc work-arounds Peter Zijlstra
2012-04-24 19:39   ` Linus Torvalds
2012-04-25  0:14   ` Stephen Rothwell
2012-04-24 16:10 ` [RFC][PATCH 2/3] math128: Introduce {mult,add,cmp}_u128 Peter Zijlstra
2012-04-24 19:37   ` Linus Torvalds
2012-04-24 19:43     ` Peter Zijlstra
2012-04-24 20:12     ` Måns Rullgård
2012-04-24 21:54     ` Peter Zijlstra
2012-04-25  0:09       ` H. Peter Anvin
2012-04-25  1:46       ` Linus Torvalds
2012-04-25  8:35         ` Peter Zijlstra
2012-04-25 10:13         ` Peter Zijlstra
2012-04-25  0:23   ` Stephen Rothwell
2012-04-25  8:11     ` Peter Zijlstra
2012-04-25 11:23       ` Geert Uytterhoeven
2012-04-25 11:58         ` Peter Zijlstra
2012-04-25 14:35           ` Geert Uytterhoeven
2012-04-25 15:09             ` Peter Zijlstra
2012-04-24 16:10 ` [RFC][PATCH 3/3] math128, x86_64: Implement {mult,add}_u128 in 64bit asm Peter Zijlstra
2012-04-24 16:34   ` H. Peter Anvin
2012-04-24 16:36     ` Peter Zijlstra
2012-04-24 17:17       ` H. Peter Anvin
2012-04-24 17:19         ` Peter Zijlstra
2012-04-24 17:20           ` H. Peter Anvin
2012-04-24 22:00             ` Peter Zijlstra
2012-04-24 22:03               ` H. Peter Anvin
2012-04-24 17:22 ` [RFC][PATCH 0/3] gcc work-around and math128 H. Peter Anvin
2012-04-24 17:27   ` H. Peter Anvin
2012-04-24 21:15 ` Andy Lutomirski
2012-04-24 21:18   ` Linus Torvalds
2012-04-24 21:32   ` Peter Zijlstra
2012-04-24 21:35     ` Andy Lutomirski
2012-04-24 21:51       ` Peter Zijlstra
