* CONFIG_ARCH_SUPPORTS_INT128: Why not mips, s390, powerpc, and alpha?
From: George Spelvin @ 2019-03-29 13:07 UTC (permalink / raw)
  To: linux-alpha, linux-mips, linux-s390, linuxppc-dev; +Cc: lkml

(Cross-posted in case there are generic issues; please trim if
discussion wanders into single-architecture details.)

I was working on some scaling code that can benefit from 64x64->128-bit
multiplies.  GCC supports an __int128 type on processors with hardware
support (including z/Arch and MIPS64), but the support was broken on
early compilers, so it's gated behind CONFIG_ARCH_SUPPORTS_INT128.

Currently, of the ten 64-bit architectures Linux supports, that's
only enabled on x86, ARM, and RISC-V.

SPARC and HP-PA don't have support.

But that leaves Alpha, MIPS, PowerPC, and S/390x.

Current mips64, powerpc64, and s390x gcc seems to generate sensible code
for mul_u64_u64_shr() in <linux/math64.h> if I cross-compile them.

I don't have easy access to an Alpha cross-compiler to test, but
as it has UMULH, I suspect it would work, too.

Is there a reason it hasn't been enabled on these platforms?

There might be a MIPS64r6 issue, since r6 changed from DMULTU
writing the lo and hi registers to DMULU/DMUHU, and gcc 8.3, at
least, doesn't know how to generate inline code for the latter.

(Note that users *also* check __SIZEOF_INT128__, which is defined if GCC
claims to support __int128, so you don't have to worry about 32-bit
compiles or ancient compilers.  It only has to be conditional on
*broken* support.)
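
To illustrate the double gate described above, here is a minimal user-space
sketch, not the kernel's actual code. HAVE_WORKING_INT128 is a hypothetical
stand-in for CONFIG_ARCH_SUPPORTS_INT128, and mul_hi() and mul_hi_portable()
are names invented for this example. The fallback builds the high half of
the product from 32-bit pieces.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Kconfig symbol CONFIG_ARCH_SUPPORTS_INT128. */
#define HAVE_WORKING_INT128 1

/* Portable fallback: high 64 bits of a 64x64 product from 32-bit pieces. */
static inline uint64_t mul_hi_portable(uint64_t a, uint64_t b)
{
	uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
	uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;
	uint64_t ll = a_lo * b_lo;
	uint64_t lh = a_lo * b_hi;
	uint64_t hl = a_hi * b_lo;
	uint64_t hh = a_hi * b_hi;
	/* Middle column plus carry from the low word; fits in 64 bits. */
	uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;

	return hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

/* Use __int128 only if both the arch gate and the compiler macro agree. */
static inline uint64_t mul_hi(uint64_t a, uint64_t b)
{
#if defined(HAVE_WORKING_INT128) && defined(__SIZEOF_INT128__)
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
#else
	return mul_hi_portable(a, b);
#endif
}
```

With this shape, a 32-bit build or a compiler without __int128 falls through
to the portable path without any extra per-architecture conditionals.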


FWIW, the code I'm working on has this inner loop:
(https://arxiv.org/abs/1805.10941 for details)

u64 get_random_u64(void);
u64 get_random_max64(u64 range, u64 lim)
{
	unsigned __int128 prod;
	do {
		prod = (unsigned __int128)get_random_u64() * range;
	} while (unlikely((u64)prod < lim));
	return prod >> 64;
}
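
For anyone who wants to try the loop outside the kernel, here is a rough
user-space sketch. It assumes a compiler with unsigned __int128 (64-bit GCC
or Clang). The splitmix64 generator is only a hypothetical stand-in for the
kernel's get_random_u64() and is not cryptographically secure. The wrapper
random_below() and its threshold lim = 2^64 mod range (computed as
-range % range, following the linked paper) are my assumptions about how a
caller would supply lim; the code above does not show that part.

```c
#include <stdint.h>

/* Hypothetical stand-in RNG (splitmix64); NOT the kernel's get_random_u64(). */
static uint64_t rng_state = 0x9e3779b97f4a7c15ULL;

static uint64_t get_random_u64(void)
{
	uint64_t z = (rng_state += 0x9e3779b97f4a7c15ULL);

	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
	z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
	return z ^ (z >> 31);
}

/* Same loop as above: retry while the low half of the product is below lim. */
static uint64_t get_random_max64(uint64_t range, uint64_t lim)
{
	unsigned __int128 prod;

	do {
		prod = (unsigned __int128)get_random_u64() * range;
	} while (__builtin_expect((uint64_t)prod < lim, 0));
	return (uint64_t)(prod >> 64);
}

/* Assumed caller: uniform value in [0, range); range must be nonzero. */
static uint64_t random_below(uint64_t range)
{
	uint64_t lim = -range % range;	/* equals 2^64 mod range */

	return get_random_max64(range, lim);
}
```

For example, random_below(6) would give a die roll in 0..5; with range = 6
the threshold works out to 4, so only 4 of the 2^64 possible low halves are
rejected.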

Which turns into these inner loops:
MIPS:
.L7:
	jal	get_random_u64
	nop
	dmultu $2,$17
	mflo	$3
	sltu	$4,$3,$16
	bne	$4,$0,.L7
	mfhi	$2

PowerPC:
.L9:
	bl get_random_u64
	nop
	mulld 9,3,31
	mulhdu 3,3,31
	cmpld 7,30,9
	bgt 7,.L9

s/390:
.L13:
	brasl	%r14,get_random_u64@PLT
	lgr	%r5,%r2
	mlgr	%r4,%r10
	lgr	%r2,%r4
	clgr	%r11,%r5
	jh	.L13

I like that the MIPS code leaves the high half of the product in
the hi register until it tests the low half; I wish PowerPC would
similarly move the mulhdu *after* the loop, like the following
hypothetical MIPS R6 code:

.L7:
	balc	get_random_u64
	dmulu	$3, $2, $17
	sltu	$3, $3, $16
	bnezc	$3, .L7
	dmuhu	$2, $2, $17

Or this handwritten Alpha code:
1:
	bsr	$26, get_random_u64
	mulq	$0, $9, $1	# $9 is range
	cmpult	$1, $10, $1	# $10 is lim
	bne	$1, 1b
	umulh	$0, $9, $0
