From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54AFBA68.9030803@redhat.com>
Date: Fri, 09 Jan 2015 12:24:24 +0100
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] x86_64: optimise muldiv64 for x86_64 architecture
References: <1420799253-21911-1-git-send-email-frediano.ziglio@huawei.com>
 <54AFAED4.6050900@redhat.com>
To: Frediano Ziglio
Cc: Frediano Ziglio, Stefan Hajnoczi, Anthony Liguori, qemu-devel

On 09/01/2015 12:04, Frediano Ziglio wrote:
> 2015-01-09 10:35 GMT+00:00 Paolo Bonzini:
>>
>> On 09/01/2015 11:27, Frediano Ziglio wrote:
>>>
>>> Signed-off-by: Frediano Ziglio
>>> ---
>>>  include/qemu-common.h | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>>> index f862214..5366220 100644
>>> --- a/include/qemu-common.h
>>> +++ b/include/qemu-common.h
>>> @@ -370,6 +370,7 @@ static inline uint8_t from_bcd(uint8_t val)
>>>  }
>>>
>>>  /* compute with 96 bit intermediate result: (a*b)/c */
>>> +#ifndef __x86_64__
>>>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>  {
>>>      union {
>>> @@ -392,6 +393,18 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>      res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
>>>      return res.ll;
>>>  }
>>> +#else
>>> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>> +{
>>> +    uint64_t res;
>>> +
>>> +    asm ("mulq %2\n\tdivq %3"
>>> +         : "=a"(res)
>>> +         : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
>>> +         : "rdx", "cc");
>>> +    return res;
>>> +}
>>> +#endif
>>
>> Good idea.  However, if you have __int128, you can just do
>>
>>     return (__int128)a * b / c;
>>
>> and the compiler should generate the right code.  Conveniently, there
>> is already CONFIG_INT128 that you can use.
>
> Well, it works, but in our case b <= c, so a * b / c is always < 2^64.

This is not necessarily the case.  Quick grep:

hw/timer/hpet.c:    return (muldiv64(value, HPET_CLK_PERIOD, FS_PER_NS));
hw/timer/hpet.c:    return (muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD));

One of the two must disprove your assertion. :)  But it's true that we
expect no overflow.

> This means the final division cannot overflow.  However, the compiler
> does not know that, so it performs the full (a*b) / c division, which
> mainly consists of two integer divisions instead of one (not counting
> that it is implemented through a helper function).
>
> I think I'll write two patches: one implementing muldiv64 with __int128
> as you suggested (which is much easier to read than both the current
> version and the assembly one), and another with the x86_64 optimization.

Right, that's even better.

Out of curiosity, have you seen muldiv64 show up in any profiles?

Paolo
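
For reference, a minimal sketch of the __int128 variant discussed above,
assuming the CONFIG_INT128 guard Paolo mentions; this follows the thread's
suggestion and is not necessarily the patch as eventually applied:

    /* compute (a*b)/c with a 128-bit intermediate product */
    #ifdef CONFIG_INT128
    static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
    {
        /* One 64x64->128 multiply and one 128/64 division.  If the true
         * quotient does not fit in 64 bits, the result is silently
         * truncated here, whereas the inline-asm divq above would raise
         * a divide error. */
        return (__int128)a * b / c;
    }
    #else
    /* ... keep the portable 96-bit implementation quoted above ... */
    #endif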
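
And a small standalone demo (a hypothetical file built outside QEMU; the
name muldiv64_int128 is invented for illustration) of why the b <= c
observation matters: with b <= c the quotient can never exceed a, so the
final division cannot overflow, while with b > c, as in the second hpet.c
call, correctness relies on the caller guaranteeing the result fits:

    #include <stdint.h>
    #include <stdio.h>

    static inline uint64_t muldiv64_int128(uint64_t a, uint32_t b, uint32_t c)
    {
        return (__int128)a * b / c;
    }

    int main(void)
    {
        /* b <= c: quotient <= a, so it always fits in 64 bits. */
        printf("%llu\n",
               (unsigned long long)muldiv64_int128(123456789ULL, 3, 7));

        /* b > c, like muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD): safe
         * only because callers guarantee the result fits -- the "we
         * expect no overflow" above. */
        printf("%llu\n",
               (unsigned long long)muldiv64_int128(1000ULL, 1000000, 10));
        return 0;
    }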