From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <54AFBA68.9030803@redhat.com>
Date: Fri, 09 Jan 2015 12:24:24 +0100
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] x86_64: optimise muldiv64 for x86_64 architecture
References: <1420799253-21911-1-git-send-email-frediano.ziglio@huawei.com>
 <54AFAED4.6050900@redhat.com>
To: Frediano Ziglio
Cc: Frediano Ziglio, Stefan Hajnoczi, Anthony Liguori, qemu-devel

On 09/01/2015 12:04, Frediano Ziglio wrote:
> 2015-01-09 10:35 GMT+00:00 Paolo Bonzini:
>>
>> On 09/01/2015 11:27, Frediano Ziglio wrote:
>>>
>>> Signed-off-by: Frediano Ziglio
>>> ---
>>>  include/qemu-common.h | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>>> index f862214..5366220 100644
>>> --- a/include/qemu-common.h
>>> +++ b/include/qemu-common.h
>>> @@ -370,6 +370,7 @@ static inline uint8_t from_bcd(uint8_t val)
>>>  }
>>>
>>>  /* compute with 96 bit intermediate result: (a*b)/c */
>>> +#ifndef __x86_64__
>>>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>  {
>>>      union {
>>> @@ -392,6 +393,18 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>      res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
>>>      return res.ll;
>>>  }
>>> +#else
>>> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>> +{
>>> +    uint64_t res;
>>> +
>>> +    asm ("mulq %2\n\tdivq %3"
>>> +         : "=a"(res)
>>> +         : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
>>> +         : "rdx", "cc");
>>> +    return res;
>>> +}
>>> +#endif
>>
>> Good idea.  However, if you have __int128, you can just do
>>
>>     return (__int128)a * b / c;
>>
>> and the compiler should generate the right code.  Conveniently, there
>> is already CONFIG_INT128 that you can use.
>
> Well, it works, but in our case b <= c, so a * b / c is always < 2^64.

This is not necessarily the case.  Quick grep:

hw/timer/hpet.c:    return (muldiv64(value, HPET_CLK_PERIOD, FS_PER_NS));
hw/timer/hpet.c:    return (muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD));

One of the two must disprove your assertion. :)  But it's true that we
expect no overflow.

> This means the final division cannot overflow.  However, the compiler
> does not know that, so it performs the full (a*b) / c division, which
> mainly consists of two integer divisions instead of one (not counting
> that it is implemented through a helper function).
>
> I think I'll write two patches: one implementing muldiv64 with __int128
> as you suggested (which is much easier to read than both the current
> version and the assembly one), and another with the x86_64 optimization.

Right, that's even better.

Out of curiosity, have you seen muldiv64 show up in any profiles?

Paolo
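
For reference, a minimal sketch of the __int128 variant discussed above,
assuming the CONFIG_INT128 guard Paolo mentions; this follows the thread's
suggestion and is not necessarily the patch as eventually applied:

    /* compute (a*b)/c with a 128-bit intermediate product */
    #ifdef CONFIG_INT128
    static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
    {
        /* One 64x64->128 multiply and one 128/64 division.  If the true
         * quotient does not fit in 64 bits, the result is silently
         * truncated here, whereas the inline-asm divq above would raise
         * a divide error. */
        return (__int128)a * b / c;
    }
    #else
    /* ... keep the portable 96-bit implementation quoted above ... */
    #endif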
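
And a small standalone demo (a hypothetical file built outside QEMU; the
name muldiv64_int128 is invented for illustration) of why the b <= c
observation matters: with b <= c the quotient can never exceed a, so the
final division cannot overflow, while with b > c, as in the second hpet.c
call, correctness relies on the caller guaranteeing the result fits:

    #include <stdint.h>
    #include <stdio.h>

    static inline uint64_t muldiv64_int128(uint64_t a, uint32_t b, uint32_t c)
    {
        return (__int128)a * b / c;
    }

    int main(void)
    {
        /* b <= c: quotient <= a, so it always fits in 64 bits. */
        printf("%llu\n",
               (unsigned long long)muldiv64_int128(123456789ULL, 3, 7));

        /* b > c, like muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD): safe
         * only because callers guarantee the result fits -- the "we
         * expect no overflow" above. */
        printf("%llu\n",
               (unsigned long long)muldiv64_int128(1000ULL, 1000000, 10));
        return 0;
    }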