All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available
@ 2015-01-09 11:25 Frediano Ziglio
  2015-01-09 11:25 ` [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture Frediano Ziglio
  2015-01-09 12:03 ` [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available Peter Maydell
  0 siblings, 2 replies; 11+ messages in thread
From: Frediano Ziglio @ 2015-01-09 11:25 UTC (permalink / raw)
  To: Paolo Bonzini, Anthony Liguori, Stefan Hajnoczi
  Cc: Frediano Ziglio, qemu-devel

Let compiler do the job to optimise the function.

Signed-off-by: Frediano Ziglio <frediano.ziglio@huawei.com>
---
 include/qemu-common.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index f862214..f3033ae 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -370,6 +370,12 @@ static inline uint8_t from_bcd(uint8_t val)
 }
 
 /* compute with 96 bit intermediate result: (a*b)/c */
+#ifdef CONFIG_INT128
+static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
+{
+    return (__int128)a * b / c;
+}
+#else
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
     union {
@@ -392,6 +398,7 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
     res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
     return res.ll;
 }
+#endif
 
 /* Round number down to multiple */
 #define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 11:25 [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available Frediano Ziglio
@ 2015-01-09 11:25 ` Frediano Ziglio
  2015-01-09 11:43   ` Paolo Bonzini
  2015-01-09 12:22   ` Peter Maydell
  2015-01-09 12:03 ` [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available Peter Maydell
  1 sibling, 2 replies; 11+ messages in thread
From: Frediano Ziglio @ 2015-01-09 11:25 UTC (permalink / raw)
  To: Paolo Bonzini, Anthony Liguori, Stefan Hajnoczi
  Cc: Frediano Ziglio, qemu-devel

As this platform can do multiply/divide using 128 bit precision use
these instructions to implement it.

Signed-off-by: Frediano Ziglio <frediano.ziglio@huawei.com>
---
 include/qemu-common.h | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index f3033ae..880659d 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -370,11 +370,23 @@ static inline uint8_t from_bcd(uint8_t val)
 }
 
 /* compute with 96 bit intermediate result: (a*b)/c */
-#ifdef CONFIG_INT128
+#if defined(CONFIG_INT128) && !defined(__x86_64__)
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
     return (__int128)a * b / c;
 }
+#elif defined(__x86_64__)
+/* Optimised x64 version. This assume that a*b/c fits in 64 bit */
+static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
+{
+    uint64_t res;
+
+    asm ("mulq %2\n\tdivq %3"
+         : "=a"(res)
+         : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
+         : "rdx", "cc");
+    return res;
+}
 #else
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 11:25 ` [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture Frediano Ziglio
@ 2015-01-09 11:43   ` Paolo Bonzini
  2015-01-09 12:09     ` Frediano Ziglio
  2015-01-09 12:22   ` Peter Maydell
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2015-01-09 11:43 UTC (permalink / raw)
  To: Frediano Ziglio, Anthony Liguori, Stefan Hajnoczi
  Cc: Frediano Ziglio, qemu-devel



On 09/01/2015 12:25, Frediano Ziglio wrote:
>  /* compute with 96 bit intermediate result: (a*b)/c */
> -#ifdef CONFIG_INT128
> +#if defined(CONFIG_INT128) && !defined(__x86_64__)
>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>  {
>      return (__int128)a * b / c;
>  }
> +#elif defined(__x86_64__)
> +/* Optimised x64 version. This assume that a*b/c fits in 64 bit */
> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
> +{
> +    uint64_t res;
> +
> +    asm ("mulq %2\n\tdivq %3"
> +         : "=a"(res)
> +         : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
> +         : "rdx", "cc");
> +    return res;
> +}

Reorder it the other way, and you can simplify the first #if.

I applied patch 1 locally, and will send a pull request once the tree is
thawed.

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available
  2015-01-09 11:25 [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available Frediano Ziglio
  2015-01-09 11:25 ` [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture Frediano Ziglio
@ 2015-01-09 12:03 ` Peter Maydell
  2015-01-09 13:43   ` Frediano Ziglio
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2015-01-09 12:03 UTC (permalink / raw)
  To: Frediano Ziglio
  Cc: Frediano Ziglio, Paolo Bonzini, Stefan Hajnoczi, Anthony Liguori,
	QEMU Developers

On 9 January 2015 at 11:25, Frediano Ziglio <freddy77@gmail.com> wrote:
> Let compiler do the job to optimise the function.
>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@huawei.com>
> ---
>  include/qemu-common.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index f862214..f3033ae 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -370,6 +370,12 @@ static inline uint8_t from_bcd(uint8_t val)
>  }
>
>  /* compute with 96 bit intermediate result: (a*b)/c */
> +#ifdef CONFIG_INT128
> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
> +{
> +    return (__int128)a * b / c;

Wrong type -- this should be __int128_t, which is what we check for
in our configure test. (Or __uint128_t I suppose.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 11:43   ` Paolo Bonzini
@ 2015-01-09 12:09     ` Frediano Ziglio
  0 siblings, 0 replies; 11+ messages in thread
From: Frediano Ziglio @ 2015-01-09 12:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Frediano Ziglio, Stefan Hajnoczi, Anthony Liguori, qemu-devel

2015-01-09 11:43 GMT+00:00 Paolo Bonzini <pbonzini@redhat.com>:
>
>
> On 09/01/2015 12:25, Frediano Ziglio wrote:
>>  /* compute with 96 bit intermediate result: (a*b)/c */
>> -#ifdef CONFIG_INT128
>> +#if defined(CONFIG_INT128) && !defined(__x86_64__)
>>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>  {
>>      return (__int128)a * b / c;
>>  }
>> +#elif defined(__x86_64__)
>> +/* Optimised x64 version. This assume that a*b/c fits in 64 bit */
>> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>> +{
>> +    uint64_t res;
>> +
>> +    asm ("mulq %2\n\tdivq %3"
>> +         : "=a"(res)
>> +         : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
>> +         : "rdx", "cc");
>> +    return res;
>> +}
>
> Reorder it the other way, and you can simplify the first #if.
>

Yes, I was however thinking about people reading code. The int128
version is much easier to read so I put it first.

> I applied patch 1 locally, and will send a pull request once the tree is
> thawed.
>
> Paolo

Frediano

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 11:25 ` [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture Frediano Ziglio
  2015-01-09 11:43   ` Paolo Bonzini
@ 2015-01-09 12:22   ` Peter Maydell
  2015-01-09 15:41     ` Frediano Ziglio
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2015-01-09 12:22 UTC (permalink / raw)
  To: Frediano Ziglio
  Cc: Frediano Ziglio, Paolo Bonzini, Stefan Hajnoczi, Anthony Liguori,
	QEMU Developers

On 9 January 2015 at 11:25, Frediano Ziglio <freddy77@gmail.com> wrote:
> As this platform can do multiply/divide using 128 bit precision use
> these instructions to implement it.
>
> Signed-off-by: Frediano Ziglio <frediano.ziglio@huawei.com>
> ---
>  include/qemu-common.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index f3033ae..880659d 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -370,11 +370,23 @@ static inline uint8_t from_bcd(uint8_t val)
>  }
>
>  /* compute with 96 bit intermediate result: (a*b)/c */
> -#ifdef CONFIG_INT128
> +#if defined(CONFIG_INT128) && !defined(__x86_64__)
>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>  {
>      return (__int128)a * b / c;
>  }
> +#elif defined(__x86_64__)
> +/* Optimised x64 version. This assume that a*b/c fits in 64 bit */

This assumption isn't necessarily true, and this implementation
will dump core on overflow. For instance:

Inputs: a = 8000000000000000 b = 80000000 c = 1
fn muldiv64 result 0
fn muldiv64_with_int128 result 0
fn muldiv64_with_uint128 result 0
Floating point exception (core dumped)

-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available
  2015-01-09 12:03 ` [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available Peter Maydell
@ 2015-01-09 13:43   ` Frediano Ziglio
  0 siblings, 0 replies; 11+ messages in thread
From: Frediano Ziglio @ 2015-01-09 13:43 UTC (permalink / raw)
  To: Paolo Bonzini, Anthony Liguori, Stefan Hajnoczi
  Cc: Frediano Ziglio, qemu-devel

Let compiler do the job to optimise the function.

Signed-off-by: Frediano Ziglio <frediano.ziglio@huawei.com>
---
 include/qemu-common.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index f862214..f3033ae 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -370,6 +370,12 @@ static inline uint8_t from_bcd(uint8_t val)
 }
 
 /* compute with 96 bit intermediate result: (a*b)/c */
+#ifdef CONFIG_INT128
+static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
+{
+    return (__uint128_t)a * b / c;
+}
+#else
 static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 {
     union {
@@ -392,6 +398,7 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
     res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
     return res.ll;
 }
+#endif
 
 /* Round number down to multiple */
 #define QEMU_ALIGN_DOWN(n, m) ((n) / (m) * (m))
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 12:22   ` Peter Maydell
@ 2015-01-09 15:41     ` Frediano Ziglio
  2015-01-09 15:52       ` Peter Maydell
  0 siblings, 1 reply; 11+ messages in thread
From: Frediano Ziglio @ 2015-01-09 15:41 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Frediano Ziglio, Paolo Bonzini, Stefan Hajnoczi, Anthony Liguori,
	QEMU Developers

2015-01-09 12:22 GMT+00:00 Peter Maydell <peter.maydell@linaro.org>:
> On 9 January 2015 at 11:25, Frediano Ziglio <freddy77@gmail.com> wrote:
>> As this platform can do multiply/divide using 128 bit precision use
>> these instructions to implement it.
>>
>> Signed-off-by: Frediano Ziglio <frediano.ziglio@huawei.com>
>> ---
>>  include/qemu-common.h | 14 +++++++++++++-
>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>> index f3033ae..880659d 100644
>> --- a/include/qemu-common.h
>> +++ b/include/qemu-common.h
>> @@ -370,11 +370,23 @@ static inline uint8_t from_bcd(uint8_t val)
>>  }
>>
>>  /* compute with 96 bit intermediate result: (a*b)/c */
>> -#ifdef CONFIG_INT128
>> +#if defined(CONFIG_INT128) && !defined(__x86_64__)
>>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>  {
>>      return (__int128)a * b / c;
>>  }
>> +#elif defined(__x86_64__)
>> +/* Optimised x64 version. This assume that a*b/c fits in 64 bit */
>
> This assumption isn't necessarily true, and this implementation
> will dump core on overflow. For instance:
>
> Inputs: a = 8000000000000000 b = 80000000 c = 1
> fn muldiv64 result 0
> fn muldiv64_with_int128 result 0
> fn muldiv64_with_uint128 result 0
> Floating point exception (core dumped)
>
> -- PMM

Yes, I know, it was meant to be a precondition, not a math rule. Doing
some grep I'm not really sure this is valid in all cases however I
would ask if the truncation is handled correctly. Surely in some cases
the call to this function is not needed like in "muldiv64(1, tks,
usb_bit_time);" or in "muldiv64((int16_t) s->rtc.comp, 1000, 0x8000);"

Frediano

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 15:41     ` Frediano Ziglio
@ 2015-01-09 15:52       ` Peter Maydell
  2015-01-09 16:08         ` Frediano Ziglio
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2015-01-09 15:52 UTC (permalink / raw)
  To: Frediano Ziglio
  Cc: Frediano Ziglio, Paolo Bonzini, Stefan Hajnoczi, Anthony Liguori,
	QEMU Developers

On 9 January 2015 at 15:41, Frediano Ziglio <freddy77@gmail.com> wrote:
> 2015-01-09 12:22 GMT+00:00 Peter Maydell <peter.maydell@linaro.org>:
>>> +/* Optimised x64 version. This assume that a*b/c fits in 64 bit */
>>
>> This assumption isn't necessarily true, and this implementation
>> will dump core on overflow.

> Yes, I know, it was meant to be a precondition, not a math rule. Doing
> some grep I'm not really sure this is valid in all cases however I
> would ask if the truncation is handled correctly. Surely in some cases
> the call to this function is not needed like in "muldiv64(1, tks,
> usb_bit_time);" or in "muldiv64((int16_t) s->rtc.comp, 1000, 0x8000);"

Sure, there are some cases where we know the calculation won't overflow,
but there are also cases where we can't be so sure, and if the parameters
can be controlled by the guest then we absolutely can't allow the guest
to cause QEMU to SIGFPE.

Incidentally, in the examples you quote gcc is generally able
(because the function is inline) to do a better job because of
some of the constants involved: for instance with
  "muldiv64((int16_t) s->rtc.comp, 1000, 0x8000);"
it generates a completely inline insn sequence with one mul and
no div insns.

So I think at this point I come back to an earlier question:
do you have profiling results showing this function to be
a performance problem on a real workload? If not, it just doesn't
seem to me to be worth the risk and the maintenance burden to
add a native x86 assembly version.

-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 15:52       ` Peter Maydell
@ 2015-01-09 16:08         ` Frediano Ziglio
  2015-01-09 16:10           ` Peter Maydell
  0 siblings, 1 reply; 11+ messages in thread
From: Frediano Ziglio @ 2015-01-09 16:08 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Frediano Ziglio, Paolo Bonzini, Stefan Hajnoczi, Anthony Liguori,
	QEMU Developers

2015-01-09 15:52 GMT+00:00 Peter Maydell <peter.maydell@linaro.org>:
> On 9 January 2015 at 15:41, Frediano Ziglio <freddy77@gmail.com> wrote:
>> 2015-01-09 12:22 GMT+00:00 Peter Maydell <peter.maydell@linaro.org>:
>>>> +/* Optimised x64 version. This assume that a*b/c fits in 64 bit */
>>>
>>> This assumption isn't necessarily true, and this implementation
>>> will dump core on overflow.
>
>> Yes, I know, it was meant to be a precondition, not a math rule. Doing
>> some grep I'm not really sure this is valid in all cases however I
>> would ask if the truncation is handled correctly. Surely in some cases
>> the call to this function is not needed like in "muldiv64(1, tks,
>> usb_bit_time);" or in "muldiv64((int16_t) s->rtc.comp, 1000, 0x8000);"
>
> Sure, there are some cases where we know the calculation won't overflow,
> but there are also cases where we can't be so sure, and if the parameters
> can be controlled by the guest then we absolutely can't allow the guest
> to cause QEMU to SIGFPE.
>
> Incidentally, in the examples you quote gcc is generally able
> (because the function is inline) to do a better job because of
> some of the constants involved: for instance with
>   "muldiv64((int16_t) s->rtc.comp, 1000, 0x8000);"
> it generates a completely inline insn sequence with one mul and
> no div insns.
>
> So I think at this point I come back to an earlier question:
> do you have profiling results showing this function to be
> a performance problem on a real workload? If not, it just doesn't
> seem to me to be worth the risk and the maintenance burden to
> add a native x86 assembly version.
>
> -- PMM

I agree (after some digging) we are not sure we won't get that
overflow. Agree to drop the second patch. However I would retain the
first. Compiler can use it to optimize much easier. For instance if
compiler understand that the multiplication fits into a 64 bit can
decide to avoid the 128 bit operation easily, not so easy with all
these shift, multiply, division and union structure.

I didn't do any profiling.

Frediano

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture
  2015-01-09 16:08         ` Frediano Ziglio
@ 2015-01-09 16:10           ` Peter Maydell
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2015-01-09 16:10 UTC (permalink / raw)
  To: Frediano Ziglio
  Cc: Frediano Ziglio, Paolo Bonzini, Stefan Hajnoczi, Anthony Liguori,
	QEMU Developers

On 9 January 2015 at 16:08, Frediano Ziglio <freddy77@gmail.com> wrote:
> I agree (after some digging) we are not sure we won't get that
> overflow. Agree to drop the second patch. However I would retain the
> first. Compiler can use it to optimize much easier. For instance if
> compiler understand that the multiplication fits into a 64 bit can
> decide to avoid the 128 bit operation easily, not so easy with all
> these shift, multiply, division and union structure.

Yes, the uint128_t patch is a good idea.

-- PMM

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-01-09 16:10 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-09 11:25 [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available Frediano Ziglio
2015-01-09 11:25 ` [Qemu-devel] [PATCH 2/2] qemu-common.h: optimise muldiv64 for x86_64 architecture Frediano Ziglio
2015-01-09 11:43   ` Paolo Bonzini
2015-01-09 12:09     ` Frediano Ziglio
2015-01-09 12:22   ` Peter Maydell
2015-01-09 15:41     ` Frediano Ziglio
2015-01-09 15:52       ` Peter Maydell
2015-01-09 16:08         ` Frediano Ziglio
2015-01-09 16:10           ` Peter Maydell
2015-01-09 12:03 ` [Qemu-devel] [PATCH 1/2] qemu-common.h: optimise muldiv64 if int128 is available Peter Maydell
2015-01-09 13:43   ` Frediano Ziglio

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.