qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* target/ppc: bug in optimised vsl/vsr implementation?
@ 2019-09-26 18:04 Mark Cave-Ayland
  2019-09-28 17:45 ` Aleksandar Markovic
  0 siblings, 1 reply; 13+ messages in thread
From: Mark Cave-Ayland @ 2019-09-26 18:04 UTC (permalink / raw)
  To: qemu-devel, qemu-ppc; +Cc: stefan.brankovic, Paul Clarke

As part of the investigation into the DFP number issue reported at
https://bugs.launchpad.net/qemu/+bug/1841990 it appears that there may also be a bug
introduced by the new optimised vsl/vsr implementation:

commit 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
Author: Stefan Brankovic <stefan.brankovic@rt-rk.com>
Date: Mon Jul 15 16:22:48 2019 +0200

    target/ppc: Optimize emulation of vsl and vsr instructions

    Optimization of altivec instructions vsl and vsr(Vector Shift Left/Rigt).
    Perform shift operation (left and right respectively) on 128 bit value of
    register vA by value specified in bits 125-127 of register vB. Lowest 3
    bits in each byte element of register vB must be identical or result is
    undefined.

    For vsl instruction, the first step is bits 125-127 of register vB have
    to be saved in variable sh. Then, the highest sh bits of the lower
    doubleword element of register vA are saved in variable shifted,
    in order not to lose those bits when shift operation is performed on
    the lower doubleword element of register vA, which is the next
    step. After shifting the lower doubleword element shift operation
    is performed on higher doubleword element of vA, with replacement of
    the lowest sh bits(that are now 0) with bits saved in shifted.

    For vsr instruction, firstly, the bits 125-127 of register vB have
    to be saved in variable sh. Then, the lowest sh bits of the higher
    doubleword element of register vA are saved in variable shifted,
    in odred not to lose those bits when the shift operation is
    performed on the higher doubleword element of register vA, which is
    the next step. After shifting higher doubleword element, shift operation
    is performed on lower doubleword element of vA, with replacement of
    highest sh bits(that are now 0) with bits saved in shifted.

    Signed-off-by: Stefan Brankovic <stefan.brankovic@rt-rk.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <1563200574-11098-3-git-send-email-stefan.brankovic@rt-rk.com>
    Signed-off-by: David Gibson <david@gibson.dropbear.id.au>


Reverting 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822 allows the test case in the bug
report to pass once again. Stefan, are you able to take a look at this?


ATB,

Mark.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-09-26 18:04 target/ppc: bug in optimised vsl/vsr implementation? Mark Cave-Ayland
@ 2019-09-28 17:45 ` Aleksandar Markovic
  2019-09-28 22:17   ` Aleksandar Markovic
                     ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Aleksandar Markovic @ 2019-09-28 17:45 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: stefan.brankovic, qemu-ppc, Paul Clarke, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2608 bytes --]

26.09.2019. 20.14, "Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk> је
написао/ла:
>
> As part of the investigation into the DFP number issue reported at
> https://bugs.launchpad.net/qemu/+bug/1841990 it appears that there may
also be a bug
> introduced by the new optimised vsl/vsr implementation:
>
> commit 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
> Author: Stefan Brankovic <stefan.brankovic@rt-rk.com>
> Date: Mon Jul 15 16:22:48 2019 +0200
>
>     target/ppc: Optimize emulation of vsl and vsr instructions
>
>

(sorry in advance if the format of this message looks odd, I have some
problems with mail settings related to recent qemu-devel mailer settings
changes; I will adjust my mail settings in next few days)

Mark and Paul (and Stefan),

Thanks for spotting this and pinpointing the culprit commit. I guess Stefan
is going to respond soon, but, in the meantime, I took a look at the commit
in question:

https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822

I don't have at the moment any dev/test environment handy, but I did manual
inspection of the code, and here is what I found (in order of importance,
perceived by me):

1. The code will not work correctly if the shift ammount (variable 'sh') is
0. This is because, in that case, one of succeeding invocations of TCG
shift functions will be required to shift a 64-bit TCG variable by 64 bits,
and the result of such TCG operation is undefined (shift amount must be 63
or less) - see https://github.com/qemu/qemu/blob/master/tcg/README.

2. Variable naming is better in the old helper than in the new translator.
In that light, I would advise Stefan to change 'sh' to 'shift', and
'shifted' to 'carry'.

3. Lines

tcg_gen_andi_i64(shifted, shifted, 0x7fULL);

and

tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);

appear to be spurious (albait in a harmless way). Therefore, they should be
deleted, or, alternatevely, a justification for them should be provided.

4. In the commit message, variable names were used without quotation mark,
resulting in puzzling and unclear wording.

5. (a question for Mark) After all recent changes, does get_avr64(..., ...,
true) always (for any endian configuration) return the "high" half of an
Altivec register, and get_avr64(..., ..., false) the "low" one?

Given all these circumstances, perhaps the most reasonable solution would
be to revert the commit in question, and allow Stefan enough dev and test
time to hopefully submit a new, better, version later on.

Sincerely,
Aleksandar

[-- Attachment #2: Type: text/html, Size: 3357 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-09-28 17:45 ` Aleksandar Markovic
@ 2019-09-28 22:17   ` Aleksandar Markovic
  2019-09-30 14:34     ` Paul Clarke
  2019-09-30 14:37   ` Aleksandar Markovic
  2019-10-01 18:24   ` Mark Cave-Ayland
  2 siblings, 1 reply; 13+ messages in thread
From: Aleksandar Markovic @ 2019-09-28 22:17 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: stefan.brankovic, qemu-ppc, Paul Clarke, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3191 bytes --]

28.09.2019. 19.45, "Aleksandar Markovic" <aleksandar.m.mail@gmail.com> је
написао/ла:
>
>
> 26.09.2019. 20.14, "Mark Cave-Ayland" <mark.cave-ayland@ilande.co.uk> је
написао/ла:
> >
> > As part of the investigation into the DFP number issue reported at
> > https://bugs.launchpad.net/qemu/+bug/1841990 it appears that there may
also be a bug
> > introduced by the new optimised vsl/vsr implementation:
> >
> > commit 4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
> > Author: Stefan Brankovic <stefan.brankovic@rt-rk.com>
> > Date: Mon Jul 15 16:22:48 2019 +0200
> >
> >     target/ppc: Optimize emulation of vsl and vsr instructions
> >
> >
>
> (sorry in advance if the format of this message looks odd, I have some
problems with mail settings related to recent qemu-devel mailer settings
changes; I will adjust my mail settings in next few days)
>
> Mark and Paul (and Stefan),
>
> Thanks for spotting this and pinpointing the culprit commit. I guess
Stefan is going to respond soon, but, in the meantime, I took a look at the
commit in question:
>
>
https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
>
> I don't have at the moment any dev/test environment handy, but I did
manual inspection of the code, and here is what I found (in order of
importance, perceived by me):
>
> 1. The code will not work correctly if the shift ammount (variable 'sh')
is 0. This is because, in that case, one of succeeding invocations of TCG
shift functions will be required to shift a 64-bit TCG variable by 64 bits,
and the result of such TCG operation is undefined (shift amount must be 63
or less) - see https://github.com/qemu/qemu/blob/master/tcg/README.
>
> 2. Variable naming is better in the old helper than in the new
translator. In that light, I would advise Stefan to change 'sh' to 'shift',
and 'shifted' to 'carry'.
>
> 3. Lines
>
> tcg_gen_andi_i64(shifted, shifted, 0x7fULL);
>
> and
>
> tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
>
> appear to be spurious (albait in a harmless way). Therefore, they should
be deleted, or, alternatevely, a justification for them should be provided.
>
> 4. In the commit message, variable names were used without quotation
mark, resulting in puzzling and unclear wording.
>
> 5. (a question for Mark) After all recent changes, does get_avr64(...,
..., true) always (for any endian configuration) return the "high" half of
an Altivec register, and get_avr64(..., ..., false) the "low" one?
>

One more hint: variables 'avrA' and 'avrB' can be a single variable, since
there is no moment during execution where both are needed; the same for
'tmp' and 'shifted'.

Also, check on the hardware the behavior listed as 'undefined' for vsl/vsr
in the docs - even though it is tehnically irrelevant, I am courious
whether the old or the new (or none of them) solution match the hardware.

> Given all these circumstances, perhaps the most reasonable solution would
be to revert the commit in question, and allow Stefan enough dev and test
time to hopefully submit a new, better, version later on.
>
> Sincerely,
> Aleksandar
>

[-- Attachment #2: Type: text/html, Size: 4128 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re:  Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-09-28 22:17   ` Aleksandar Markovic
@ 2019-09-30 14:34     ` Paul Clarke
  2019-09-30 14:53       ` Aleksandar Markovic
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Clarke @ 2019-09-30 14:34 UTC (permalink / raw)
  To: Aleksandar Markovic, Mark Cave-Ayland
  Cc: stefan.brankovic, qemu-ppc, qemu-devel

On 9/28/19 5:17 PM, Aleksandar Markovic wrote:
> Also, check on the hardware the behavior listed as 'undefined' for vsl/vsr
> in the docs - even though it is tehnically irrelevant, I am courious
> whether the old or the new (or none of them) solution match the hardware.

There does appear to be some odd behavior when one strays into the undefined.  For example:
source vector: 0102030405060708090a0b0c0d0e0f10
shift  vector: 01020101010101010101010101010101
after vsl:     020806080a0c0e10121416181a1c1e20
...this appears to use the byte-respective shift values

using vsr with that result and the same shift vector:
after vsr:     0182030405060708090a0b0c0d0e0f10
I expected to get back a result matching the source vector, but somehow, an extra bit got set.

It would probably take some more thorough investigation to map out the undefined behavior, but I doubt there's any value to that.

PC


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-09-28 17:45 ` Aleksandar Markovic
  2019-09-28 22:17   ` Aleksandar Markovic
@ 2019-09-30 14:37   ` Aleksandar Markovic
  2019-10-01 18:24   ` Mark Cave-Ayland
  2 siblings, 0 replies; 13+ messages in thread
From: Aleksandar Markovic @ 2019-09-30 14:37 UTC (permalink / raw)
  To: Mark Cave-Ayland; +Cc: stefan.brankovic, qemu-ppc, Paul Clarke, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 879 bytes --]

> 5. (a question for Mark) After all recent changes, does get_avr64(...,
..., true) always (for any endian configuration) return the "high" half of
an Altivec register, and get_avr64(..., ..., false) the "low" one?
>
> Given all these circumstances, perhaps the most reasonable solution would
be to revert the commit in question, and allow Stefan enough dev and test
time to hopefully submit a new, better, version later on.
>

Mark, can you verify please that this is true? The thing is, 'vsl/vsr'
belong to the group of SIMD instructions where the distinction between
'high' and 'low' 64-bit halves of the source and destination registers is
important (as opposed to the majority of 'regular' SIMD instructions, like
vector addition, vector max/min, etc., that act only as parallel operations
on corresdponding data elements).

Regards, Aleksandar

> Sincerely,
> Aleksandar
>

[-- Attachment #2: Type: text/html, Size: 1062 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-09-30 14:34     ` Paul Clarke
@ 2019-09-30 14:53       ` Aleksandar Markovic
  0 siblings, 0 replies; 13+ messages in thread
From: Aleksandar Markovic @ 2019-09-30 14:53 UTC (permalink / raw)
  To: Paul Clarke; +Cc: stefan.brankovic, Mark Cave-Ayland, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 1352 bytes --]

30.09.2019. 16.35, "Paul Clarke" <pc@us.ibm.com> је написао/ла:
>
> On 9/28/19 5:17 PM, Aleksandar Markovic wrote:
> > Also, check on the hardware the behavior listed as 'undefined' for
vsl/vsr
> > in the docs - even though it is tehnically irrelevant, I am courious
> > whether the old or the new (or none of them) solution match the
hardware.
>
> There does appear to be some odd behavior when one strays into the
undefined.  For example:
> source vector: 0102030405060708090a0b0c0d0e0f10
> shift  vector: 01020101010101010101010101010101
> after vsl:     020806080a0c0e10121416181a1c1e20
> ...this appears to use the byte-respective shift values
>
> using vsr with that result and the same shift vector:
> after vsr:     0182030405060708090a0b0c0d0e0f10
> I expected to get back a result matching the source vector, but somehow,
an extra bit got set.
>
> It would probably take some more thorough investigation to map out the
undefined behavior, but I doubt there's any value to that.
>

Absolutely agree. I thought if the 'undefined' behavior is something
obviously simple, we could try to match it, assuming also that it remains
constant across all implementations. But, this behaviour is not a simple
one, so, imho, let's leave 'undefined' undefined.

Thanks for a nice experiment!

Aleksandar

> PC

[-- Attachment #2: Type: text/html, Size: 1662 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-09-28 17:45 ` Aleksandar Markovic
  2019-09-28 22:17   ` Aleksandar Markovic
  2019-09-30 14:37   ` Aleksandar Markovic
@ 2019-10-01 18:24   ` Mark Cave-Ayland
  2019-10-02 14:08     ` Stefan Brankovic
  2019-10-02 17:38     ` Alex Bennée
  2 siblings, 2 replies; 13+ messages in thread
From: Mark Cave-Ayland @ 2019-10-01 18:24 UTC (permalink / raw)
  To: Aleksandar Markovic; +Cc: stefan.brankovic, qemu-ppc, Paul Clarke, qemu-devel

On 28/09/2019 18:45, Aleksandar Markovic wrote:

Hi Aleksandar,

Thanks for taking a look at this!

> Mark and Paul (and Stefan),
> 
> Thanks for spotting this and pinpointing the culprit commit. I guess Stefan is going
> to respond soon, but, in the meantime, I took a look at the commit in question:
> 
> https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
> 
> I don't have at the moment any dev/test environment handy, but I did manual
> inspection of the code, and here is what I found (in order of importance, perceived
> by me):
> 
> 1. The code will not work correctly if the shift ammount (variable 'sh') is 0. This
> is because, in that case, one of succeeding invocations of TCG shift functions will
> be required to shift a 64-bit TCG variable by 64 bits, and the result of such TCG
> operation is undefined (shift amount must be 63 or less) - see
> https://github.com/qemu/qemu/blob/master/tcg/README.

Yes I think you're right here - the old helper got around this by doing an explicit
copy from a to r if the shift value is zero. In fact the case that Paul reported is
exactly this:

   vsl VRT, VRA, VRB

=> 0x100006e0 <vec_slq+132>: vsl v0,v0,v1
(gdb) p $vr0.uint128
$21 = 0x10111213141516172021222324252650
(gdb) p $vr1.uint128
$22 = 0x0
(gdb) stepi
0x00000000100006e4 in vec_slq ()
1: x/i $pc
=> 0x100006e4 <vec_slq+136>: xxlor vs0,vs32,vs32
(gdb) p $vr0.uint128
$23 = 0x10111213141516172021222324252650

I guess the solution is check for sh == 0 and if this is the case, execute a copy
instead.

> 2. Variable naming is better in the old helper than in the new translator. In that
> light, I would advise Stefan to change 'sh' to 'shift', and 'shifted' to 'carry'.

It looks like the name "sh" comes from the ISA documentation, so whilst it's a little
tricky to compare with the previous implementation it does make sense when comparing
with the algorithm shown there. Note: this implementation also drops the check for
each byte of VRB having the same shift value - should we care about this?

> 3. Lines
> 
> tcg_gen_andi_i64(shifted, shifted, 0x7fULL);
> 
> and
> 
> tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
> 
> appear to be spurious (albait in a harmless way). Therefore, they should be deleted,
> or, alternatevely, a justification for them should be provided.

I'm not sure why they are needed either - there's certainly no mention of it in the
ISA documentation. Stefan?

> 4. In the commit message, variable names were used without quotation mark, resulting
> in puzzling and unclear wording.
> 
> 5. (a question for Mark) After all recent changes, does get_avr64(..., ..., true)
> always (for any endian configuration) return the "high" half of an Altivec register,
> and get_avr64(..., ..., false) the "low" one?

Yes - the new functions always return the MSB (high) and LSB (low) correctly
regardless of host endian.

> Given all these circumstances, perhaps the most reasonable solution would be to
> revert the commit in question, and allow Stefan enough dev and test time to hopefully
> submit a new, better, version later on.

Given that it has been broken for 3 months now, I don't think we're in any major rush
to revert ASAP. I'd prefer to give Stefan a bit more time first since he does report
some substantial speed improvements from these new implementations.


ATB,

Mark.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-10-01 18:24   ` Mark Cave-Ayland
@ 2019-10-02 14:08     ` Stefan Brankovic
  2019-10-03 11:11       ` Stefan Brankovic
  2019-10-02 17:38     ` Alex Bennée
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Brankovic @ 2019-10-02 14:08 UTC (permalink / raw)
  To: Mark Cave-Ayland, Aleksandar Markovic; +Cc: qemu-ppc, Paul Clarke, qemu-devel

Hi Mark,

Thank you for reporting this bug. I was away from office for couple of 
days, so that's why I am answering you a bit late, sorry about that. I 
will start working on a solution and try to fix this problem in next 
couple of days.

On 1.10.19. 20:24, Mark Cave-Ayland wrote:
> On 28/09/2019 18:45, Aleksandar Markovic wrote:
>
> Hi Aleksandar,
>
> Thanks for taking a look at this!
>
>> Mark and Paul (and Stefan),
>>
>> Thanks for spotting this and pinpointing the culprit commit. I guess Stefan is going
>> to respond soon, but, in the meantime, I took a look at the commit in question:
>>
>> https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
>>
>> I don't have at the moment any dev/test environment handy, but I did manual
>> inspection of the code, and here is what I found (in order of importance, perceived
>> by me):
>>
>> 1. The code will not work correctly if the shift ammount (variable 'sh') is 0. This
>> is because, in that case, one of succeeding invocations of TCG shift functions will
>> be required to shift a 64-bit TCG variable by 64 bits, and the result of such TCG
>> operation is undefined (shift amount must be 63 or less) - see
>> https://github.com/qemu/qemu/blob/master/tcg/README.
> Yes I think you're right here - the old helper got around this by doing an explicit
> copy from a to r if the shift value is zero. In fact the case that Paul reported is
> exactly this:
>
>     vsl VRT, VRA, VRB
>
> => 0x100006e0 <vec_slq+132>: vsl v0,v0,v1
> (gdb) p $vr0.uint128
> $21 = 0x10111213141516172021222324252650
> (gdb) p $vr1.uint128
> $22 = 0x0
> (gdb) stepi
> 0x00000000100006e4 in vec_slq ()
> 1: x/i $pc
> => 0x100006e4 <vec_slq+136>: xxlor vs0,vs32,vs32
> (gdb) p $vr0.uint128
> $23 = 0x10111213141516172021222324252650
>
> I guess the solution is check for sh == 0 and if this is the case, execute a copy
> instead.
I agree with you. This will be changed in upcoming patch.
>
>> 2. Variable naming is better in the old helper than in the new translator. In that
>> light, I would advise Stefan to change 'sh' to 'shift', and 'shifted' to 'carry'.
> It looks like the name "sh" comes from the ISA documentation, so whilst it's a little
> tricky to compare with the previous implementation it does make sense when comparing
> with the algorithm shown there. Note: this implementation also drops the check for
> each byte of VRB having the same shift value - should we care about this?

"sh" is taken from the ISA documentation, so I would leave that as it is 
now, but I can change some other variable names to be consistent with 
previous implementation (e.g. "shifted" -> "carry").

I don't think that we should check each byte of VRB, because we care 
only about "defined" behavior. If shift values doesn't match, result is 
"undefined" so it doesn't matter what is inside resulting register.

>> 3. Lines
>>
>> tcg_gen_andi_i64(shifted, shifted, 0x7fULL);
>>
>> and
>>
>> tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
>>
>> appear to be spurious (albait in a harmless way). Therefore, they should be deleted,
>> or, alternatevely, a justification for them should be provided.
> I'm not sure why they are needed either - there's certainly no mention of it in the
> ISA documentation. Stefan?
This will be removed in upcoming patch.
>
>> 4. In the commit message, variable names were used without quotation mark, resulting
>> in puzzling and unclear wording.
>>
>> 5. (a question for Mark) After all recent changes, does get_avr64(..., ..., true)
>> always (for any endian configuration) return the "high" half of an Altivec register,
>> and get_avr64(..., ..., false) the "low" one?
> Yes - the new functions always return the MSB (high) and LSB (low) correctly
> regardless of host endian.
>
>> Given all these circumstances, perhaps the most reasonable solution would be to
>> revert the commit in question, and allow Stefan enough dev and test time to hopefully
>> submit a new, better, version later on.
> Given that it has been broken for 3 months now, I don't think we're in any major rush
> to revert ASAP. I'd prefer to give Stefan a bit more time first since he does report
> some substantial speed improvements from these new implementations.
>
>
> ATB,
>
> Mark.

Best Regards,

Stefan



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-10-01 18:24   ` Mark Cave-Ayland
  2019-10-02 14:08     ` Stefan Brankovic
@ 2019-10-02 17:38     ` Alex Bennée
  2019-10-02 19:40       ` Richard Henderson
  1 sibling, 1 reply; 13+ messages in thread
From: Alex Bennée @ 2019-10-02 17:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: stefan.brankovic, qemu-ppc, Paul Clarke, Aleksandar Markovic


Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> writes:

> On 28/09/2019 18:45, Aleksandar Markovic wrote:
>
> Hi Aleksandar,
>
> Thanks for taking a look at this!
>
>> Mark and Paul (and Stefan),
>>
>> Thanks for spotting this and pinpointing the culprit commit. I guess Stefan is going
>> to respond soon, but, in the meantime, I took a look at the commit in question:
>>
>> https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822
>>
>> I don't have at the moment any dev/test environment handy, but I did manual
>> inspection of the code, and here is what I found (in order of importance, perceived
>> by me):
>>
<snip>
>
>> Given all these circumstances, perhaps the most reasonable solution would be to
>> revert the commit in question, and allow Stefan enough dev and test time to hopefully
>> submit a new, better, version later on.
>
> Given that it has been broken for 3 months now, I don't think we're in any major rush
> to revert ASAP. I'd prefer to give Stefan a bit more time first since he does report
> some substantial speed improvements from these new implementations.

Is the denbcdq instruction exposed in any standard float operations?
Once this is fixed it would be worth adding a testcase (either ppc64
specific or multiarch) so protect it from regression in the future.

>
>
> ATB,
>
> Mark.


--
Alex Bennée


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-10-02 17:38     ` Alex Bennée
@ 2019-10-02 19:40       ` Richard Henderson
  2019-10-02 19:55         ` Paul Clarke
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2019-10-02 19:40 UTC (permalink / raw)
  To: Alex Bennée, qemu-devel
  Cc: stefan.brankovic, qemu-ppc, Paul Clarke, Aleksandar Markovic

On 10/2/19 10:38 AM, Alex Bennée wrote:
> Is the denbcdq instruction exposed in any standard float operations?
> Once this is fixed it would be worth adding a testcase (either ppc64
> specific or multiarch) so protect it from regression in the future.

Not standard float operations -- this is binary coded decimal stuff.
It would certainly be possible to write a ppc specific test case.


r~


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re:  Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-10-02 19:40       ` Richard Henderson
@ 2019-10-02 19:55         ` Paul Clarke
  2019-10-04 19:32           ` Aleksandar Markovic
  0 siblings, 1 reply; 13+ messages in thread
From: Paul Clarke @ 2019-10-02 19:55 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée, qemu-devel
  Cc: stefan.brankovic, qemu-ppc, Aleksandar Markovic

On 10/2/19 2:40 PM, Richard Henderson wrote:
> On 10/2/19 10:38 AM, Alex Bennée wrote:
>> Is the denbcdq instruction exposed in any standard float operations?
>> Once this is fixed it would be worth adding a testcase (either ppc64
>> specific or multiarch) so protect it from regression in the future.
> 
> Not standard float operations -- this is binary coded decimal stuff.
> It would certainly be possible to write a ppc specific test case.

In comment #9 in the bug (https://bugs.launchpad.net/qemu/+bug/1841990/comments/9), I note that the issue was produced in running the test suite for the Power Vector Library project (https://github.com/open-power-sdk/pveclib), which makes productive use of dcbenq.

Maybe that could be adopted or adapted to suit?

PC


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-10-02 14:08     ` Stefan Brankovic
@ 2019-10-03 11:11       ` Stefan Brankovic
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Brankovic @ 2019-10-03 11:11 UTC (permalink / raw)
  To: Mark Cave-Ayland, Aleksandar Markovic; +Cc: qemu-ppc, Paul Clarke, qemu-devel

Please take a look at the following patch 
https://lists.nongnu.org/archive/html/qemu-ppc/2019-10/msg00133.html and 
let me know if problem is solved.

On 2.10.19. 16:08, Stefan Brankovic wrote:
> Hi Mark,
>
> Thank you for reporting this bug. I was away from office for couple of 
> days, so that's why I am answering you a bit late, sorry about that. I 
> will start working on a solution and try to fix this problem in next 
> couple of days.
>
> On 1.10.19. 20:24, Mark Cave-Ayland wrote:
>> On 28/09/2019 18:45, Aleksandar Markovic wrote:
>>
>> Hi Aleksandar,
>>
>> Thanks for taking a look at this!
>>
>>> Mark and Paul (and Stefan),
>>>
>>> Thanks for spotting this and pinpointing the culprit commit. I guess 
>>> Stefan is going
>>> to respond soon, but, in the meantime, I took a look at the commit 
>>> in question:
>>>
>>> https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822 
>>>
>>>
>>> I don't have at the moment any dev/test environment handy, but I did 
>>> manual
>>> inspection of the code, and here is what I found (in order of 
>>> importance, perceived
>>> by me):
>>>
>>> 1. The code will not work correctly if the shift ammount (variable 
>>> 'sh') is 0. This
>>> is because, in that case, one of succeeding invocations of TCG shift 
>>> functions will
>>> be required to shift a 64-bit TCG variable by 64 bits, and the 
>>> result of such TCG
>>> operation is undefined (shift amount must be 63 or less) - see
>>> https://github.com/qemu/qemu/blob/master/tcg/README.
>> Yes I think you're right here - the old helper got around this by 
>> doing an explicit
>> copy from a to r if the shift value is zero. In fact the case that 
>> Paul reported is
>> exactly this:
>>
>>     vsl VRT, VRA, VRB
>>
>> => 0x100006e0 <vec_slq+132>: vsl v0,v0,v1
>> (gdb) p $vr0.uint128
>> $21 = 0x10111213141516172021222324252650
>> (gdb) p $vr1.uint128
>> $22 = 0x0
>> (gdb) stepi
>> 0x00000000100006e4 in vec_slq ()
>> 1: x/i $pc
>> => 0x100006e4 <vec_slq+136>: xxlor vs0,vs32,vs32
>> (gdb) p $vr0.uint128
>> $23 = 0x10111213141516172021222324252650
>>
>> I guess the solution is check for sh == 0 and if this is the case, 
>> execute a copy
>> instead.
> I agree with you. This will be changed in upcoming patch.
>>
>>> 2. Variable naming is better in the old helper than in the new 
>>> translator. In that
>>> light, I would advise Stefan to change 'sh' to 'shift', and 
>>> 'shifted' to 'carry'.
>> It looks like the name "sh" comes from the ISA documentation, so 
>> whilst it's a little
>> tricky to compare with the previous implementation it does make sense 
>> when comparing
>> with the algorithm shown there. Note: this implementation also drops 
>> the check for
>> each byte of VRB having the same shift value - should we care about 
>> this?
>
> "sh" is taken from the ISA documentation, so I would leave that as it 
> is now, but I can change some other variable names to be consistent 
> with previous implementation (e.g. "shifted" -> "carry").
>
> I don't think that we should check each byte of VRB, because we care 
> only about "defined" behavior. If shift values doesn't match, result 
> is "undefined" so it doesn't matter what is inside resulting register.
>
>>> 3. Lines
>>>
>>> tcg_gen_andi_i64(shifted, shifted, 0x7fULL);
>>>
>>> and
>>>
>>> tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
>>>
>>> appear to be spurious (albait in a harmless way). Therefore, they 
>>> should be deleted,
>>> or, alternatevely, a justification for them should be provided.
>> I'm not sure why they are needed either - there's certainly no 
>> mention of it in the
>> ISA documentation. Stefan?
> This will be removed in upcoming patch.
>>
>>> 4. In the commit message, variable names were used without quotation 
>>> mark, resulting
>>> in puzzling and unclear wording.
>>>
>>> 5. (a question for Mark) After all recent changes, does 
>>> get_avr64(..., ..., true)
>>> always (for any endian configuration) return the "high" half of an 
>>> Altivec register,
>>> and get_avr64(..., ..., false) the "low" one?
>> Yes - the new functions always return the MSB (high) and LSB (low) 
>> correctly
>> regardless of host endian.
>>
>>> Given all these circumstances, perhaps the most reasonable solution 
>>> would be to
>>> revert the commit in question, and allow Stefan enough dev and test 
>>> time to hopefully
>>> submit a new, better, version later on.
>> Given that it has been broken for 3 months now, I don't think we're 
>> in any major rush
>> to revert ASAP. I'd prefer to give Stefan a bit more time first since 
>> he does report
>> some substantial speed improvements from these new implementations.
>>
>>
>> ATB,
>>
>> Mark.
>
> Best Regards,
>
> Stefan
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Re: target/ppc: bug in optimised vsl/vsr implementation?
  2019-10-02 19:55         ` Paul Clarke
@ 2019-10-04 19:32           ` Aleksandar Markovic
  0 siblings, 0 replies; 13+ messages in thread
From: Aleksandar Markovic @ 2019-10-04 19:32 UTC (permalink / raw)
  To: Paul Clarke
  Cc: qemu-ppc, Alex Bennée, Richard Henderson, QEMU Developers,
	Stefan Brankovic

>
> In comment #9 in the bug (https://bugs.launchpad.net/qemu/+bug/1841990/comments/9), I note that the issue was produced in running the test suite for the Power Vector Library project (https://github.com/open-power-sdk/pveclib), which makes productive use of dcbenq.
>
> Maybe that could be adopted or adapted to suit?
>
> PC

I will leave adoption/adaptation decision to Alex and others, but just
want to tell you that I took a brief look at pveclib code and
internals, and I got an impression that this is an excellent peace of
software, that required incredible meticulousness in order to
implement, all kudos.

Aleksandar


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-10-04 19:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-26 18:04 target/ppc: bug in optimised vsl/vsr implementation? Mark Cave-Ayland
2019-09-28 17:45 ` Aleksandar Markovic
2019-09-28 22:17   ` Aleksandar Markovic
2019-09-30 14:34     ` Paul Clarke
2019-09-30 14:53       ` Aleksandar Markovic
2019-09-30 14:37   ` Aleksandar Markovic
2019-10-01 18:24   ` Mark Cave-Ayland
2019-10-02 14:08     ` Stefan Brankovic
2019-10-03 11:11       ` Stefan Brankovic
2019-10-02 17:38     ` Alex Bennée
2019-10-02 19:40       ` Richard Henderson
2019-10-02 19:55         ` Paul Clarke
2019-10-04 19:32           ` Aleksandar Markovic

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).