Re: target/ppc: bug in optimised vsl/vsr implementation?

From: Stefan Brankovic <stefan.brankovic@rt-rk.com>
To: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>,
	Aleksandar Markovic <aleksandar.m.mail@gmail.com>
Cc: "qemu-ppc@nongnu.org" <qemu-ppc@nongnu.org>,
	Paul Clarke <pc@us.ibm.com>, qemu-devel <qemu-devel@nongnu.org>
Subject: Re: target/ppc: bug in optimised vsl/vsr implementation?
Date: Thu, 3 Oct 2019 13:11:30 +0200	[thread overview]
Message-ID: <703facbe-a544-3185-30e7-1918af8dccdf@rt-rk.com> (raw)
In-Reply-To: <16069cfc-66c6-0629-51de-6dfe39214e11@rt-rk.com>

Please take a look at the following patch 
https://lists.nongnu.org/archive/html/qemu-ppc/2019-10/msg00133.html and 
let me know if problem is solved.

On 2.10.19. 16:08, Stefan Brankovic wrote:
> Hi Mark,
>
> Thank you for reporting this bug. I was away from office for couple of 
> days, so that's why I am answering you a bit late, sorry about that. I 
> will start working on a solution and try to fix this problem in next 
> couple of days.
>
> On 1.10.19. 20:24, Mark Cave-Ayland wrote:
>> On 28/09/2019 18:45, Aleksandar Markovic wrote:
>>
>> Hi Aleksandar,
>>
>> Thanks for taking a look at this!
>>
>>> Mark and Paul (and Stefan),
>>>
>>> Thanks for spotting this and pinpointing the culprit commit. I guess 
>>> Stefan is going
>>> to respond soon, but, in the meantime, I took a look at the commit 
>>> in question:
>>>
>>> https://github.com/qemu/qemu/commit/4e6d0920e7547e6af4bbac5ffe9adfe6ea621822 
>>>
>>>
>>> I don't have at the moment any dev/test environment handy, but I did 
>>> manual
>>> inspection of the code, and here is what I found (in order of 
>>> importance, perceived
>>> by me):
>>>
>>> 1. The code will not work correctly if the shift ammount (variable 
>>> 'sh') is 0. This
>>> is because, in that case, one of succeeding invocations of TCG shift 
>>> functions will
>>> be required to shift a 64-bit TCG variable by 64 bits, and the 
>>> result of such TCG
>>> operation is undefined (shift amount must be 63 or less) - see
>>> https://github.com/qemu/qemu/blob/master/tcg/README.
>> Yes I think you're right here - the old helper got around this by 
>> doing an explicit
>> copy from a to r if the shift value is zero. In fact the case that 
>> Paul reported is
>> exactly this:
>>
>>     vsl VRT, VRA, VRB
>>
>> => 0x100006e0 <vec_slq+132>: vsl v0,v0,v1
>> (gdb) p $vr0.uint128
>> $21 = 0x10111213141516172021222324252650
>> (gdb) p $vr1.uint128
>> $22 = 0x0
>> (gdb) stepi
>> 0x00000000100006e4 in vec_slq ()
>> 1: x/i $pc
>> => 0x100006e4 <vec_slq+136>: xxlor vs0,vs32,vs32
>> (gdb) p $vr0.uint128
>> $23 = 0x10111213141516172021222324252650
>>
>> I guess the solution is check for sh == 0 and if this is the case, 
>> execute a copy
>> instead.
> I agree with you. This will be changed in upcoming patch.
>>
>>> 2. Variable naming is better in the old helper than in the new 
>>> translator. In that
>>> light, I would advise Stefan to change 'sh' to 'shift', and 
>>> 'shifted' to 'carry'.
>> It looks like the name "sh" comes from the ISA documentation, so 
>> whilst it's a little
>> tricky to compare with the previous implementation it does make sense 
>> when comparing
>> with the algorithm shown there. Note: this implementation also drops 
>> the check for
>> each byte of VRB having the same shift value - should we care about 
>> this?
>
> "sh" is taken from the ISA documentation, so I would leave that as it 
> is now, but I can change some other variable names to be consistent 
> with previous implementation (e.g. "shifted" -> "carry").
>
> I don't think that we should check each byte of VRB, because we care 
> only about "defined" behavior. If shift values doesn't match, result 
> is "undefined" so it doesn't matter what is inside resulting register.
>
>>> 3. Lines
>>>
>>> tcg_gen_andi_i64(shifted, shifted, 0x7fULL);
>>>
>>> and
>>>
>>> tcg_gen_andi_i64(shifted, shifted, 0xfe00000000000000ULL);
>>>
>>> appear to be spurious (albait in a harmless way). Therefore, they 
>>> should be deleted,
>>> or, alternatevely, a justification for them should be provided.
>> I'm not sure why they are needed either - there's certainly no 
>> mention of it in the
>> ISA documentation. Stefan?
> This will be removed in upcoming patch.
>>
>>> 4. In the commit message, variable names were used without quotation 
>>> mark, resulting
>>> in puzzling and unclear wording.
>>>
>>> 5. (a question for Mark) After all recent changes, does 
>>> get_avr64(..., ..., true)
>>> always (for any endian configuration) return the "high" half of an 
>>> Altivec register,
>>> and get_avr64(..., ..., false) the "low" one?
>> Yes - the new functions always return the MSB (high) and LSB (low) 
>> correctly
>> regardless of host endian.
>>
>>> Given all these circumstances, perhaps the most reasonable solution 
>>> would be to
>>> revert the commit in question, and allow Stefan enough dev and test 
>>> time to hopefully
>>> submit a new, better, version later on.
>> Given that it has been broken for 3 months now, I don't think we're 
>> in any major rush
>> to revert ASAP. I'd prefer to give Stefan a bit more time first since 
>> he does report
>> some substantial speed improvements from these new implementations.
>>
>>
>> ATB,
>>
>> Mark.
>
> Best Regards,
>
> Stefan
>