On Mon, Aug 19, 2019 at 12:43:21PM -0500, Paul A. Clarke wrote:
> From: "Paul A. Clarke" <pc@us.ibm.com>
> 
> The POWER8 and POWER9 User's Manuals specify the implementation's
> behavior in a case that the ISA leaves "undefined" for the
> xscvdpspn and xscvdpsp instructions.  This patch corrects the QEMU
> implementation to match the hardware behavior in that case.
> 
> ISA 3.0B has xscvdpspn leaving its result in word 0 of the target register,
> with the other words of the target register left "undefined".
> 
> The User's Manuals specify:
>   VSX scalar convert from double-precision to single-precision (xscvdpsp,
>   xscvdpspn).
>   VSR[32:63] is set to VSR[0:31].
> So, words 0 and 1 both contain the result.
> 
> Note: this is important because GCC, as of version 8 or so, assumes and
> takes advantage of this behavior to optimize the following sequence:
>   xscvdpspn vs0,vs1
>   mffprwz   r8,f0
> ISA 3.0B has xscvdpspn leaving its result in word 0 of the target register,
> and mffprwz expecting its input to come from word 1 of the source register.
> This sequence fails with QEMU, as a shift is required between those two
> instructions.  However, since the hardware splats the result to both words 0
> and 1 of its output register, the shift is not necessary.
> 
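
For anyone following along, here is a minimal standalone sketch of why the
sequence above only works with the splat.  It is not QEMU code: the names
float_bits, cvt_word0_only, cvt_splat and read_word1 are hypothetical, and a
host (float) cast stands in for float64_to_float32, which is only a fair
approximation for ordinary values.  Word 0 is modelled as the upper 32 bits of
the doubleword and word 1 as the lower 32 bits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw IEEE-754 bits of a single-precision value. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Old behavior: result only in word 0 (upper 32 bits), word 1 left zero. */
static uint64_t cvt_word0_only(double d)
{
    return (uint64_t)float_bits((float)d) << 32;
}

/* Behavior described in the User's Manuals: result in words 0 and 1. */
static uint64_t cvt_splat(double d)
{
    uint64_t result = float_bits((float)d);
    return (result << 32) | result;
}

/* Stand-in for the move that reads word 1 (lower 32 bits) of the source. */
static uint32_t read_word1(uint64_t doubleword)
{
    return (uint32_t)doubleword;
}
```

With 1.0 as input, read_word1(cvt_splat(1.0)) yields the single-precision bit
pattern, while read_word1(cvt_word0_only(1.0)) yields zero, which is the
failure described above.
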
> Expect a future revision of the ISA to specify this behavior.
> 
> Signed-off-by: Paul A. Clarke <pc@us.ibm.com>

Applied to ppc-for-4.2, thanks.

> 
> v2
> - Splitting patch "ppc: Three floating point fixes"; this is just one part.
> - Updated commit message to clarify behavior is documented in User's Manuals.
> - Updated commit message to correct which words are in output and source of
>   xscvdpspn and mffprwz.
> - No source changes to this part of the original patch.
> 
> ---
>  target/ppc/fpu_helper.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
> index 5611cf0..23b9c97 100644
> --- a/target/ppc/fpu_helper.c
> +++ b/target/ppc/fpu_helper.c
> @@ -2871,10 +2871,14 @@ void helper_xscvqpdp(CPUPPCState *env, uint32_t opcode,
>  
>  uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
>  {
> +    uint64_t result;
> +
>      float_status tstat = env->fp_status;
>      set_float_exception_flags(0, &tstat);
>  
> -    return (uint64_t)float64_to_float32(xb, &tstat) << 32;
> +    result = (uint64_t)float64_to_float32(xb, &tstat);
> +    /* hardware replicates result to both words of the doubleword result.  */
> +    return (result << 32) | result;
>  }
>  
>  uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson