* [PATCH 0/2] Fix some Neon insns on big-endian hosts
@ 2020-10-28 19:17 Peter Maydell
2020-10-28 19:17 ` [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops " Peter Maydell
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Peter Maydell @ 2020-10-28 19:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel; +Cc: Richard Henderson
This patchseries fixes a couple of places where the vector
helpers we use for Neon insns were missing uses of the
address-swizzling macros H1(), H2() or H4(). This is harmless
on little-endian hosts but causes the wrong results to be
generated on big-endian hosts. The affected insns are
VUDOT(scalar), VSDOT(scalar), VPADD, VPMAX and VPMIN.
This series is independent of Richard's recent "target/arm:
Fix neon reg offsets"; it fixes the handful of remaining
risu test failures I see.
thanks
-- PMM
Peter Maydell (2):
target/arm: Fix float16 pairwise Neon ops on big-endian hosts
target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
target/arm/vec_helper.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--
2.20.1
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
2020-10-28 19:17 [PATCH 0/2] Fix some Neon insns on big-endian hosts Peter Maydell
@ 2020-10-28 19:17 ` Peter Maydell
2020-10-28 19:45 ` Philippe Mathieu-Daudé
2020-10-28 19:17 ` [PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) " Peter Maydell
2020-10-28 19:34 ` [PATCH 0/2] Fix some Neon insns " Richard Henderson
2 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-10-28 19:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel; +Cc: Richard Henderson
In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data. This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target/arm/vec_helper.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index a973454e4f4..30d76d05beb 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -1858,10 +1858,10 @@ DO_ABA(gvec_uaba_d, uint64_t)
r2 = float16_##OP(m[H2(0)], m[H2(1)], fpst); \
r3 = float16_##OP(m[H2(2)], m[H2(3)], fpst); \
\
- d[H4(0)] = r0; \
- d[H4(1)] = r1; \
- d[H4(2)] = r2; \
- d[H4(3)] = r3; \
+ d[H2(0)] = r0; \
+ d[H2(1)] = r1; \
+ d[H2(2)] = r2; \
+ d[H2(3)] = r3; \
}
DO_NEON_PAIRWISE(neon_padd, add)
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
2020-10-28 19:17 [PATCH 0/2] Fix some Neon insns on big-endian hosts Peter Maydell
2020-10-28 19:17 ` [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops " Peter Maydell
@ 2020-10-28 19:17 ` Peter Maydell
2020-10-28 19:51 ` Philippe Mathieu-Daudé
2020-10-28 19:34 ` [PATCH 0/2] Fix some Neon insns " Richard Henderson
2 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2020-10-28 19:17 UTC (permalink / raw)
To: qemu-arm, qemu-devel; +Cc: Richard Henderson
The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.
For these insns, the index is indexing over group of 4 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
I believe that gvec_udot_idx_h and gvec_sdot_idx_h are OK
because the index there is over groups of 4*16-bit values,
which are 64 bits each.
---
target/arm/vec_helper.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/arm/vec_helper.c b/target/arm/vec_helper.c
index 30d76d05beb..0f33127c4c4 100644
--- a/target/arm/vec_helper.c
+++ b/target/arm/vec_helper.c
@@ -293,7 +293,7 @@ void HELPER(gvec_sdot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
intptr_t index = simd_data(desc);
uint32_t *d = vd;
int8_t *n = vn;
- int8_t *m_indexed = (int8_t *)vm + index * 4;
+ int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
/* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
* Otherwise opr_sz is a multiple of 16.
@@ -324,7 +324,7 @@ void HELPER(gvec_udot_idx_b)(void *vd, void *vn, void *vm, uint32_t desc)
intptr_t index = simd_data(desc);
uint32_t *d = vd;
uint8_t *n = vn;
- uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+ uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
/* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
* Otherwise opr_sz is a multiple of 16.
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 0/2] Fix some Neon insns on big-endian hosts
2020-10-28 19:17 [PATCH 0/2] Fix some Neon insns on big-endian hosts Peter Maydell
2020-10-28 19:17 ` [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops " Peter Maydell
2020-10-28 19:17 ` [PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) " Peter Maydell
@ 2020-10-28 19:34 ` Richard Henderson
2 siblings, 0 replies; 7+ messages in thread
From: Richard Henderson @ 2020-10-28 19:34 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel
On 10/28/20 12:17 PM, Peter Maydell wrote:
> Peter Maydell (2):
> target/arm: Fix float16 pairwise Neon ops on big-endian hosts
> target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
2020-10-28 19:17 ` [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops " Peter Maydell
@ 2020-10-28 19:45 ` Philippe Mathieu-Daudé
2020-11-02 9:21 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-28 19:45 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson
On 10/28/20 8:17 PM, Peter Maydell wrote:
> In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
> meant we were using the H4() address swizzler macro rather than the
> H2() which is required for 2-byte data. This had no effect on
> little-endian hosts but meant we put the result data into the
> destination Dreg in the wrong order on big-endian hosts.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> target/arm/vec_helper.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
2020-10-28 19:17 ` [PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) " Peter Maydell
@ 2020-10-28 19:51 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-10-28 19:51 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson
On 10/28/20 8:17 PM, Peter Maydell wrote:
> The helper functions for performing the udot/sdot operations against
> a scalar were not using an address-swizzling macro when converting
> the index of the scalar element into a pointer into the vm array.
> This had no effect on little-endian hosts but meant we generated
> incorrect results on big-endian hosts.
>
> For these insns, the index is indexing over group of 4 8-bit values,
> so 32 bits per indexed entity, and H4() is therefore what we want.
> (For Neon the only possible input indexes are 0 and 1.)
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> I believe that gvec_udot_idx_h and gvec_sdot_idx_h are OK
> because the index there is over groups of 4*16-bit values,
> which are 64 bits each.
> ---
> target/arm/vec_helper.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops on big-endian hosts
2020-10-28 19:45 ` Philippe Mathieu-Daudé
@ 2020-11-02 9:21 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 7+ messages in thread
From: Philippe Mathieu-Daudé @ 2020-11-02 9:21 UTC (permalink / raw)
To: Peter Maydell, qemu-arm, qemu-devel; +Cc: Richard Henderson
On 10/28/20 8:45 PM, Philippe Mathieu-Daudé wrote:
> On 10/28/20 8:17 PM, Peter Maydell wrote:
>> In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
>> meant we were using the H4() address swizzler macro rather than the
>> H2() which is required for 2-byte data. This had no effect on
>> little-endian hosts but meant we put the result data into the
>> destination Dreg in the wrong order on big-endian hosts.
>>
>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>> ---
>> target/arm/vec_helper.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-11-02 9:22 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-28 19:17 [PATCH 0/2] Fix some Neon insns on big-endian hosts Peter Maydell
2020-10-28 19:17 ` [PATCH 1/2] target/arm: Fix float16 pairwise Neon ops " Peter Maydell
2020-10-28 19:45 ` Philippe Mathieu-Daudé
2020-11-02 9:21 ` Philippe Mathieu-Daudé
2020-10-28 19:17 ` [PATCH 2/2] target/arm: Fix VUDOT/VSDOT (scalar) " Peter Maydell
2020-10-28 19:51 ` Philippe Mathieu-Daudé
2020-10-28 19:34 ` [PATCH 0/2] Fix some Neon insns " Richard Henderson
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.