* [PATCH 0/1] target/i386: Raise #GP on unaligned m128 accesses when required.
@ 2022-08-29 14:23 Ricky Zhou
  2022-08-29 14:23 ` [PATCH 1/1] " Ricky Zhou
  0 siblings, 1 reply; 5+ messages in thread
From: Ricky Zhou @ 2022-08-29 14:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, richard.henderson, eduardo, Ricky Zhou

This is a change to raise #GP on unaligned m128 loads/stores when
required by the spec. Some notes on this change:

1. I considered making use of the existing support for enforcing memory
   alignment (setting MO_ALIGN_16 in the load/store's MemOp), but
   rejected this approach. There are at least two scenarios where we
   might want to do alignment checks in x86:
   
   a. Loads/stores when the AC flag is enabled (which should raise #AC
      on misalignment)
   b. SSE/AVX instructions which require memory alignment (which raise
      #GP on misalignment)
   
   The MemOp alignment checking mechanism can only handle one of these
   scenarios, since they require different exceptions to be raised. I
   think it makes more sense to use the existing memory alignment support
   for implementing (a), since helper_unaligned_{ld,st} already
   triggers SIGBUS in qemu-user. This is why I ended up implementing (b)
   with a helper.

2. It is often the case that legacy SSE instructions require 16-byte
   alignment of 128-bit memory operands, but AVX versions of the
   instructions do not (e.g. movsldup requires alignment and vmovsldup
   does not). From what I can tell, QEMU currently doesn't report AVX
   support in cpuid, but it still seems to emulate some of these
   instructions if you tell it to execute them. This change attempts to
   distinguish between legacy SSE instructions and AVX instructions by
   conditioning on !(s->prefix & PREFIX_VEX). Not
   sure this is very future-proof though - for example, it may need to
   be updated if support for EVEX prefixes is added. LMK if there's a
   nicer way to do this.

3. I tested this by running a Linux VM in qemu-system-x86_64 and
   verifying that movaps on a misaligned address triggers a segfault
   (a rough sketch of that kind of test is included below).
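
A rough sketch of the kind of guest test I have in mind (illustrative
only - this exact program is not part of the patch):

    /* Expect SIGSEGV (from the #GP) when movaps is given a misaligned
     * address inside the guest. */
    #include <stdio.h>

    int main(void)
    {
        static char buf[32] __attribute__((aligned(16)));
        char out[16];

        /* buf + 1 is guaranteed to be misaligned. */
        asm volatile("movaps (%1), %%xmm0\n\t"
                     "movups %%xmm0, (%0)"
                     : : "r"(out), "r"(buf + 1) : "xmm0", "memory");
        puts("no fault (unexpected)");
        return 0;
    }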

Ricky Zhou (1):
  target/i386: Raise #GP on unaligned m128 accesses when required.

 target/i386/helper.h         |  1 +
 target/i386/tcg/mem_helper.c |  8 ++++++++
 target/i386/tcg/translate.c  | 38 +++++++++++++++++++++++++++++++++---
 3 files changed, 44 insertions(+), 3 deletions(-)

-- 
2.37.2




* [PATCH 1/1] target/i386: Raise #GP on unaligned m128 accesses when required.
  2022-08-29 14:23 [PATCH 0/1] target/i386: Raise #GP on unaligned m128 accesses when required Ricky Zhou
@ 2022-08-29 14:23 ` Ricky Zhou
  2022-08-29 16:45   ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Ricky Zhou @ 2022-08-29 14:23 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, richard.henderson, eduardo, Ricky Zhou

Many instructions which load/store 128-bit values are supposed to
raise #GP when the memory operand isn't 16-byte aligned. This includes:
 - Instructions explicitly requiring memory alignment (Exceptions Type 1
   in the "AVX and SSE Instruction Exception Specification" section of
   the SDM)
 - Legacy SSE instructions that load/store 128-bit values (Exceptions
   Types 2 and 4).

This change adds a raise_gp_if_unaligned helper which raises #GP if an
address is not properly aligned. This helper is called before 128-bit
loads/stores where appropriate.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/217
Signed-off-by: Ricky Zhou <ricky@rzhou.org>
---
 target/i386/helper.h         |  1 +
 target/i386/tcg/mem_helper.c |  8 ++++++++
 target/i386/tcg/translate.c  | 38 +++++++++++++++++++++++++++++++++---
 3 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index ac3b4d1ee3..17d78f2b0d 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -213,6 +213,7 @@ DEF_HELPER_1(update_mxcsr, void, env)
 DEF_HELPER_1(enter_mmx, void, env)
 DEF_HELPER_1(emms, void, env)
 DEF_HELPER_3(movq, void, env, ptr, ptr)
+DEF_HELPER_3(raise_gp_if_unaligned, void, env, tl, tl)
 
 #define SHIFT 0
 #include "ops_sse_header.h"
diff --git a/target/i386/tcg/mem_helper.c b/target/i386/tcg/mem_helper.c
index e3cdafd2d4..79259abef3 100644
--- a/target/i386/tcg/mem_helper.c
+++ b/target/i386/tcg/mem_helper.c
@@ -181,3 +181,11 @@ void helper_boundl(CPUX86State *env, target_ulong a0, int v)
         raise_exception_ra(env, EXCP05_BOUND, GETPC());
     }
 }
+
+void helper_raise_gp_if_unaligned(CPUX86State *env, target_ulong addr,
+                                  target_ulong align_mask)
+{
+    if (unlikely((addr & align_mask) != 0)) {
+        raise_exception_ra(env, EXCP0D_GPF, GETPC());
+    }
+}
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b7972f0ff5..de13f483b6 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3054,7 +3054,7 @@ static const struct SSEOpHelper_epp sse_op_table6[256] = {
     [0x25] = SSE41_OP(pmovsxdq),
     [0x28] = SSE41_OP(pmuldq),
     [0x29] = SSE41_OP(pcmpeqq),
-    [0x2a] = SSE41_SPECIAL, /* movntqda */
+    [0x2a] = SSE41_SPECIAL, /* movntdqa */
     [0x2b] = SSE41_OP(packusdw),
     [0x30] = SSE41_OP(pmovzxbw),
     [0x31] = SSE41_OP(pmovzxbd),
@@ -3194,10 +3194,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
             break;
         case 0x1e7: /* movntdq */
         case 0x02b: /* movntps */
-        case 0x12b: /* movntps */
+        case 0x12b: /* movntpd */
             if (mod == 3)
                 goto illegal_op;
             gen_lea_modrm(env, s, modrm);
+            gen_helper_raise_gp_if_unaligned(cpu_env, s->A0, tcg_const_tl(0xf));
             gen_sto_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
             break;
         case 0x3f0: /* lddqu */
@@ -3273,6 +3274,11 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x26f: /* movdqu xmm, ea */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
+                /* movaps, movapd, movdqa */
+                if (b == 0x028 || b == 0x128 || b == 0x16f) {
+                    gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                     tcg_const_tl(0xf));
+                }
                 gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
             } else {
                 rm = (modrm & 7) | REX_B(s);
@@ -3331,6 +3337,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x212: /* movsldup */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
+                if (!(s->prefix & PREFIX_VEX)) {
+                    gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                     tcg_const_tl(0xf));
+                }
                 gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
             } else {
                 rm = (modrm & 7) | REX_B(s);
@@ -3373,6 +3383,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x216: /* movshdup */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
+                if (!(s->prefix & PREFIX_VEX)) {
+                    gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                     tcg_const_tl(0xf));
+                }
                 gen_ldo_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
             } else {
                 rm = (modrm & 7) | REX_B(s);
@@ -3465,6 +3479,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
         case 0x27f: /* movdqu ea, xmm */
             if (mod != 3) {
                 gen_lea_modrm(env, s, modrm);
+                if (b == 0x029 || b == 0x129 || b == 0x17f) {
+                    gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                     tcg_const_tl(0xf));
+                }
                 gen_sto_env_A0(s, offsetof(CPUX86State, xmm_regs[reg]));
             } else {
                 rm = (modrm & 7) | REX_B(s);
@@ -3806,10 +3824,16 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                         tcg_gen_st16_tl(s->tmp0, cpu_env, op2_offset +
                                         offsetof(ZMMReg, ZMM_W(0)));
                         break;
-                    case 0x2a:            /* movntqda */
+                    case 0x2a:            /* movntdqa */
+                        gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                         tcg_const_tl(0xf));
                         gen_ldo_env_A0(s, op1_offset);
                         return;
                     default:
+                        if (!(s->prefix & PREFIX_VEX)) {
+                            gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                             tcg_const_tl(0xf));
+                        }
                         gen_ldo_env_A0(s, op2_offset);
                     }
                 }
@@ -4351,6 +4375,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                 } else {
                     op2_offset = offsetof(CPUX86State,xmm_t0);
                     gen_lea_modrm(env, s, modrm);
+                    if (!(s->prefix & PREFIX_VEX)) {
+                        gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                         tcg_const_tl(0xf));
+                    }
                     gen_ldo_env_A0(s, op2_offset);
                 }
             } else {
@@ -4469,6 +4497,10 @@ static void gen_sse(CPUX86State *env, DisasContext *s, int b,
                     break;
                 default:
                     /* 128 bit access */
+                    if (!(s->prefix & PREFIX_VEX)) {
+                        gen_helper_raise_gp_if_unaligned(cpu_env, s->A0,
+                                                         tcg_const_tl(0xf));
+                    }
                     gen_ldo_env_A0(s, op2_offset);
                     break;
                 }
-- 
2.37.2




* Re: [PATCH 1/1] target/i386: Raise #GP on unaligned m128 accesses when required.
  2022-08-29 14:23 ` [PATCH 1/1] " Ricky Zhou
@ 2022-08-29 16:45   ` Richard Henderson
  2022-08-29 20:46     ` Ricky Zhou
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2022-08-29 16:45 UTC (permalink / raw)
  To: Ricky Zhou, qemu-devel; +Cc: pbonzini, eduardo

On 8/29/22 07:23, Ricky Zhou wrote:
> Many instructions which load/store 128-bit values are supposed to
> raise #GP when the memory operand isn't 16-byte aligned. This includes:
>   - Instructions explicitly requiring memory alignment (Exceptions Type 1
>     in the "AVX and SSE Instruction Exception Specification" section of
>     the SDM)
>   - Legacy SSE instructions that load/store 128-bit values (Exceptions
>     Types 2 and 4).
> 
> This change adds a raise_gp_if_unaligned helper which raises #GP if an
> address is not properly aligned. This helper is called before 128-bit
> loads/stores where appropriate.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/217
> Signed-off-by: Ricky Zhou <ricky@rzhou.org>
> ---
>   target/i386/helper.h         |  1 +
>   target/i386/tcg/mem_helper.c |  8 ++++++++
>   target/i386/tcg/translate.c  | 38 +++++++++++++++++++++++++++++++++---
>   3 files changed, 44 insertions(+), 3 deletions(-)


This trap should be raised via the memory operation:

- static inline void gen_ldo_env_A0(DisasContext *s, int offset)
+ static inline void gen_ldo_env_A0(DisasContext *s, int offset, bool aligned)
  {
      int mem_index = s->mem_index;

-     tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, mem_index, MO_LEUQ);
+     tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, mem_index,
+                         MO_LEUQ | (aligned ? MO_ALIGN_16 : 0));
      tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(0)));

      tcg_gen_addi_tl(s->tmp0, s->A0, 8);
      tcg_gen_qemu_ld_i64(s->tmp1_i64, s->tmp0, mem_index, MO_LEUQ);
      tcg_gen_st_i64(s->tmp1_i64, cpu_env, offset + offsetof(ZMMReg, ZMM_Q(1)));
  }


Only the first of the two loads/stores must be aligned, as the other is known to be +8. 
You then must fill in the x86_tcg_ops.do_unaligned_access hook to raise #GP.
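
For concreteness, a minimal sketch of what such a hook could look like
(names, placement, and details are illustrative, not the final code),
installed via the do_unaligned_access member of the i386 TCGCPUOps:

    static void x86_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
                                            MMUAccessType access_type,
                                            int mmu_idx, uintptr_t retaddr)
    {
        X86CPU *cpu = X86_CPU(cs);

        /* Accesses flagged with MO_ALIGN_* that turn out to be
         * misaligned end up here; raise #GP(0) at the faulting insn. */
        raise_exception_err_ra(&cpu->env, EXCP0D_GPF, 0, retaddr);
    }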


r~



* Re: [PATCH 1/1] target/i386: Raise #GP on unaligned m128 accesses when required.
  2022-08-29 16:45   ` Richard Henderson
@ 2022-08-29 20:46     ` Ricky Zhou
  2022-08-29 22:54       ` Richard Henderson
  0 siblings, 1 reply; 5+ messages in thread
From: Ricky Zhou @ 2022-08-29 20:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, pbonzini, eduardo

On Mon, Aug 29, 2022 at 9:45 AM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 8/29/22 07:23, Ricky Zhou wrote:
> This trap should be raised via the memory operation:
> ...
> Only the first of the two loads/stores must be aligned, as the other is known to be +8.
> You then must fill in the x86_tcg_ops.do_unaligned_access hook to raise #GP.
Thanks for taking a look at this - did you see the bit in the cover
letter where I discuss doing this via alignment requirements on the
memory operation? My logic was that the memop alignment checks seem to
be more oriented towards triggering #AC exceptions (even though this is
not currently implemented), since qemu-user's unaligned access handlers
(helper_unaligned_{ld,st}) already trigger SIGBUS as opposed to SIGSEGV.
I was concerned that implementing this via MO_ALIGN_16 would get in the
way of a hypothetical future implementation of the AC flag, since
do_unaligned_access would need to raise #AC instead of #GP for that.

One slightly more involved way to use alignment on the MemOp could be to
arrange to pass the problematic MemOp to do_unaligned_access and
helper_unaligned_{ld,st}. Then we could allow CPUs to handle
misalignment of different MemOps differently (e.g. raise #GP/SIGSEGV for
certain ops and #AC/SIGBUS for others). For this change to x86, we could
maybe get away with making MO_ALIGN_16 and above trigger #GP/SIGSEGV and
everything else trigger #AC/SIGBUS. If that seems too hacky, we could
instead add some dedicated bits to MemOp that distinguish different
types of unaligned accesses.
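
To make the idea concrete, a purely hypothetical sketch (the current
do_unaligned_access hook does not take a MemOp, so the signature below
is invented for illustration):

    static void x86_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
                                            MMUAccessType access_type,
                                            MemOp op, int mmu_idx,
                                            uintptr_t retaddr)
    {
        X86CPU *cpu = X86_CPU(cs);

        /* Ignoring MO_ALIGN (natural alignment) for brevity. */
        if ((op & MO_AMASK) >= MO_ALIGN_16) {
            /* SSE/AVX-style alignment requirement -> #GP */
            raise_exception_err_ra(&cpu->env, EXCP0D_GPF, 0, retaddr);
        } else {
            /* Ordinary misalignment (with AC set) -> #AC */
            raise_exception_err_ra(&cpu->env, EXCP11_ALGN, 0, retaddr);
        }
    }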

What do you think? Happy to implement whichever approach is preferred!

Thanks,
Ricky



* Re: [PATCH 1/1] target/i386: Raise #GP on unaligned m128 accesses when required.
  2022-08-29 20:46     ` Ricky Zhou
@ 2022-08-29 22:54       ` Richard Henderson
  0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2022-08-29 22:54 UTC (permalink / raw)
  To: Ricky Zhou; +Cc: qemu-devel, pbonzini, eduardo

On 8/29/22 13:46, Ricky Zhou wrote:
> Thanks for taking a look at this - did you see the bit in the cover
> letter where I discuss doing this via alignment requirements on the
> memory operation? My logic was that the memop alignment checks seem to
> be more oriented towards triggering #AC exceptions (even though this is
> not currently implemented),

I missed that in the cover.  However... implementing #AC is pretty hypothetical.  It's not 
something that I've ever seen used, and not something that anyone has asked for.

> One slightly more involved way to use alignment on the MemOp could be to
> arrange to pass the problematic MemOp to do_unaligned_access and
> helper_unaligned_{ld,st}. Then we could allow CPUs to handle
> misalignment of different MemOps differently (e.g. raise #GP/SIGSEGV for
> certain ops and #AC/SIGBUS for others). For this change to x86, we could
> maybe get away with making MO_ALIGN_16 and above trigger #GP/SIGSEGV and
> everything else trigger #AC/SIGBUS. If that's a little hacky, we could
> instead add some dedicated bits to MemOp that distinguish different
> types of unaligned accesses.

There's another related problem that actually has gotten a bug report in the past: when 
the form of the address should raise #SS instead of #GP in system mode.

My initial thought was to record information about "the" memory access in the per-insn 
unwind info, until I realized that there are insns with  multiple memory operations 
requiring different treatment.  E.g. "push (%rax)", where the read might raise #GP and the 
write might raise #SS.  So I think we'd need to encode #GP vs #SS into the mmu_idx used 
(e.g. in the lsb).

However, I don't think there are any similar situations of multiple memory types affecting 
SSE, so #AC vs #GP could in fact be encoded into the per-insn unwind info.

As for SIGBUS vs SIGSEGV for SSE and user-only, you need only implement the
x86_cpu_ops.record_sigbus hook.  Cf. the s390x version, which raises PGM_SPECIFICATION ->
SIGILL for unaligned atomic operations.
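
A minimal sketch of the x86 counterpart, assuming the current
record_sigbus signature (the name and details are illustrative):

    /* user-only: divert the SIGBUS path to #GP, which the i386 cpu_loop
     * in linux-user already delivers as SIGSEGV (analogous to s390x's
     * PGM_SPECIFICATION -> SIGILL handling). */
    void x86_cpu_record_sigbus(CPUState *cs, vaddr addr,
                               MMUAccessType access_type, uintptr_t ra)
    {
        X86CPU *cpu = X86_CPU(cs);

        /* Does not return. */
        raise_exception_err_ra(&cpu->env, EXCP0D_GPF, 0, ra);
    }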


r~


