All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] x86emul: FPU handling improvements
@ 2016-12-08 11:26 Jan Beulich
  2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:26 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

1: extend / amend supported FPU opcodes
2: simplify FPU source operand handling
3: simplify FPU destination operand handling
4: reduce FPU handling code size
5: avoid undefined behavior when dealing with 10-byte FPU operands
6: simplify FPU handling asm() constraints

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Only patch 1 really changed (see there), some others just
    needed re-basing.


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes
  2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
@ 2016-12-08 11:35 ` Jan Beulich
  2016-12-08 11:42   ` Andrew Cooper
  2016-12-08 11:36 ` [PATCH v2 2/6] x86emul: simplify FPU source operand handling Jan Beulich
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:35 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 9106 bytes --]

First of all there are a number of secondary encodings both Intel and
AMD support, but which aren't formally documented. See e.g.
www.sandpile.org/x86/opc_fpu.htm for inofficial documentation.

Next there are a few more no-ops - instructions which served a purpose
only on 8087 or 287.

Further switch from fail_if() to raising of #UD in a couple of places
(as the decoding of FPU opcodes should now be complete except where
explicitly marked as todo).

Also adjust a few comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Correct placement of ffreep case. Use generate_exception(...) in
    favor of generate_exception_if(true, ...). Add sandpile.org
    reference.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3577,18 +3577,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN,%stN */
-        case 0xc8 ... 0xcf: /* fmul %stN,%stN */
-        case 0xd0 ... 0xd7: /* fcom %stN,%stN */
-        case 0xd8 ... 0xdf: /* fcomp %stN,%stN */
-        case 0xe0 ... 0xe7: /* fsub %stN,%stN */
-        case 0xe8 ... 0xef: /* fsubr %stN,%stN */
-        case 0xf0 ... 0xf7: /* fdiv %stN,%stN */
-        case 0xf8 ... 0xff: /* fdivr %stN,%stN */
+        case 0xc0 ... 0xc7: /* fadd %stN,%st */
+        case 0xc8 ... 0xcf: /* fmul %stN,%st */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st */
+        case 0xe0 ... 0xe7: /* fsub %stN,%st */
+        case 0xe8 ... 0xef: /* fsubr %stN,%st */
+        case 0xf0 ... 0xf7: /* fdiv %stN,%st */
+        case 0xf8 ... 0xff: /* fdivr %stN,%st */
             emulate_fpu_insn_stub(0xd8, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3634,6 +3634,7 @@ x86_emulate(
         case 0xc0 ... 0xc7: /* fld %stN */
         case 0xc8 ... 0xcf: /* fxch %stN */
         case 0xd0: /* fnop */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
         case 0xe0: /* fchs */
         case 0xe1: /* fabs */
         case 0xe4: /* ftst */
@@ -3663,7 +3664,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xd9, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3686,7 +3687,8 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
                 break;
-                /* case 4: fldenv - TODO */
+            case 4: /* fldenv - TODO */
+                goto cannot_emulate;
             case 5: /* fldcw m2byte */
                 ea.bytes = 2;
                 src = ea;
@@ -3695,7 +3697,8 @@ x86_emulate(
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
-                /* case 6: fstenv - TODO */
+            case 6: /* fnstenv - TODO */
+                goto cannot_emulate;
             case 7: /* fnstcw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -3703,7 +3706,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception(EXC_UD);
             }
         }
         break;
@@ -3723,7 +3726,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xda, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3772,16 +3775,16 @@ x86_emulate(
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdb, modrm);
             break;
+        case 0xe0: /* fneni - 8087 only, ignored by 287 */
+        case 0xe1: /* fndisi - 8087 only, ignored by 287 */
         case 0xe2: /* fnclex */
-            emulate_fpu_insn("fnclex");
-            break;
         case 0xe3: /* fninit */
-            emulate_fpu_insn("fninit");
-            break;
-        case 0xe4: /* fsetpm - 287 only, ignored by 387 */
+        case 0xe4: /* fnsetpm - 287 only, ignored by 387 */
+        /* case 0xe5: frstpm - 287 only, #UD on 387 */
+            emulate_fpu_insn_stub(0xdb, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3826,7 +3829,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fstpt", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception(EXC_UD);
             }
         }
         break;
@@ -3835,16 +3838,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN */
-        case 0xc8 ... 0xcf: /* fmul %stN */
-        case 0xe0 ... 0xe7: /* fsubr %stN */
-        case 0xe8 ... 0xef: /* fsub %stN */
-        case 0xf0 ... 0xf7: /* fdivr %stN */
-        case 0xf8 ... 0xff: /* fdiv %stN */
+        case 0xc0 ... 0xc7: /* fadd %st,%stN */
+        case 0xc8 ... 0xcf: /* fmul %st,%stN */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st (alternative encoding) */
+        case 0xe0 ... 0xe7: /* fsubr %st,%stN */
+        case 0xe8 ... 0xef: /* fsub %st,%stN */
+        case 0xf0 ... 0xf7: /* fdivr %st,%stN */
+        case 0xf8 ... 0xff: /* fdiv %st,%stN */
             emulate_fpu_insn_stub(0xdc, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 8;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3885,6 +3890,7 @@ x86_emulate(
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
         case 0xd0 ... 0xd7: /* fst %stN */
         case 0xd8 ... 0xdf: /* fstp %stN */
         case 0xe0 ... 0xe7: /* fucom %stN */
@@ -3892,7 +3898,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xdd, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -3922,6 +3928,9 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
                 break;
+            case 4: /* frstor - TODO */
+            case 6: /* fnsave - TODO */
+                goto cannot_emulate;
             case 7: /* fnstsw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -3929,7 +3938,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception(EXC_UD);
             }
         }
         break;
@@ -3940,6 +3949,7 @@ x86_emulate(
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
         case 0xc8 ... 0xcf: /* fmulp %stN */
+        case 0xd0 ... 0xd7: /* fcomp %stN (alternative encoding) */
         case 0xd9: /* fcompp */
         case 0xe0 ... 0xe7: /* fsubrp %stN */
         case 0xe8 ... 0xef: /* fsubp %stN */
@@ -3948,7 +3958,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xde, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 2;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -4000,8 +4010,14 @@ x86_emulate(
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdf, modrm);
             break;
+        case 0xc0 ... 0xc7: /* ffreep %stN */
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
+        case 0xd0 ... 0xd7: /* fstp %stN (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
+            emulate_fpu_insn_stub(0xdf, modrm);
+            break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */



[-- Attachment #2: x86emul-FPU-opcodes.patch --]
[-- Type: text/plain, Size: 9151 bytes --]

x86emul: extend / amend supported FPU opcodes

First of all there are a number of secondary encodings both Intel and
AMD support, but which aren't formally documented. See e.g.
www.sandpile.org/x86/opc_fpu.htm for inofficial documentation.

Next there are a few more no-ops - instructions which served a purpose
only on 8087 or 287.

Further switch from fail_if() to raising of #UD in a couple of places
(as the decoding of FPU opcodes should now be complete except where
explicitly marked as todo).

Also adjust a few comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Correct placement of ffreep case. Use generate_exception(...) in
    favor of generate_exception_if(true, ...). Add sandpile.org
    reference.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3577,18 +3577,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN,%stN */
-        case 0xc8 ... 0xcf: /* fmul %stN,%stN */
-        case 0xd0 ... 0xd7: /* fcom %stN,%stN */
-        case 0xd8 ... 0xdf: /* fcomp %stN,%stN */
-        case 0xe0 ... 0xe7: /* fsub %stN,%stN */
-        case 0xe8 ... 0xef: /* fsubr %stN,%stN */
-        case 0xf0 ... 0xf7: /* fdiv %stN,%stN */
-        case 0xf8 ... 0xff: /* fdivr %stN,%stN */
+        case 0xc0 ... 0xc7: /* fadd %stN,%st */
+        case 0xc8 ... 0xcf: /* fmul %stN,%st */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st */
+        case 0xe0 ... 0xe7: /* fsub %stN,%st */
+        case 0xe8 ... 0xef: /* fsubr %stN,%st */
+        case 0xf0 ... 0xf7: /* fdiv %stN,%st */
+        case 0xf8 ... 0xff: /* fdivr %stN,%st */
             emulate_fpu_insn_stub(0xd8, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3634,6 +3634,7 @@ x86_emulate(
         case 0xc0 ... 0xc7: /* fld %stN */
         case 0xc8 ... 0xcf: /* fxch %stN */
         case 0xd0: /* fnop */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
         case 0xe0: /* fchs */
         case 0xe1: /* fabs */
         case 0xe4: /* ftst */
@@ -3663,7 +3664,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xd9, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3686,7 +3687,8 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
                 break;
-                /* case 4: fldenv - TODO */
+            case 4: /* fldenv - TODO */
+                goto cannot_emulate;
             case 5: /* fldcw m2byte */
                 ea.bytes = 2;
                 src = ea;
@@ -3695,7 +3697,8 @@ x86_emulate(
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
-                /* case 6: fstenv - TODO */
+            case 6: /* fnstenv - TODO */
+                goto cannot_emulate;
             case 7: /* fnstcw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -3703,7 +3706,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception(EXC_UD);
             }
         }
         break;
@@ -3723,7 +3726,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xda, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 4;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3772,16 +3775,16 @@ x86_emulate(
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdb, modrm);
             break;
+        case 0xe0: /* fneni - 8087 only, ignored by 287 */
+        case 0xe1: /* fndisi - 8087 only, ignored by 287 */
         case 0xe2: /* fnclex */
-            emulate_fpu_insn("fnclex");
-            break;
         case 0xe3: /* fninit */
-            emulate_fpu_insn("fninit");
-            break;
-        case 0xe4: /* fsetpm - 287 only, ignored by 387 */
+        case 0xe4: /* fnsetpm - 287 only, ignored by 387 */
+        /* case 0xe5: frstpm - 287 only, #UD on 387 */
+            emulate_fpu_insn_stub(0xdb, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3826,7 +3829,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fstpt", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception(EXC_UD);
             }
         }
         break;
@@ -3835,16 +3838,18 @@ x86_emulate(
         host_and_vcpu_must_have(fpu);
         switch ( modrm )
         {
-        case 0xc0 ... 0xc7: /* fadd %stN */
-        case 0xc8 ... 0xcf: /* fmul %stN */
-        case 0xe0 ... 0xe7: /* fsubr %stN */
-        case 0xe8 ... 0xef: /* fsub %stN */
-        case 0xf0 ... 0xf7: /* fdivr %stN */
-        case 0xf8 ... 0xff: /* fdiv %stN */
+        case 0xc0 ... 0xc7: /* fadd %st,%stN */
+        case 0xc8 ... 0xcf: /* fmul %st,%stN */
+        case 0xd0 ... 0xd7: /* fcom %stN,%st (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fcomp %stN,%st (alternative encoding) */
+        case 0xe0 ... 0xe7: /* fsubr %st,%stN */
+        case 0xe8 ... 0xef: /* fsub %st,%stN */
+        case 0xf0 ... 0xf7: /* fdivr %st,%stN */
+        case 0xf8 ... 0xff: /* fdiv %st,%stN */
             emulate_fpu_insn_stub(0xdc, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            ASSERT(ea.type == OP_MEM);
             ea.bytes = 8;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3885,6 +3890,7 @@ x86_emulate(
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
         case 0xd0 ... 0xd7: /* fst %stN */
         case 0xd8 ... 0xdf: /* fstp %stN */
         case 0xe0 ... 0xe7: /* fucom %stN */
@@ -3892,7 +3898,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xdd, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -3922,6 +3928,9 @@ x86_emulate(
                 dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
                 break;
+            case 4: /* frstor - TODO */
+            case 6: /* fnsave - TODO */
+                goto cannot_emulate;
             case 7: /* fnstsw m2byte */
                 ea.bytes = 2;
                 dst = ea;
@@ -3929,7 +3938,7 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
                 break;
             default:
-                goto cannot_emulate;
+                generate_exception(EXC_UD);
             }
         }
         break;
@@ -3940,6 +3949,7 @@ x86_emulate(
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
         case 0xc8 ... 0xcf: /* fmulp %stN */
+        case 0xd0 ... 0xd7: /* fcomp %stN (alternative encoding) */
         case 0xd9: /* fcompp */
         case 0xe0 ... 0xe7: /* fsubrp %stN */
         case 0xe8 ... 0xef: /* fsubp %stN */
@@ -3948,7 +3958,7 @@ x86_emulate(
             emulate_fpu_insn_stub(0xde, modrm);
             break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             ea.bytes = 2;
             src = ea;
             if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -4000,8 +4010,14 @@ x86_emulate(
             vcpu_must_have_cmov();
             emulate_fpu_insn_stub_eflags(0xdf, modrm);
             break;
+        case 0xc0 ... 0xc7: /* ffreep %stN */
+        case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
+        case 0xd0 ... 0xd7: /* fstp %stN (alternative encoding) */
+        case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
+            emulate_fpu_insn_stub(0xdf, modrm);
+            break;
         default:
-            fail_if(modrm >= 0xc0);
+            generate_exception_if(ea.type != OP_MEM, EXC_UD);
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 2/6] x86emul: simplify FPU source operand handling
  2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
  2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
@ 2016-12-08 11:36 ` Jan Beulich
  2016-12-08 11:36 ` [PATCH v2 3/6] x86emul: simplify FPU destination " Jan Beulich
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 8033 bytes --]

Consistently use ea instead of src for passing the memory address to
->read(). This eliminates the need to copy ea to src, resulting in a
couple of hundred bytes smaller binary size.

In addition for opcode DE we can leverage SrcMem16 to eliminate a call
of the ->read() hook. At the same time drop the stray Mov attributes
from D8, DA, DC, and DE: They're meaningful for memory writes only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -159,10 +159,10 @@ static const opcode_desc_t opcode_table[
     ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte, ImplicitOps, ImplicitOps,
     /* 0xD8 - 0xDF */
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    DstImplicit|SrcMem16|ModRM, ImplicitOps|ModRM|Mov,
     /* 0xE0 - 0xE7 */
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
@@ -3589,10 +3589,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3668,10 +3666,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
-                ea.bytes = 4;
-                src = ea;
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
                 break;
@@ -3690,10 +3686,8 @@ x86_emulate(
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
             case 5: /* fldcw m2byte */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
@@ -3727,10 +3721,8 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3788,10 +3780,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
-                ea.bytes = 4;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
                 break;
@@ -3815,10 +3805,8 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistpl", dst.val);
                 break;
             case 5: /* fld m80fp */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
                 break;
@@ -3850,10 +3838,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 8;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 8, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3902,10 +3888,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
                 break;
@@ -3959,11 +3943,6 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 2;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
-                goto done;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fiadd m16i */
@@ -4021,10 +4000,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
                 break;
@@ -4048,18 +4025,14 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistps", dst.val);
                 break;
             case 4: /* fbld m80dec */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
                 break;
             case 5: /* fild m64i */
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
                 break;



[-- Attachment #2: x86emul-FPU-simplify-src.patch --]
[-- Type: text/plain, Size: 8078 bytes --]

x86emul: simplify FPU source operand handling

Consistently use ea instead of src for passing the memory address to
->read(). This eliminates the need to copy ea to src, resulting in a
couple of hundred bytes smaller binary size.

In addition for opcode DE we can leverage SrcMem16 to eliminate a call
of the ->read() hook. At the same time drop the stray Mov attributes
from D8, DA, DC, and DE: They're meaningful for memory writes only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -159,10 +159,10 @@ static const opcode_desc_t opcode_table[
     ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte, ImplicitOps, ImplicitOps,
     /* 0xD8 - 0xDF */
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
-    ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+    DstImplicit|SrcMem16|ModRM, ImplicitOps|ModRM|Mov,
     /* 0xE0 - 0xE7 */
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
     DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
@@ -3589,10 +3589,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3668,10 +3666,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
-                ea.bytes = 4;
-                src = ea;
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
                 break;
@@ -3690,10 +3686,8 @@ x86_emulate(
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
             case 5: /* fldcw m2byte */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
                 break;
@@ -3727,10 +3721,8 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 4;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 4, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3788,10 +3780,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
-                ea.bytes = 4;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
                 break;
@@ -3815,10 +3805,8 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistpl", dst.val);
                 break;
             case 5: /* fld m80fp */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
                 break;
@@ -3850,10 +3838,8 @@ x86_emulate(
             break;
         default:
             ASSERT(ea.type == OP_MEM);
-            ea.bytes = 8;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
+            if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                 8, ctxt)) != X86EMUL_OKAY )
                 goto done;
             switch ( modrm_reg & 7 )
             {
@@ -3902,10 +3888,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
                 break;
@@ -3959,11 +3943,6 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
-            ea.bytes = 2;
-            src = ea;
-            if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                 src.bytes, ctxt)) != 0 )
-                goto done;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fiadd m16i */
@@ -4021,10 +4000,8 @@ x86_emulate(
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
-                ea.bytes = 2;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
                 break;
@@ -4048,18 +4025,14 @@ x86_emulate(
                 emulate_fpu_insn_memdst("fistps", dst.val);
                 break;
             case 4: /* fbld m80dec */
-                ea.bytes = 10;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off,
-                                     &src.val, src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
                 break;
             case 5: /* fild m64i */
-                ea.bytes = 8;
-                src = ea;
-                if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
-                                     src.bytes, ctxt)) != 0 )
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                                     8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
                 break;

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 3/6] x86emul: simplify FPU destination operand handling
  2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
  2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
  2016-12-08 11:36 ` [PATCH v2 2/6] x86emul: simplify FPU source operand handling Jan Beulich
@ 2016-12-08 11:36 ` Jan Beulich
  2016-12-08 11:37 ` [PATCH v2 4/6] x86emul: reduce FPU handling code size Jan Beulich
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:36 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 8558 bytes --]

Consolidate the copying of ea to dst: There's no need to set the type
to OP_MEM, and instead the load cases setting it to OP_NONE allows the
copying to be done just once per major opcode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3663,6 +3663,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3670,18 +3671,15 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 2: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fsts", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
+                dst.bytes = 4;
                 break;
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
@@ -3690,14 +3688,13 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fnstenv - TODO */
                 goto cannot_emulate;
             case 7: /* fnstcw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -3777,6 +3774,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3784,37 +3782,31 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m32i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 2: /* fist m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistl", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fistp m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
+                dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                ea.bytes = 10;
-                dst.type = OP_MEM;
-                dst = ea;
                 emulate_fpu_insn_memdst("fstpt", dst.val);
+                dst.bytes = 10;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -3885,6 +3877,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -3892,34 +3885,27 @@ x86_emulate(
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m64i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpll", dst.val);
+                dst.bytes = 8;
                 break;
             case 2: /* fst m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstl", dst.val);
+                dst.bytes = 8;
                 break;
             case 3: /* fstp m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
+                dst.bytes = 8;
                 break;
             case 4: /* frstor - TODO */
             case 6: /* fnsave - TODO */
                 goto cannot_emulate;
             case 7: /* fnstsw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -3997,6 +3983,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
@@ -4004,49 +3991,42 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m16i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttps", dst.val);
+                dst.bytes = 2;
                 break;
             case 2: /* fist m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fists", dst.val);
+                dst.bytes = 2;
                 break;
             case 3: /* fistp m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistps", dst.val);
+                dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
+                dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                ea.bytes = 10;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fbstp", dst.val);
+                dst.bytes = 10;
                 break;
             case 7: /* fistp m64i */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpll", dst.val);
+                dst.bytes = 8;
                 break;
             }
         }



[-- Attachment #2: x86emul-FPU-simplify-dst.patch --]
[-- Type: text/plain, Size: 8608 bytes --]

x86emul: simplify FPU destination operand handling

Consolidate the copying of ea to dst: There's no need to set the type
to OP_MEM, and instead the load cases setting it to OP_NONE allows the
copying to be done just once per major opcode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3663,6 +3663,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m32fp */
@@ -3670,18 +3671,15 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("flds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 2: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fsts", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fstp m32fp */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstps", dst.val);
+                dst.bytes = 4;
                 break;
             case 4: /* fldenv - TODO */
                 goto cannot_emulate;
@@ -3690,14 +3688,13 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldcw", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fnstenv - TODO */
                 goto cannot_emulate;
             case 7: /* fnstcw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstcw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -3777,6 +3774,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m32i */
@@ -3784,37 +3782,31 @@ x86_emulate(
                                      4, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m32i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 2: /* fist m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistl", dst.val);
+                dst.bytes = 4;
                 break;
             case 3: /* fistp m32i */
-                ea.bytes = 4;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpl", dst.val);
+                dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldt", src.val);
+                dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                ea.bytes = 10;
-                dst.type = OP_MEM;
-                dst = ea;
                 emulate_fpu_insn_memdst("fstpt", dst.val);
+                dst.bytes = 10;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -3885,6 +3877,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fld m64fp */;
@@ -3892,34 +3885,27 @@ x86_emulate(
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fldl", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m64i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttpll", dst.val);
+                dst.bytes = 8;
                 break;
             case 2: /* fst m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstl", dst.val);
+                dst.bytes = 8;
                 break;
             case 3: /* fstp m64fp */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fstpl", dst.val);
+                dst.bytes = 8;
                 break;
             case 4: /* frstor - TODO */
             case 6: /* fnsave - TODO */
                 goto cannot_emulate;
             case 7: /* fnstsw m2byte */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fnstsw", dst.val);
+                dst.bytes = 2;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -3997,6 +3983,7 @@ x86_emulate(
             break;
         default:
             generate_exception_if(ea.type != OP_MEM, EXC_UD);
+            dst = ea;
             switch ( modrm_reg & 7 )
             {
             case 0: /* fild m16i */
@@ -4004,49 +3991,42 @@ x86_emulate(
                                      2, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("filds", src.val);
+                dst.type = OP_NONE;
                 break;
             case 1: /* fisttp m16i */
                 host_and_vcpu_must_have(sse3);
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fisttps", dst.val);
+                dst.bytes = 2;
                 break;
             case 2: /* fist m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fists", dst.val);
+                dst.bytes = 2;
                 break;
             case 3: /* fistp m16i */
-                ea.bytes = 2;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistps", dst.val);
+                dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fbld", src.val);
+                dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
                 if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
                                      8, ctxt)) != X86EMUL_OKAY )
                     goto done;
                 emulate_fpu_insn_memsrc("fildll", src.val);
+                dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                ea.bytes = 10;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fbstp", dst.val);
+                dst.bytes = 10;
                 break;
             case 7: /* fistp m64i */
-                ea.bytes = 8;
-                dst = ea;
-                dst.type = OP_MEM;
                 emulate_fpu_insn_memdst("fistpll", dst.val);
+                dst.bytes = 8;
                 break;
             }
         }

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 4/6] x86emul: reduce FPU handling code size
  2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
                   ` (2 preceding siblings ...)
  2016-12-08 11:36 ` [PATCH v2 3/6] x86emul: simplify FPU destination " Jan Beulich
@ 2016-12-08 11:37 ` Jan Beulich
  2016-12-08 11:38 ` [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
  2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
  5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 8822 bytes --]

Pulling out the {get,put}_fpu() invocations from individual emulation
paths leads to a couple of kb code size reduction in my builds. Note
that this is fine exception-wise:
- #UD and #NM have implementation defined order relative to one
  another,
- data read #GP/#SS/#PF now properly are delivered after #NM/#UD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -849,59 +849,44 @@ do {
 } while (0)
 
 #define emulate_fpu_insn(_op)                           \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" );         \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "=m" (fic.insn_bytes) : : "memory" )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" );                                 \
-    put_fpu(&fic);                                      \
-} while (0)
+        : : "memory" )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" );                      \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "m" (_arg) : "memory" )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \
     uint8_t *buf = get_stub(stub);                                      \
     unsigned int _nr = sizeof((uint8_t[]){ _bytes });                   \
-    struct fpu_insn_ctxt fic = { .insn_bytes = _nr };                   \
+    fic.insn_bytes = _nr;                                               \
     memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1);                \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                                     \
     stub.func();                                                        \
-    put_fpu(&fic);                                                      \
     put_stub(stub);                                                     \
 } while (0)
 
 #define emulate_fpu_insn_stub_eflags(bytes...)                          \
 do {                                                                    \
     unsigned int nr_ = sizeof((uint8_t[]){ bytes });                    \
-    struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ };                  \
     unsigned long tmp_;                                                 \
+    fic.insn_bytes = nr_;                                               \
     memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1);      \
-    get_fpu(X86EMUL_FPU_fpu, &fic_);                                    \
     asm volatile ( _PRE_EFLAGS("[eflags]", "[mask]", "[tmp]")           \
                    "call *%[func];"                                     \
                    _POST_EFLAGS("[eflags]", "[mask]", "[tmp]")          \
@@ -909,7 +894,6 @@ do {
                      [tmp] "=&r" (tmp_)                                 \
                    : [func] "rm" (stub.func),                           \
                      [mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) );            \
-    put_fpu(&fic_);                                                     \
     put_stub(stub);                                                     \
 } while (0)
 
@@ -2492,6 +2476,7 @@ x86_emulate(
     struct operand src = { .reg = PTR_POISON };
     struct operand dst = { .reg = PTR_POISON };
     enum x86_swint_type swint_type;
+    struct fpu_insn_ctxt fic;
     struct x86_emulate_stub stub = {};
     DECLARE_ALIGNED(mmval_t, mmval);
 
@@ -3188,15 +3173,12 @@ x86_emulate(
         break;
 
     case 0x9b:  /* wait/fwait */
-    {
-        struct fpu_insn_ctxt fic = { .insn_bytes = 1 };
-
+        fic.insn_bytes = 1;
         host_and_vcpu_must_have(fpu);
         get_fpu(X86EMUL_FPU_wait, &fic);
         asm volatile ( "fwait" ::: "memory" );
         put_fpu(&fic);
         break;
-    }
 
     case 0x9c: /* pushf */
         generate_exception_if((_regs.eflags & EFLG_VM) &&
@@ -3575,6 +3557,7 @@ x86_emulate(
 
     case 0xd8: /* FPU 0xd8 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %stN,%st */
@@ -3620,10 +3603,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xd9: /* FPU 0xd9 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xfb: /* fsincos */
@@ -3700,10 +3685,12 @@ x86_emulate(
                 generate_exception(EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xda: /* FPU 0xda */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovb %stN */
@@ -3749,10 +3736,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdb: /* FPU 0xdb */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovnb %stN */
@@ -3812,10 +3801,12 @@ x86_emulate(
                 generate_exception(EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdc: /* FPU 0xdc */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %st,%stN */
@@ -3861,10 +3852,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdd: /* FPU 0xdd */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
@@ -3911,10 +3904,12 @@ x86_emulate(
                 generate_exception(EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xde: /* FPU 0xde */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
@@ -3957,10 +3952,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdf: /* FPU 0xdf */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xe0:
@@ -4030,6 +4027,7 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
@@ -4719,8 +4717,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* vmovsd xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;
@@ -4987,8 +4985,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;



[-- Attachment #2: x86emul-FPU-reduce.patch --]
[-- Type: text/plain, Size: 8860 bytes --]

x86emul: reduce FPU handling code size

Pulling out the {get,put}_fpu() invocations from individual emulation
paths leads to a couple of kb code size reduction in my builds. Note
that this is fine exception-wise:
- #UD and #NM have implementation defined order relative to one
  another,
- data read #GP/#SS/#PF now properly are delivered after #NM/#UD.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -849,59 +849,44 @@ do {
 } while (0)
 
 #define emulate_fpu_insn(_op)                           \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" );         \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "=m" (fic.insn_bytes) : : "memory" )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" );                                 \
-    put_fpu(&fic);                                      \
-} while (0)
+        : : "memory" )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
-do{ struct fpu_insn_ctxt fic;                           \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                     \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
         : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" );                      \
-    put_fpu(&fic);                                      \
-} while (0)
+        : "m" (_arg) : "memory" )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \
     uint8_t *buf = get_stub(stub);                                      \
     unsigned int _nr = sizeof((uint8_t[]){ _bytes });                   \
-    struct fpu_insn_ctxt fic = { .insn_bytes = _nr };                   \
+    fic.insn_bytes = _nr;                                               \
     memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1);                \
-    get_fpu(X86EMUL_FPU_fpu, &fic);                                     \
     stub.func();                                                        \
-    put_fpu(&fic);                                                      \
     put_stub(stub);                                                     \
 } while (0)
 
 #define emulate_fpu_insn_stub_eflags(bytes...)                          \
 do {                                                                    \
     unsigned int nr_ = sizeof((uint8_t[]){ bytes });                    \
-    struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ };                  \
     unsigned long tmp_;                                                 \
+    fic.insn_bytes = nr_;                                               \
     memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1);      \
-    get_fpu(X86EMUL_FPU_fpu, &fic_);                                    \
     asm volatile ( _PRE_EFLAGS("[eflags]", "[mask]", "[tmp]")           \
                    "call *%[func];"                                     \
                    _POST_EFLAGS("[eflags]", "[mask]", "[tmp]")          \
@@ -909,7 +894,6 @@ do {
                      [tmp] "=&r" (tmp_)                                 \
                    : [func] "rm" (stub.func),                           \
                      [mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) );            \
-    put_fpu(&fic_);                                                     \
     put_stub(stub);                                                     \
 } while (0)
 
@@ -2492,6 +2476,7 @@ x86_emulate(
     struct operand src = { .reg = PTR_POISON };
     struct operand dst = { .reg = PTR_POISON };
     enum x86_swint_type swint_type;
+    struct fpu_insn_ctxt fic;
     struct x86_emulate_stub stub = {};
     DECLARE_ALIGNED(mmval_t, mmval);
 
@@ -3188,15 +3173,12 @@ x86_emulate(
         break;
 
     case 0x9b:  /* wait/fwait */
-    {
-        struct fpu_insn_ctxt fic = { .insn_bytes = 1 };
-
+        fic.insn_bytes = 1;
         host_and_vcpu_must_have(fpu);
         get_fpu(X86EMUL_FPU_wait, &fic);
         asm volatile ( "fwait" ::: "memory" );
         put_fpu(&fic);
         break;
-    }
 
     case 0x9c: /* pushf */
         generate_exception_if((_regs.eflags & EFLG_VM) &&
@@ -3575,6 +3557,7 @@ x86_emulate(
 
     case 0xd8: /* FPU 0xd8 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %stN,%st */
@@ -3620,10 +3603,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xd9: /* FPU 0xd9 */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xfb: /* fsincos */
@@ -3700,10 +3685,12 @@ x86_emulate(
                 generate_exception(EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xda: /* FPU 0xda */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovb %stN */
@@ -3749,10 +3736,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdb: /* FPU 0xdb */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fcmovnb %stN */
@@ -3812,10 +3801,12 @@ x86_emulate(
                 generate_exception(EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdc: /* FPU 0xdc */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* fadd %st,%stN */
@@ -3861,10 +3852,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdd: /* FPU 0xdd */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* ffree %stN */
@@ -3911,10 +3904,12 @@ x86_emulate(
                 generate_exception(EXC_UD);
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xde: /* FPU 0xde */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xc0 ... 0xc7: /* faddp %stN */
@@ -3957,10 +3952,12 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xdf: /* FPU 0xdf */
         host_and_vcpu_must_have(fpu);
+        get_fpu(X86EMUL_FPU_fpu, &fic);
         switch ( modrm )
         {
         case 0xe0:
@@ -4030,6 +4027,7 @@ x86_emulate(
                 break;
             }
         }
+        put_fpu(&fic);
         break;
 
     case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
@@ -4719,8 +4717,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* vmovsd xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;
@@ -4987,8 +4985,8 @@ x86_emulate(
     case X86EMUL_OPC_VEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */
     {
         uint8_t *buf = get_stub(stub);
-        struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
 
+        fic.insn_bytes = 5;
         buf[0] = 0x3e;
         buf[1] = 0x3e;
         buf[2] = 0x0f;

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands
  2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
                   ` (3 preceding siblings ...)
  2016-12-08 11:37 ` [PATCH v2 4/6] x86emul: reduce FPU handling code size Jan Beulich
@ 2016-12-08 11:38 ` Jan Beulich
  2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
  5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 3091 bytes --]

Accessing an 8-byte (or perhaps just 4-byte in the test harness when
built as 32-bit app) field to read/write 10 bytes (leveraging the
successive field) is a latent bug, as the compiler could copy things
around. Use the 32 bytes large SSE/AVX slot instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
---
The presence of the !op->write checks implies a logical (but not
functional) dependency on the patch making ops->write (and ->cmpxchg)
optional. Without that patch they're just dead code.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3787,15 +3787,19 @@ x86_emulate(
                 dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fldt", src.val);
+                emulate_fpu_insn_memsrc("fldt", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                emulate_fpu_insn_memdst("fstpt", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fstpt", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -4004,10 +4008,10 @@ x86_emulate(
                 dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fbld", src.val);
+                emulate_fpu_insn_memsrc("fbld", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
@@ -4018,8 +4022,12 @@ x86_emulate(
                 dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                emulate_fpu_insn_memdst("fbstp", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fbstp", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             case 7: /* fistp m64i */
                 emulate_fpu_insn_memdst("fistpll", dst.val);




[-- Attachment #2: x86emul-FPU-10byte.patch --]
[-- Type: text/plain, Size: 3161 bytes --]

x86emul: avoid undefined behavior when dealing with 10-byte FPU operands

Accessing an 8-byte (or perhaps just 4-byte in the test harness when
built as 32-bit app) field to read/write 10 bytes (leveraging the
successive field) is a latent bug, as the compiler could copy things
around. Use the 32 bytes large SSE/AVX slot instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
---
The presence of the !op->write checks implies a logical (but not
functional) dependency on the patch making ops->write (and ->cmpxchg)
optional. Without that patch they're just dead code.

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3787,15 +3787,19 @@ x86_emulate(
                 dst.bytes = 4;
                 break;
             case 5: /* fld m80fp */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fldt", src.val);
+                emulate_fpu_insn_memsrc("fldt", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 7: /* fstp m80fp */
-                emulate_fpu_insn_memdst("fstpt", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fstpt", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             default:
                 generate_exception(EXC_UD);
@@ -4004,10 +4008,10 @@ x86_emulate(
                 dst.bytes = 2;
                 break;
             case 4: /* fbld m80dec */
-                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+                if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
                                      10, ctxt)) != X86EMUL_OKAY )
                     goto done;
-                emulate_fpu_insn_memsrc("fbld", src.val);
+                emulate_fpu_insn_memsrc("fbld", *mmvalp);
                 dst.type = OP_NONE;
                 break;
             case 5: /* fild m64i */
@@ -4018,8 +4022,12 @@ x86_emulate(
                 dst.type = OP_NONE;
                 break;
             case 6: /* fbstp packed bcd */
-                emulate_fpu_insn_memdst("fbstp", dst.val);
-                dst.bytes = 10;
+                fail_if(!ops->write);
+                emulate_fpu_insn_memdst("fbstp", *mmvalp);
+                if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+                                      10, ctxt)) != X86EMUL_OKAY )
+                    goto done;
+                dst.type = OP_NONE;
                 break;
             case 7: /* fistp m64i */
                 emulate_fpu_insn_memdst("fistpll", dst.val);

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints
  2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
                   ` (4 preceding siblings ...)
  2016-12-08 11:38 ` [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
@ 2016-12-08 11:38 ` Jan Beulich
  2016-12-15  9:52   ` Ping: " Jan Beulich
  5 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:38 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

[-- Attachment #1: Type: text/plain, Size: 1856 bytes --]

The memory clobber is rather harsh here. However, fic.exn_raised may be
modified as a side effect, so we need to let the compiler know that all
of "fic" may be changed (preventing it from moving around accesses to
the exn_raised field).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -759,7 +759,7 @@ do {
 })
 
 struct fpu_insn_ctxt {
-    uint8_t insn_bytes;
+    uint8_t insn_bytes; /* Must be first! */
     int8_t exn_raised;
 };
 
@@ -853,23 +853,21 @@ do {
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" )
+        : "+m" (fic) )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" )
+        : "+m" (fic), "=m" (_arg) )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" )
+        : "+m" (fic) : "m" (_arg) )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \




[-- Attachment #2: x86emul-FPU-constraints.patch --]
[-- Type: text/plain, Size: 1902 bytes --]

x86emul: simplify FPU handling asm() constraints

The memory clobber is rather harsh here. However, fic.exn_raised may be
modified as a side effect, so we need to let the compiler know that all
of "fic" may be changed (preventing it from moving around accesses to
the exn_raised field).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -759,7 +759,7 @@ do {
 })
 
 struct fpu_insn_ctxt {
-    uint8_t insn_bytes;
+    uint8_t insn_bytes; /* Must be first! */
     int8_t exn_raised;
 };
 
@@ -853,23 +853,21 @@ do {
         "movb $2f-1f,%0 \n"                             \
         "1: " _op "     \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes) : : "memory" )
+        : "+m" (fic) )
 
 #define emulate_fpu_insn_memdst(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes), "=m" (_arg)            \
-        : : "memory" )
+        : "+m" (fic), "=m" (_arg) )
 
 #define emulate_fpu_insn_memsrc(_op, _arg)              \
     asm volatile (                                      \
         "movb $2f-1f,%0 \n"                             \
         "1: " _op " %1  \n"                             \
         "2:             \n"                             \
-        : "=m" (fic.insn_bytes)                         \
-        : "m" (_arg) : "memory" )
+        : "+m" (fic) : "m" (_arg) )
 
 #define emulate_fpu_insn_stub(_bytes...)                                \
 do {                                                                    \

[-- Attachment #3: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes
  2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
@ 2016-12-08 11:42   ` Andrew Cooper
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2016-12-08 11:42 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 08/12/16 11:35, Jan Beulich wrote:
> First of all there are a number of secondary encodings both Intel and
> AMD support, but which aren't formally documented. See e.g.
> www.sandpile.org/x86/opc_fpu.htm for inofficial documentation.
>
> Next there are a few more no-ops - instructions which served a purpose
> only on 8087 or 287.
>
> Further switch from fail_if() to raising of #UD in a couple of places
> (as the decoding of FPU opcodes should now be complete except where
> explicitly marked as todo).
>
> Also adjust a few comments.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Ping: [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints
  2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
@ 2016-12-15  9:52   ` Jan Beulich
  2016-12-15 11:36     ` Andrew Cooper
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-12-15  9:52 UTC (permalink / raw)
  To: xen-devel; +Cc: Andrew Cooper

>>> On 08.12.16 at 12:38, <JBeulich@suse.com> wrote:
> The memory clobber is rather harsh here. However, fic.exn_raised may be
> modified as a side effect, so we need to let the compiler know that all
> of "fic" may be changed (preventing it from moving around accesses to
> the exn_raised field).
> 
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Ping? (The v1 discussion had stalled, so at that point it seemed best
to simply re-send with the updates to the earlier patches.)

> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -759,7 +759,7 @@ do {
>  })
>  
>  struct fpu_insn_ctxt {
> -    uint8_t insn_bytes;
> +    uint8_t insn_bytes; /* Must be first! */
>      int8_t exn_raised;
>  };
>  
> @@ -853,23 +853,21 @@ do {
>          "movb $2f-1f,%0 \n"                             \
>          "1: " _op "     \n"                             \
>          "2:             \n"                             \
> -        : "=m" (fic.insn_bytes) : : "memory" )
> +        : "+m" (fic) )
>  
>  #define emulate_fpu_insn_memdst(_op, _arg)              \
>      asm volatile (                                      \
>          "movb $2f-1f,%0 \n"                             \
>          "1: " _op " %1  \n"                             \
>          "2:             \n"                             \
> -        : "=m" (fic.insn_bytes), "=m" (_arg)            \
> -        : : "memory" )
> +        : "+m" (fic), "=m" (_arg) )
>  
>  #define emulate_fpu_insn_memsrc(_op, _arg)              \
>      asm volatile (                                      \
>          "movb $2f-1f,%0 \n"                             \
>          "1: " _op " %1  \n"                             \
>          "2:             \n"                             \
> -        : "=m" (fic.insn_bytes)                         \
> -        : "m" (_arg) : "memory" )
> +        : "+m" (fic) : "m" (_arg) )
>  
>  #define emulate_fpu_insn_stub(_bytes...)                                \
>  do {                                                                    \




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Ping: [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints
  2016-12-15  9:52   ` Ping: " Jan Beulich
@ 2016-12-15 11:36     ` Andrew Cooper
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2016-12-15 11:36 UTC (permalink / raw)
  To: Jan Beulich, xen-devel

On 15/12/16 09:52, Jan Beulich wrote:
>>>> On 08.12.16 at 12:38, <JBeulich@suse.com> wrote:
>> The memory clobber is rather harsh here. However, fic.exn_raised may be
>> modified as a side effect, so we need to let the compiler know that all
>> of "fic" may be changed (preventing it from moving around accesses to
>> the exn_raised field).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Ping? (The v1 discussion had stalled, so at that point it seemed best
> to simply re-send with the updates to the earlier patches.)

Sorry, yes.

Having considered this further (when considering my other exception
plans), I am now more convinced the code is already latently buggy, and
fixing that latent bug avoids the need for this.

exn_raised can be set at any point between a {get,put}_fpu(), not just
within these stubs.  It therefore should already be accessed with a
volatile reference to avoid breaking C's model of the world, as the
memory clobbers here are already not sufficient.

Fixing the volatility of access to exn_raised will allow the safe
dropping of the memory clobbers, without further modifying the memory
operands.

I already have a patch to do similar improvements to the time init code
from my memory barrier series. I will see about getting that posted.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2016-12-15 11:36 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
2016-12-08 11:42   ` Andrew Cooper
2016-12-08 11:36 ` [PATCH v2 2/6] x86emul: simplify FPU source operand handling Jan Beulich
2016-12-08 11:36 ` [PATCH v2 3/6] x86emul: simplify FPU destination " Jan Beulich
2016-12-08 11:37 ` [PATCH v2 4/6] x86emul: reduce FPU handling code size Jan Beulich
2016-12-08 11:38 ` [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
2016-12-15  9:52   ` Ping: " Jan Beulich
2016-12-15 11:36     ` Andrew Cooper

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.