* [PATCH v2 0/6] x86emul: FPU handling improvements
@ 2016-12-08 11:26 Jan Beulich
2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:26 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
1: extend / amend supported FPU opcodes
2: simplify FPU source operand handling
3: simplify FPU destination operand handling
4: reduce FPU handling code size
5: avoid undefined behavior when dealing with 10-byte FPU operands
6: simplify FPU handling asm() constraints
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Only patch 1 really changed (see there), some others just
needed re-basing.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
@ 2016-12-08 11:35 ` Jan Beulich
2016-12-08 11:42 ` Andrew Cooper
2016-12-08 11:36 ` [PATCH v2 2/6] x86emul: simplify FPU source operand handling Jan Beulich
` (4 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:35 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
[-- Attachment #1: Type: text/plain, Size: 9106 bytes --]
First of all there are a number of secondary encodings both Intel and
AMD support, but which aren't formally documented. See e.g.
www.sandpile.org/x86/opc_fpu.htm for inofficial documentation.
Next there are a few more no-ops - instructions which served a purpose
only on 8087 or 287.
Further switch from fail_if() to raising of #UD in a couple of places
(as the decoding of FPU opcodes should now be complete except where
explicitly marked as todo).
Also adjust a few comments.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Correct placement of ffreep case. Use generate_exception(...) in
favor of generate_exception_if(true, ...). Add sandpile.org
reference.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3577,18 +3577,18 @@ x86_emulate(
host_and_vcpu_must_have(fpu);
switch ( modrm )
{
- case 0xc0 ... 0xc7: /* fadd %stN,%stN */
- case 0xc8 ... 0xcf: /* fmul %stN,%stN */
- case 0xd0 ... 0xd7: /* fcom %stN,%stN */
- case 0xd8 ... 0xdf: /* fcomp %stN,%stN */
- case 0xe0 ... 0xe7: /* fsub %stN,%stN */
- case 0xe8 ... 0xef: /* fsubr %stN,%stN */
- case 0xf0 ... 0xf7: /* fdiv %stN,%stN */
- case 0xf8 ... 0xff: /* fdivr %stN,%stN */
+ case 0xc0 ... 0xc7: /* fadd %stN,%st */
+ case 0xc8 ... 0xcf: /* fmul %stN,%st */
+ case 0xd0 ... 0xd7: /* fcom %stN,%st */
+ case 0xd8 ... 0xdf: /* fcomp %stN,%st */
+ case 0xe0 ... 0xe7: /* fsub %stN,%st */
+ case 0xe8 ... 0xef: /* fsubr %stN,%st */
+ case 0xf0 ... 0xf7: /* fdiv %stN,%st */
+ case 0xf8 ... 0xff: /* fdivr %stN,%st */
emulate_fpu_insn_stub(0xd8, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ ASSERT(ea.type == OP_MEM);
ea.bytes = 4;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3634,6 +3634,7 @@ x86_emulate(
case 0xc0 ... 0xc7: /* fld %stN */
case 0xc8 ... 0xcf: /* fxch %stN */
case 0xd0: /* fnop */
+ case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
case 0xe0: /* fchs */
case 0xe1: /* fabs */
case 0xe4: /* ftst */
@@ -3663,7 +3664,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xd9, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fld m32fp */
@@ -3686,7 +3687,8 @@ x86_emulate(
dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstps", dst.val);
break;
- /* case 4: fldenv - TODO */
+ case 4: /* fldenv - TODO */
+ goto cannot_emulate;
case 5: /* fldcw m2byte */
ea.bytes = 2;
src = ea;
@@ -3695,7 +3697,8 @@ x86_emulate(
goto done;
emulate_fpu_insn_memsrc("fldcw", src.val);
break;
- /* case 6: fstenv - TODO */
+ case 6: /* fnstenv - TODO */
+ goto cannot_emulate;
case 7: /* fnstcw m2byte */
ea.bytes = 2;
dst = ea;
@@ -3703,7 +3706,7 @@ x86_emulate(
emulate_fpu_insn_memdst("fnstcw", dst.val);
break;
default:
- goto cannot_emulate;
+ generate_exception(EXC_UD);
}
}
break;
@@ -3723,7 +3726,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xda, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
ea.bytes = 4;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3772,16 +3775,16 @@ x86_emulate(
vcpu_must_have_cmov();
emulate_fpu_insn_stub_eflags(0xdb, modrm);
break;
+ case 0xe0: /* fneni - 8087 only, ignored by 287 */
+ case 0xe1: /* fndisi - 8087 only, ignored by 287 */
case 0xe2: /* fnclex */
- emulate_fpu_insn("fnclex");
- break;
case 0xe3: /* fninit */
- emulate_fpu_insn("fninit");
- break;
- case 0xe4: /* fsetpm - 287 only, ignored by 387 */
+ case 0xe4: /* fnsetpm - 287 only, ignored by 387 */
+ /* case 0xe5: frstpm - 287 only, #UD on 387 */
+ emulate_fpu_insn_stub(0xdb, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fild m32i */
@@ -3826,7 +3829,7 @@ x86_emulate(
emulate_fpu_insn_memdst("fstpt", dst.val);
break;
default:
- goto cannot_emulate;
+ generate_exception(EXC_UD);
}
}
break;
@@ -3835,16 +3838,18 @@ x86_emulate(
host_and_vcpu_must_have(fpu);
switch ( modrm )
{
- case 0xc0 ... 0xc7: /* fadd %stN */
- case 0xc8 ... 0xcf: /* fmul %stN */
- case 0xe0 ... 0xe7: /* fsubr %stN */
- case 0xe8 ... 0xef: /* fsub %stN */
- case 0xf0 ... 0xf7: /* fdivr %stN */
- case 0xf8 ... 0xff: /* fdiv %stN */
+ case 0xc0 ... 0xc7: /* fadd %st,%stN */
+ case 0xc8 ... 0xcf: /* fmul %st,%stN */
+ case 0xd0 ... 0xd7: /* fcom %stN,%st (alternative encoding) */
+ case 0xd8 ... 0xdf: /* fcomp %stN,%st (alternative encoding) */
+ case 0xe0 ... 0xe7: /* fsubr %st,%stN */
+ case 0xe8 ... 0xef: /* fsub %st,%stN */
+ case 0xf0 ... 0xf7: /* fdivr %st,%stN */
+ case 0xf8 ... 0xff: /* fdiv %st,%stN */
emulate_fpu_insn_stub(0xdc, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ ASSERT(ea.type == OP_MEM);
ea.bytes = 8;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3885,6 +3890,7 @@ x86_emulate(
switch ( modrm )
{
case 0xc0 ... 0xc7: /* ffree %stN */
+ case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
case 0xd0 ... 0xd7: /* fst %stN */
case 0xd8 ... 0xdf: /* fstp %stN */
case 0xe0 ... 0xe7: /* fucom %stN */
@@ -3892,7 +3898,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xdd, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fld m64fp */;
@@ -3922,6 +3928,9 @@ x86_emulate(
dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstpl", dst.val);
break;
+ case 4: /* frstor - TODO */
+ case 6: /* fnsave - TODO */
+ goto cannot_emulate;
case 7: /* fnstsw m2byte */
ea.bytes = 2;
dst = ea;
@@ -3929,7 +3938,7 @@ x86_emulate(
emulate_fpu_insn_memdst("fnstsw", dst.val);
break;
default:
- goto cannot_emulate;
+ generate_exception(EXC_UD);
}
}
break;
@@ -3940,6 +3949,7 @@ x86_emulate(
{
case 0xc0 ... 0xc7: /* faddp %stN */
case 0xc8 ... 0xcf: /* fmulp %stN */
+ case 0xd0 ... 0xd7: /* fcomp %stN (alternative encoding) */
case 0xd9: /* fcompp */
case 0xe0 ... 0xe7: /* fsubrp %stN */
case 0xe8 ... 0xef: /* fsubp %stN */
@@ -3948,7 +3958,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xde, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
ea.bytes = 2;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -4000,8 +4010,14 @@ x86_emulate(
vcpu_must_have_cmov();
emulate_fpu_insn_stub_eflags(0xdf, modrm);
break;
+ case 0xc0 ... 0xc7: /* ffreep %stN */
+ case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
+ case 0xd0 ... 0xd7: /* fstp %stN (alternative encoding) */
+ case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
+ emulate_fpu_insn_stub(0xdf, modrm);
+ break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fild m16i */
[-- Attachment #2: x86emul-FPU-opcodes.patch --]
[-- Type: text/plain, Size: 9151 bytes --]
x86emul: extend / amend supported FPU opcodes
First of all there are a number of secondary encodings both Intel and
AMD support, but which aren't formally documented. See e.g.
www.sandpile.org/x86/opc_fpu.htm for inofficial documentation.
Next there are a few more no-ops - instructions which served a purpose
only on 8087 or 287.
Further switch from fail_if() to raising of #UD in a couple of places
(as the decoding of FPU opcodes should now be complete except where
explicitly marked as todo).
Also adjust a few comments.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Correct placement of ffreep case. Use generate_exception(...) in
favor of generate_exception_if(true, ...). Add sandpile.org
reference.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3577,18 +3577,18 @@ x86_emulate(
host_and_vcpu_must_have(fpu);
switch ( modrm )
{
- case 0xc0 ... 0xc7: /* fadd %stN,%stN */
- case 0xc8 ... 0xcf: /* fmul %stN,%stN */
- case 0xd0 ... 0xd7: /* fcom %stN,%stN */
- case 0xd8 ... 0xdf: /* fcomp %stN,%stN */
- case 0xe0 ... 0xe7: /* fsub %stN,%stN */
- case 0xe8 ... 0xef: /* fsubr %stN,%stN */
- case 0xf0 ... 0xf7: /* fdiv %stN,%stN */
- case 0xf8 ... 0xff: /* fdivr %stN,%stN */
+ case 0xc0 ... 0xc7: /* fadd %stN,%st */
+ case 0xc8 ... 0xcf: /* fmul %stN,%st */
+ case 0xd0 ... 0xd7: /* fcom %stN,%st */
+ case 0xd8 ... 0xdf: /* fcomp %stN,%st */
+ case 0xe0 ... 0xe7: /* fsub %stN,%st */
+ case 0xe8 ... 0xef: /* fsubr %stN,%st */
+ case 0xf0 ... 0xf7: /* fdiv %stN,%st */
+ case 0xf8 ... 0xff: /* fdivr %stN,%st */
emulate_fpu_insn_stub(0xd8, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ ASSERT(ea.type == OP_MEM);
ea.bytes = 4;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3634,6 +3634,7 @@ x86_emulate(
case 0xc0 ... 0xc7: /* fld %stN */
case 0xc8 ... 0xcf: /* fxch %stN */
case 0xd0: /* fnop */
+ case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
case 0xe0: /* fchs */
case 0xe1: /* fabs */
case 0xe4: /* ftst */
@@ -3663,7 +3664,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xd9, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fld m32fp */
@@ -3686,7 +3687,8 @@ x86_emulate(
dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstps", dst.val);
break;
- /* case 4: fldenv - TODO */
+ case 4: /* fldenv - TODO */
+ goto cannot_emulate;
case 5: /* fldcw m2byte */
ea.bytes = 2;
src = ea;
@@ -3695,7 +3697,8 @@ x86_emulate(
goto done;
emulate_fpu_insn_memsrc("fldcw", src.val);
break;
- /* case 6: fstenv - TODO */
+ case 6: /* fnstenv - TODO */
+ goto cannot_emulate;
case 7: /* fnstcw m2byte */
ea.bytes = 2;
dst = ea;
@@ -3703,7 +3706,7 @@ x86_emulate(
emulate_fpu_insn_memdst("fnstcw", dst.val);
break;
default:
- goto cannot_emulate;
+ generate_exception(EXC_UD);
}
}
break;
@@ -3723,7 +3726,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xda, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
ea.bytes = 4;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3772,16 +3775,16 @@ x86_emulate(
vcpu_must_have_cmov();
emulate_fpu_insn_stub_eflags(0xdb, modrm);
break;
+ case 0xe0: /* fneni - 8087 only, ignored by 287 */
+ case 0xe1: /* fndisi - 8087 only, ignored by 287 */
case 0xe2: /* fnclex */
- emulate_fpu_insn("fnclex");
- break;
case 0xe3: /* fninit */
- emulate_fpu_insn("fninit");
- break;
- case 0xe4: /* fsetpm - 287 only, ignored by 387 */
+ case 0xe4: /* fnsetpm - 287 only, ignored by 387 */
+ /* case 0xe5: frstpm - 287 only, #UD on 387 */
+ emulate_fpu_insn_stub(0xdb, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fild m32i */
@@ -3826,7 +3829,7 @@ x86_emulate(
emulate_fpu_insn_memdst("fstpt", dst.val);
break;
default:
- goto cannot_emulate;
+ generate_exception(EXC_UD);
}
}
break;
@@ -3835,16 +3838,18 @@ x86_emulate(
host_and_vcpu_must_have(fpu);
switch ( modrm )
{
- case 0xc0 ... 0xc7: /* fadd %stN */
- case 0xc8 ... 0xcf: /* fmul %stN */
- case 0xe0 ... 0xe7: /* fsubr %stN */
- case 0xe8 ... 0xef: /* fsub %stN */
- case 0xf0 ... 0xf7: /* fdivr %stN */
- case 0xf8 ... 0xff: /* fdiv %stN */
+ case 0xc0 ... 0xc7: /* fadd %st,%stN */
+ case 0xc8 ... 0xcf: /* fmul %st,%stN */
+ case 0xd0 ... 0xd7: /* fcom %stN,%st (alternative encoding) */
+ case 0xd8 ... 0xdf: /* fcomp %stN,%st (alternative encoding) */
+ case 0xe0 ... 0xe7: /* fsubr %st,%stN */
+ case 0xe8 ... 0xef: /* fsub %st,%stN */
+ case 0xf0 ... 0xf7: /* fdivr %st,%stN */
+ case 0xf8 ... 0xff: /* fdiv %st,%stN */
emulate_fpu_insn_stub(0xdc, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ ASSERT(ea.type == OP_MEM);
ea.bytes = 8;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -3885,6 +3890,7 @@ x86_emulate(
switch ( modrm )
{
case 0xc0 ... 0xc7: /* ffree %stN */
+ case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
case 0xd0 ... 0xd7: /* fst %stN */
case 0xd8 ... 0xdf: /* fstp %stN */
case 0xe0 ... 0xe7: /* fucom %stN */
@@ -3892,7 +3898,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xdd, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fld m64fp */;
@@ -3922,6 +3928,9 @@ x86_emulate(
dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstpl", dst.val);
break;
+ case 4: /* frstor - TODO */
+ case 6: /* fnsave - TODO */
+ goto cannot_emulate;
case 7: /* fnstsw m2byte */
ea.bytes = 2;
dst = ea;
@@ -3929,7 +3938,7 @@ x86_emulate(
emulate_fpu_insn_memdst("fnstsw", dst.val);
break;
default:
- goto cannot_emulate;
+ generate_exception(EXC_UD);
}
}
break;
@@ -3940,6 +3949,7 @@ x86_emulate(
{
case 0xc0 ... 0xc7: /* faddp %stN */
case 0xc8 ... 0xcf: /* fmulp %stN */
+ case 0xd0 ... 0xd7: /* fcomp %stN (alternative encoding) */
case 0xd9: /* fcompp */
case 0xe0 ... 0xe7: /* fsubrp %stN */
case 0xe8 ... 0xef: /* fsubp %stN */
@@ -3948,7 +3958,7 @@ x86_emulate(
emulate_fpu_insn_stub(0xde, modrm);
break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
ea.bytes = 2;
src = ea;
if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
@@ -4000,8 +4010,14 @@ x86_emulate(
vcpu_must_have_cmov();
emulate_fpu_insn_stub_eflags(0xdf, modrm);
break;
+ case 0xc0 ... 0xc7: /* ffreep %stN */
+ case 0xc8 ... 0xcf: /* fxch %stN (alternative encoding) */
+ case 0xd0 ... 0xd7: /* fstp %stN (alternative encoding) */
+ case 0xd8 ... 0xdf: /* fstp %stN (alternative encoding) */
+ emulate_fpu_insn_stub(0xdf, modrm);
+ break;
default:
- fail_if(modrm >= 0xc0);
+ generate_exception_if(ea.type != OP_MEM, EXC_UD);
switch ( modrm_reg & 7 )
{
case 0: /* fild m16i */
[-- Attachment #3: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 2/6] x86emul: simplify FPU source operand handling
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
@ 2016-12-08 11:36 ` Jan Beulich
2016-12-08 11:36 ` [PATCH v2 3/6] x86emul: simplify FPU destination " Jan Beulich
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:36 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
[-- Attachment #1: Type: text/plain, Size: 8033 bytes --]
Consistently use ea instead of src for passing the memory address to
->read(). This eliminates the need to copy ea to src, resulting in a
couple of hundred bytes smaller binary size.
In addition for opcode DE we can leverage SrcMem16 to eliminate a call
of the ->read() hook. At the same time drop the stray Mov attributes
from D8, DA, DC, and DE: They're meaningful for memory writes only.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -159,10 +159,10 @@ static const opcode_desc_t opcode_table[
ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
DstImplicit|SrcImmByte, DstImplicit|SrcImmByte, ImplicitOps, ImplicitOps,
/* 0xD8 - 0xDF */
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
+ ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+ ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+ ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+ DstImplicit|SrcMem16|ModRM, ImplicitOps|ModRM|Mov,
/* 0xE0 - 0xE7 */
DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
@@ -3589,10 +3589,8 @@ x86_emulate(
break;
default:
ASSERT(ea.type == OP_MEM);
- ea.bytes = 4;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
switch ( modrm_reg & 7 )
{
@@ -3668,10 +3666,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fld m32fp */
- ea.bytes = 4;
- src = ea;
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("flds", src.val);
break;
@@ -3690,10 +3686,8 @@ x86_emulate(
case 4: /* fldenv - TODO */
goto cannot_emulate;
case 5: /* fldcw m2byte */
- ea.bytes = 2;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldcw", src.val);
break;
@@ -3727,10 +3721,8 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
- ea.bytes = 4;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
switch ( modrm_reg & 7 )
{
@@ -3788,10 +3780,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fild m32i */
- ea.bytes = 4;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildl", src.val);
break;
@@ -3815,10 +3805,8 @@ x86_emulate(
emulate_fpu_insn_memdst("fistpl", dst.val);
break;
case 5: /* fld m80fp */
- ea.bytes = 10;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off,
- &src.val, src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldt", src.val);
break;
@@ -3850,10 +3838,8 @@ x86_emulate(
break;
default:
ASSERT(ea.type == OP_MEM);
- ea.bytes = 8;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 8, ctxt)) != X86EMUL_OKAY )
goto done;
switch ( modrm_reg & 7 )
{
@@ -3902,10 +3888,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fld m64fp */;
- ea.bytes = 8;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldl", src.val);
break;
@@ -3959,11 +3943,6 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
- ea.bytes = 2;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
- goto done;
switch ( modrm_reg & 7 )
{
case 0: /* fiadd m16i */
@@ -4021,10 +4000,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fild m16i */
- ea.bytes = 2;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("filds", src.val);
break;
@@ -4048,18 +4025,14 @@ x86_emulate(
emulate_fpu_insn_memdst("fistps", dst.val);
break;
case 4: /* fbld m80dec */
- ea.bytes = 10;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off,
- &src.val, src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fbld", src.val);
break;
case 5: /* fild m64i */
- ea.bytes = 8;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildll", src.val);
break;
[-- Attachment #2: x86emul-FPU-simplify-src.patch --]
[-- Type: text/plain, Size: 8078 bytes --]
x86emul: simplify FPU source operand handling
Consistently use ea instead of src for passing the memory address to
->read(). This eliminates the need to copy ea to src, resulting in a
couple of hundred bytes smaller binary size.
In addition for opcode DE we can leverage SrcMem16 to eliminate a call
of the ->read() hook. At the same time drop the stray Mov attributes
from D8, DA, DC, and DE: They're meaningful for memory writes only.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -159,10 +159,10 @@ static const opcode_desc_t opcode_table[
ByteOp|DstMem|SrcImplicit|ModRM, DstMem|SrcImplicit|ModRM,
DstImplicit|SrcImmByte, DstImplicit|SrcImmByte, ImplicitOps, ImplicitOps,
/* 0xD8 - 0xDF */
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
- ImplicitOps|ModRM|Mov, ImplicitOps|ModRM|Mov,
+ ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+ ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+ ImplicitOps|ModRM, ImplicitOps|ModRM|Mov,
+ DstImplicit|SrcMem16|ModRM, ImplicitOps|ModRM|Mov,
/* 0xE0 - 0xE7 */
DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
DstImplicit|SrcImmByte, DstImplicit|SrcImmByte,
@@ -3589,10 +3589,8 @@ x86_emulate(
break;
default:
ASSERT(ea.type == OP_MEM);
- ea.bytes = 4;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
switch ( modrm_reg & 7 )
{
@@ -3668,10 +3666,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fld m32fp */
- ea.bytes = 4;
- src = ea;
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("flds", src.val);
break;
@@ -3690,10 +3686,8 @@ x86_emulate(
case 4: /* fldenv - TODO */
goto cannot_emulate;
case 5: /* fldcw m2byte */
- ea.bytes = 2;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldcw", src.val);
break;
@@ -3727,10 +3721,8 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
- ea.bytes = 4;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
switch ( modrm_reg & 7 )
{
@@ -3788,10 +3780,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fild m32i */
- ea.bytes = 4;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildl", src.val);
break;
@@ -3815,10 +3805,8 @@ x86_emulate(
emulate_fpu_insn_memdst("fistpl", dst.val);
break;
case 5: /* fld m80fp */
- ea.bytes = 10;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off,
- &src.val, src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldt", src.val);
break;
@@ -3850,10 +3838,8 @@ x86_emulate(
break;
default:
ASSERT(ea.type == OP_MEM);
- ea.bytes = 8;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 8, ctxt)) != X86EMUL_OKAY )
goto done;
switch ( modrm_reg & 7 )
{
@@ -3902,10 +3888,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fld m64fp */;
- ea.bytes = 8;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldl", src.val);
break;
@@ -3959,11 +3943,6 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
- ea.bytes = 2;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
- goto done;
switch ( modrm_reg & 7 )
{
case 0: /* fiadd m16i */
@@ -4021,10 +4000,8 @@ x86_emulate(
switch ( modrm_reg & 7 )
{
case 0: /* fild m16i */
- ea.bytes = 2;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("filds", src.val);
break;
@@ -4048,18 +4025,14 @@ x86_emulate(
emulate_fpu_insn_memdst("fistps", dst.val);
break;
case 4: /* fbld m80dec */
- ea.bytes = 10;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off,
- &src.val, src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fbld", src.val);
break;
case 5: /* fild m64i */
- ea.bytes = 8;
- src = ea;
- if ( (rc = ops->read(src.mem.seg, src.mem.off, &src.val,
- src.bytes, ctxt)) != 0 )
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ 8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildll", src.val);
break;
[-- Attachment #3: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 3/6] x86emul: simplify FPU destination operand handling
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
2016-12-08 11:36 ` [PATCH v2 2/6] x86emul: simplify FPU source operand handling Jan Beulich
@ 2016-12-08 11:36 ` Jan Beulich
2016-12-08 11:37 ` [PATCH v2 4/6] x86emul: reduce FPU handling code size Jan Beulich
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:36 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
[-- Attachment #1: Type: text/plain, Size: 8558 bytes --]
Consolidate the copying of ea to dst: There's no need to set the type
to OP_MEM, and instead the load cases setting it to OP_NONE allows the
copying to be done just once per major opcode.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3663,6 +3663,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fld m32fp */
@@ -3670,18 +3671,15 @@ x86_emulate(
4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("flds", src.val);
+ dst.type = OP_NONE;
break;
case 2: /* fstp m32fp */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fsts", dst.val);
+ dst.bytes = 4;
break;
case 3: /* fstp m32fp */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstps", dst.val);
+ dst.bytes = 4;
break;
case 4: /* fldenv - TODO */
goto cannot_emulate;
@@ -3690,14 +3688,13 @@ x86_emulate(
2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldcw", src.val);
+ dst.type = OP_NONE;
break;
case 6: /* fnstenv - TODO */
goto cannot_emulate;
case 7: /* fnstcw m2byte */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fnstcw", dst.val);
+ dst.bytes = 2;
break;
default:
generate_exception(EXC_UD);
@@ -3777,6 +3774,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fild m32i */
@@ -3784,37 +3782,31 @@ x86_emulate(
4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildl", src.val);
+ dst.type = OP_NONE;
break;
case 1: /* fisttp m32i */
host_and_vcpu_must_have(sse3);
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fisttpl", dst.val);
+ dst.bytes = 4;
break;
case 2: /* fist m32i */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistl", dst.val);
+ dst.bytes = 4;
break;
case 3: /* fistp m32i */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistpl", dst.val);
+ dst.bytes = 4;
break;
case 5: /* fld m80fp */
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldt", src.val);
+ dst.type = OP_NONE;
break;
case 7: /* fstp m80fp */
- ea.bytes = 10;
- dst.type = OP_MEM;
- dst = ea;
emulate_fpu_insn_memdst("fstpt", dst.val);
+ dst.bytes = 10;
break;
default:
generate_exception(EXC_UD);
@@ -3885,6 +3877,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fld m64fp */;
@@ -3892,34 +3885,27 @@ x86_emulate(
8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldl", src.val);
+ dst.type = OP_NONE;
break;
case 1: /* fisttp m64i */
host_and_vcpu_must_have(sse3);
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fisttpll", dst.val);
+ dst.bytes = 8;
break;
case 2: /* fst m64fp */
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstl", dst.val);
+ dst.bytes = 8;
break;
case 3: /* fstp m64fp */
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstpl", dst.val);
+ dst.bytes = 8;
break;
case 4: /* frstor - TODO */
case 6: /* fnsave - TODO */
goto cannot_emulate;
case 7: /* fnstsw m2byte */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fnstsw", dst.val);
+ dst.bytes = 2;
break;
default:
generate_exception(EXC_UD);
@@ -3997,6 +3983,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fild m16i */
@@ -4004,49 +3991,42 @@ x86_emulate(
2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("filds", src.val);
+ dst.type = OP_NONE;
break;
case 1: /* fisttp m16i */
host_and_vcpu_must_have(sse3);
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fisttps", dst.val);
+ dst.bytes = 2;
break;
case 2: /* fist m16i */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fists", dst.val);
+ dst.bytes = 2;
break;
case 3: /* fistp m16i */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistps", dst.val);
+ dst.bytes = 2;
break;
case 4: /* fbld m80dec */
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fbld", src.val);
+ dst.type = OP_NONE;
break;
case 5: /* fild m64i */
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildll", src.val);
+ dst.type = OP_NONE;
break;
case 6: /* fbstp packed bcd */
- ea.bytes = 10;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fbstp", dst.val);
+ dst.bytes = 10;
break;
case 7: /* fistp m64i */
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistpll", dst.val);
+ dst.bytes = 8;
break;
}
}
[-- Attachment #2: x86emul-FPU-simplify-dst.patch --]
[-- Type: text/plain, Size: 8608 bytes --]
x86emul: simplify FPU destination operand handling
Consolidate the copying of ea to dst: There's no need to set the type
to OP_MEM, and instead the load cases setting it to OP_NONE allows the
copying to be done just once per major opcode.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3663,6 +3663,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fld m32fp */
@@ -3670,18 +3671,15 @@ x86_emulate(
4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("flds", src.val);
+ dst.type = OP_NONE;
break;
case 2: /* fstp m32fp */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fsts", dst.val);
+ dst.bytes = 4;
break;
case 3: /* fstp m32fp */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstps", dst.val);
+ dst.bytes = 4;
break;
case 4: /* fldenv - TODO */
goto cannot_emulate;
@@ -3690,14 +3688,13 @@ x86_emulate(
2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldcw", src.val);
+ dst.type = OP_NONE;
break;
case 6: /* fnstenv - TODO */
goto cannot_emulate;
case 7: /* fnstcw m2byte */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fnstcw", dst.val);
+ dst.bytes = 2;
break;
default:
generate_exception(EXC_UD);
@@ -3777,6 +3774,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fild m32i */
@@ -3784,37 +3782,31 @@ x86_emulate(
4, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildl", src.val);
+ dst.type = OP_NONE;
break;
case 1: /* fisttp m32i */
host_and_vcpu_must_have(sse3);
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fisttpl", dst.val);
+ dst.bytes = 4;
break;
case 2: /* fist m32i */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistl", dst.val);
+ dst.bytes = 4;
break;
case 3: /* fistp m32i */
- ea.bytes = 4;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistpl", dst.val);
+ dst.bytes = 4;
break;
case 5: /* fld m80fp */
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldt", src.val);
+ dst.type = OP_NONE;
break;
case 7: /* fstp m80fp */
- ea.bytes = 10;
- dst.type = OP_MEM;
- dst = ea;
emulate_fpu_insn_memdst("fstpt", dst.val);
+ dst.bytes = 10;
break;
default:
generate_exception(EXC_UD);
@@ -3885,6 +3877,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fld m64fp */;
@@ -3892,34 +3885,27 @@ x86_emulate(
8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fldl", src.val);
+ dst.type = OP_NONE;
break;
case 1: /* fisttp m64i */
host_and_vcpu_must_have(sse3);
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fisttpll", dst.val);
+ dst.bytes = 8;
break;
case 2: /* fst m64fp */
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstl", dst.val);
+ dst.bytes = 8;
break;
case 3: /* fstp m64fp */
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fstpl", dst.val);
+ dst.bytes = 8;
break;
case 4: /* frstor - TODO */
case 6: /* fnsave - TODO */
goto cannot_emulate;
case 7: /* fnstsw m2byte */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fnstsw", dst.val);
+ dst.bytes = 2;
break;
default:
generate_exception(EXC_UD);
@@ -3997,6 +3983,7 @@ x86_emulate(
break;
default:
generate_exception_if(ea.type != OP_MEM, EXC_UD);
+ dst = ea;
switch ( modrm_reg & 7 )
{
case 0: /* fild m16i */
@@ -4004,49 +3991,42 @@ x86_emulate(
2, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("filds", src.val);
+ dst.type = OP_NONE;
break;
case 1: /* fisttp m16i */
host_and_vcpu_must_have(sse3);
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fisttps", dst.val);
+ dst.bytes = 2;
break;
case 2: /* fist m16i */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fists", dst.val);
+ dst.bytes = 2;
break;
case 3: /* fistp m16i */
- ea.bytes = 2;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistps", dst.val);
+ dst.bytes = 2;
break;
case 4: /* fbld m80dec */
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
10, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fbld", src.val);
+ dst.type = OP_NONE;
break;
case 5: /* fild m64i */
if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
8, ctxt)) != X86EMUL_OKAY )
goto done;
emulate_fpu_insn_memsrc("fildll", src.val);
+ dst.type = OP_NONE;
break;
case 6: /* fbstp packed bcd */
- ea.bytes = 10;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fbstp", dst.val);
+ dst.bytes = 10;
break;
case 7: /* fistp m64i */
- ea.bytes = 8;
- dst = ea;
- dst.type = OP_MEM;
emulate_fpu_insn_memdst("fistpll", dst.val);
+ dst.bytes = 8;
break;
}
}
[-- Attachment #3: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 4/6] x86emul: reduce FPU handling code size
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
` (2 preceding siblings ...)
2016-12-08 11:36 ` [PATCH v2 3/6] x86emul: simplify FPU destination " Jan Beulich
@ 2016-12-08 11:37 ` Jan Beulich
2016-12-08 11:38 ` [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:37 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
[-- Attachment #1: Type: text/plain, Size: 8822 bytes --]
Pulling out the {get,put}_fpu() invocations from individual emulation
paths leads to a couple of kb code size reduction in my builds. Note
that this is fine exception-wise:
- #UD and #NM have implementation defined order relative to one
another,
- data read #GP/#SS/#PF now properly are delivered after #NM/#UD.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -849,59 +849,44 @@ do {
} while (0)
#define emulate_fpu_insn(_op) \
-do{ struct fpu_insn_ctxt fic; \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " \n" \
"2: \n" \
- : "=m" (fic.insn_bytes) : : "memory" ); \
- put_fpu(&fic); \
-} while (0)
+ : "=m" (fic.insn_bytes) : : "memory" )
#define emulate_fpu_insn_memdst(_op, _arg) \
-do{ struct fpu_insn_ctxt fic; \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
: "=m" (fic.insn_bytes), "=m" (_arg) \
- : : "memory" ); \
- put_fpu(&fic); \
-} while (0)
+ : : "memory" )
#define emulate_fpu_insn_memsrc(_op, _arg) \
-do{ struct fpu_insn_ctxt fic; \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
: "=m" (fic.insn_bytes) \
- : "m" (_arg) : "memory" ); \
- put_fpu(&fic); \
-} while (0)
+ : "m" (_arg) : "memory" )
#define emulate_fpu_insn_stub(_bytes...) \
do { \
uint8_t *buf = get_stub(stub); \
unsigned int _nr = sizeof((uint8_t[]){ _bytes }); \
- struct fpu_insn_ctxt fic = { .insn_bytes = _nr }; \
+ fic.insn_bytes = _nr; \
memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
stub.func(); \
- put_fpu(&fic); \
put_stub(stub); \
} while (0)
#define emulate_fpu_insn_stub_eflags(bytes...) \
do { \
unsigned int nr_ = sizeof((uint8_t[]){ bytes }); \
- struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ }; \
unsigned long tmp_; \
+ fic.insn_bytes = nr_; \
memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic_); \
asm volatile ( _PRE_EFLAGS("[eflags]", "[mask]", "[tmp]") \
"call *%[func];" \
_POST_EFLAGS("[eflags]", "[mask]", "[tmp]") \
@@ -909,7 +894,6 @@ do {
[tmp] "=&r" (tmp_) \
: [func] "rm" (stub.func), \
[mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) ); \
- put_fpu(&fic_); \
put_stub(stub); \
} while (0)
@@ -2492,6 +2476,7 @@ x86_emulate(
struct operand src = { .reg = PTR_POISON };
struct operand dst = { .reg = PTR_POISON };
enum x86_swint_type swint_type;
+ struct fpu_insn_ctxt fic;
struct x86_emulate_stub stub = {};
DECLARE_ALIGNED(mmval_t, mmval);
@@ -3188,15 +3173,12 @@ x86_emulate(
break;
case 0x9b: /* wait/fwait */
- {
- struct fpu_insn_ctxt fic = { .insn_bytes = 1 };
-
+ fic.insn_bytes = 1;
host_and_vcpu_must_have(fpu);
get_fpu(X86EMUL_FPU_wait, &fic);
asm volatile ( "fwait" ::: "memory" );
put_fpu(&fic);
break;
- }
case 0x9c: /* pushf */
generate_exception_if((_regs.eflags & EFLG_VM) &&
@@ -3575,6 +3557,7 @@ x86_emulate(
case 0xd8: /* FPU 0xd8 */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fadd %stN,%st */
@@ -3620,10 +3603,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xd9: /* FPU 0xd9 */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xfb: /* fsincos */
@@ -3700,10 +3685,12 @@ x86_emulate(
generate_exception(EXC_UD);
}
}
+ put_fpu(&fic);
break;
case 0xda: /* FPU 0xda */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fcmovb %stN */
@@ -3749,10 +3736,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xdb: /* FPU 0xdb */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fcmovnb %stN */
@@ -3812,10 +3801,12 @@ x86_emulate(
generate_exception(EXC_UD);
}
}
+ put_fpu(&fic);
break;
case 0xdc: /* FPU 0xdc */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fadd %st,%stN */
@@ -3861,10 +3852,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xdd: /* FPU 0xdd */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* ffree %stN */
@@ -3911,10 +3904,12 @@ x86_emulate(
generate_exception(EXC_UD);
}
}
+ put_fpu(&fic);
break;
case 0xde: /* FPU 0xde */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* faddp %stN */
@@ -3957,10 +3952,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xdf: /* FPU 0xdf */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xe0:
@@ -4030,6 +4027,7 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
@@ -4719,8 +4717,8 @@ x86_emulate(
case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* vmovsd xmm,xmm/m64 */
{
uint8_t *buf = get_stub(stub);
- struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
+ fic.insn_bytes = 5;
buf[0] = 0x3e;
buf[1] = 0x3e;
buf[2] = 0x0f;
@@ -4987,8 +4985,8 @@ x86_emulate(
case X86EMUL_OPC_VEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */
{
uint8_t *buf = get_stub(stub);
- struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
+ fic.insn_bytes = 5;
buf[0] = 0x3e;
buf[1] = 0x3e;
buf[2] = 0x0f;
[-- Attachment #2: x86emul-FPU-reduce.patch --]
[-- Type: text/plain, Size: 8860 bytes --]
x86emul: reduce FPU handling code size
Pulling out the {get,put}_fpu() invocations from individual emulation
paths leads to a couple of kb code size reduction in my builds. Note
that this is fine exception-wise:
- #UD and #NM have implementation defined order relative to one
another,
- data read #GP/#SS/#PF now properly are delivered after #NM/#UD.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -849,59 +849,44 @@ do {
} while (0)
#define emulate_fpu_insn(_op) \
-do{ struct fpu_insn_ctxt fic; \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " \n" \
"2: \n" \
- : "=m" (fic.insn_bytes) : : "memory" ); \
- put_fpu(&fic); \
-} while (0)
+ : "=m" (fic.insn_bytes) : : "memory" )
#define emulate_fpu_insn_memdst(_op, _arg) \
-do{ struct fpu_insn_ctxt fic; \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
: "=m" (fic.insn_bytes), "=m" (_arg) \
- : : "memory" ); \
- put_fpu(&fic); \
-} while (0)
+ : : "memory" )
#define emulate_fpu_insn_memsrc(_op, _arg) \
-do{ struct fpu_insn_ctxt fic; \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
: "=m" (fic.insn_bytes) \
- : "m" (_arg) : "memory" ); \
- put_fpu(&fic); \
-} while (0)
+ : "m" (_arg) : "memory" )
#define emulate_fpu_insn_stub(_bytes...) \
do { \
uint8_t *buf = get_stub(stub); \
unsigned int _nr = sizeof((uint8_t[]){ _bytes }); \
- struct fpu_insn_ctxt fic = { .insn_bytes = _nr }; \
+ fic.insn_bytes = _nr; \
memcpy(buf, ((uint8_t[]){ _bytes, 0xc3 }), _nr + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic); \
stub.func(); \
- put_fpu(&fic); \
put_stub(stub); \
} while (0)
#define emulate_fpu_insn_stub_eflags(bytes...) \
do { \
unsigned int nr_ = sizeof((uint8_t[]){ bytes }); \
- struct fpu_insn_ctxt fic_ = { .insn_bytes = nr_ }; \
unsigned long tmp_; \
+ fic.insn_bytes = nr_; \
memcpy(get_stub(stub), ((uint8_t[]){ bytes, 0xc3 }), nr_ + 1); \
- get_fpu(X86EMUL_FPU_fpu, &fic_); \
asm volatile ( _PRE_EFLAGS("[eflags]", "[mask]", "[tmp]") \
"call *%[func];" \
_POST_EFLAGS("[eflags]", "[mask]", "[tmp]") \
@@ -909,7 +894,6 @@ do {
[tmp] "=&r" (tmp_) \
: [func] "rm" (stub.func), \
[mask] "i" (EFLG_ZF|EFLG_PF|EFLG_CF) ); \
- put_fpu(&fic_); \
put_stub(stub); \
} while (0)
@@ -2492,6 +2476,7 @@ x86_emulate(
struct operand src = { .reg = PTR_POISON };
struct operand dst = { .reg = PTR_POISON };
enum x86_swint_type swint_type;
+ struct fpu_insn_ctxt fic;
struct x86_emulate_stub stub = {};
DECLARE_ALIGNED(mmval_t, mmval);
@@ -3188,15 +3173,12 @@ x86_emulate(
break;
case 0x9b: /* wait/fwait */
- {
- struct fpu_insn_ctxt fic = { .insn_bytes = 1 };
-
+ fic.insn_bytes = 1;
host_and_vcpu_must_have(fpu);
get_fpu(X86EMUL_FPU_wait, &fic);
asm volatile ( "fwait" ::: "memory" );
put_fpu(&fic);
break;
- }
case 0x9c: /* pushf */
generate_exception_if((_regs.eflags & EFLG_VM) &&
@@ -3575,6 +3557,7 @@ x86_emulate(
case 0xd8: /* FPU 0xd8 */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fadd %stN,%st */
@@ -3620,10 +3603,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xd9: /* FPU 0xd9 */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xfb: /* fsincos */
@@ -3700,10 +3685,12 @@ x86_emulate(
generate_exception(EXC_UD);
}
}
+ put_fpu(&fic);
break;
case 0xda: /* FPU 0xda */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fcmovb %stN */
@@ -3749,10 +3736,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xdb: /* FPU 0xdb */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fcmovnb %stN */
@@ -3812,10 +3801,12 @@ x86_emulate(
generate_exception(EXC_UD);
}
}
+ put_fpu(&fic);
break;
case 0xdc: /* FPU 0xdc */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* fadd %st,%stN */
@@ -3861,10 +3852,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xdd: /* FPU 0xdd */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* ffree %stN */
@@ -3911,10 +3904,12 @@ x86_emulate(
generate_exception(EXC_UD);
}
}
+ put_fpu(&fic);
break;
case 0xde: /* FPU 0xde */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xc0 ... 0xc7: /* faddp %stN */
@@ -3957,10 +3952,12 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xdf: /* FPU 0xdf */
host_and_vcpu_must_have(fpu);
+ get_fpu(X86EMUL_FPU_fpu, &fic);
switch ( modrm )
{
case 0xe0:
@@ -4030,6 +4027,7 @@ x86_emulate(
break;
}
}
+ put_fpu(&fic);
break;
case 0xe0 ... 0xe2: /* loop{,z,nz} */ {
@@ -4719,8 +4717,8 @@ x86_emulate(
case X86EMUL_OPC_VEX_F2(0x0f, 0x11): /* vmovsd xmm,xmm/m64 */
{
uint8_t *buf = get_stub(stub);
- struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
+ fic.insn_bytes = 5;
buf[0] = 0x3e;
buf[1] = 0x3e;
buf[2] = 0x0f;
@@ -4987,8 +4985,8 @@ x86_emulate(
case X86EMUL_OPC_VEX_66(0x0f, 0xd6): /* vmovq xmm,xmm/m64 */
{
uint8_t *buf = get_stub(stub);
- struct fpu_insn_ctxt fic = { .insn_bytes = 5 };
+ fic.insn_bytes = 5;
buf[0] = 0x3e;
buf[1] = 0x3e;
buf[2] = 0x0f;
[-- Attachment #3: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
` (3 preceding siblings ...)
2016-12-08 11:37 ` [PATCH v2 4/6] x86emul: reduce FPU handling code size Jan Beulich
@ 2016-12-08 11:38 ` Jan Beulich
2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
5 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:38 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
[-- Attachment #1: Type: text/plain, Size: 3091 bytes --]
Accessing an 8-byte (or perhaps just 4-byte in the test harness when
built as 32-bit app) field to read/write 10 bytes (leveraging the
successive field) is a latent bug, as the compiler could copy things
around. Use the 32 bytes large SSE/AVX slot instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
---
The presence of the !op->write checks implies a logical (but not
functional) dependency on the patch making ops->write (and ->cmpxchg)
optional. Without that patch they're just dead code.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3787,15 +3787,19 @@ x86_emulate(
dst.bytes = 4;
break;
case 5: /* fld m80fp */
- if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
10, ctxt)) != X86EMUL_OKAY )
goto done;
- emulate_fpu_insn_memsrc("fldt", src.val);
+ emulate_fpu_insn_memsrc("fldt", *mmvalp);
dst.type = OP_NONE;
break;
case 7: /* fstp m80fp */
- emulate_fpu_insn_memdst("fstpt", dst.val);
- dst.bytes = 10;
+ fail_if(!ops->write);
+ emulate_fpu_insn_memdst("fstpt", *mmvalp);
+ if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+ 10, ctxt)) != X86EMUL_OKAY )
+ goto done;
+ dst.type = OP_NONE;
break;
default:
generate_exception(EXC_UD);
@@ -4004,10 +4008,10 @@ x86_emulate(
dst.bytes = 2;
break;
case 4: /* fbld m80dec */
- if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
10, ctxt)) != X86EMUL_OKAY )
goto done;
- emulate_fpu_insn_memsrc("fbld", src.val);
+ emulate_fpu_insn_memsrc("fbld", *mmvalp);
dst.type = OP_NONE;
break;
case 5: /* fild m64i */
@@ -4018,8 +4022,12 @@ x86_emulate(
dst.type = OP_NONE;
break;
case 6: /* fbstp packed bcd */
- emulate_fpu_insn_memdst("fbstp", dst.val);
- dst.bytes = 10;
+ fail_if(!ops->write);
+ emulate_fpu_insn_memdst("fbstp", *mmvalp);
+ if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+ 10, ctxt)) != X86EMUL_OKAY )
+ goto done;
+ dst.type = OP_NONE;
break;
case 7: /* fistp m64i */
emulate_fpu_insn_memdst("fistpll", dst.val);
[-- Attachment #2: x86emul-FPU-10byte.patch --]
[-- Type: text/plain, Size: 3161 bytes --]
x86emul: avoid undefined behavior when dealing with 10-byte FPU operands
Accessing an 8-byte (or perhaps just 4-byte in the test harness when
built as 32-bit app) field to read/write 10 bytes (leveraging the
successive field) is a latent bug, as the compiler could copy things
around. Use the 32 bytes large SSE/AVX slot instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Re-base over changes to patch 1.
---
The presence of the !op->write checks implies a logical (but not
functional) dependency on the patch making ops->write (and ->cmpxchg)
optional. Without that patch they're just dead code.
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -3787,15 +3787,19 @@ x86_emulate(
dst.bytes = 4;
break;
case 5: /* fld m80fp */
- if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
10, ctxt)) != X86EMUL_OKAY )
goto done;
- emulate_fpu_insn_memsrc("fldt", src.val);
+ emulate_fpu_insn_memsrc("fldt", *mmvalp);
dst.type = OP_NONE;
break;
case 7: /* fstp m80fp */
- emulate_fpu_insn_memdst("fstpt", dst.val);
- dst.bytes = 10;
+ fail_if(!ops->write);
+ emulate_fpu_insn_memdst("fstpt", *mmvalp);
+ if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+ 10, ctxt)) != X86EMUL_OKAY )
+ goto done;
+ dst.type = OP_NONE;
break;
default:
generate_exception(EXC_UD);
@@ -4004,10 +4008,10 @@ x86_emulate(
dst.bytes = 2;
break;
case 4: /* fbld m80dec */
- if ( (rc = ops->read(ea.mem.seg, ea.mem.off, &src.val,
+ if ( (rc = ops->read(ea.mem.seg, ea.mem.off, mmvalp,
10, ctxt)) != X86EMUL_OKAY )
goto done;
- emulate_fpu_insn_memsrc("fbld", src.val);
+ emulate_fpu_insn_memsrc("fbld", *mmvalp);
dst.type = OP_NONE;
break;
case 5: /* fild m64i */
@@ -4018,8 +4022,12 @@ x86_emulate(
dst.type = OP_NONE;
break;
case 6: /* fbstp packed bcd */
- emulate_fpu_insn_memdst("fbstp", dst.val);
- dst.bytes = 10;
+ fail_if(!ops->write);
+ emulate_fpu_insn_memdst("fbstp", *mmvalp);
+ if ( (rc = ops->write(ea.mem.seg, ea.mem.off, mmvalp,
+ 10, ctxt)) != X86EMUL_OKAY )
+ goto done;
+ dst.type = OP_NONE;
break;
case 7: /* fistp m64i */
emulate_fpu_insn_memdst("fistpll", dst.val);
[-- Attachment #3: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
` (4 preceding siblings ...)
2016-12-08 11:38 ` [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
@ 2016-12-08 11:38 ` Jan Beulich
2016-12-15 9:52 ` Ping: " Jan Beulich
5 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-12-08 11:38 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
[-- Attachment #1: Type: text/plain, Size: 1856 bytes --]
The memory clobber is rather harsh here. However, fic.exn_raised may be
modified as a side effect, so we need to let the compiler know that all
of "fic" may be changed (preventing it from moving around accesses to
the exn_raised field).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -759,7 +759,7 @@ do {
})
struct fpu_insn_ctxt {
- uint8_t insn_bytes;
+ uint8_t insn_bytes; /* Must be first! */
int8_t exn_raised;
};
@@ -853,23 +853,21 @@ do {
"movb $2f-1f,%0 \n" \
"1: " _op " \n" \
"2: \n" \
- : "=m" (fic.insn_bytes) : : "memory" )
+ : "+m" (fic) )
#define emulate_fpu_insn_memdst(_op, _arg) \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
- : "=m" (fic.insn_bytes), "=m" (_arg) \
- : : "memory" )
+ : "+m" (fic), "=m" (_arg) )
#define emulate_fpu_insn_memsrc(_op, _arg) \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
- : "=m" (fic.insn_bytes) \
- : "m" (_arg) : "memory" )
+ : "+m" (fic) : "m" (_arg) )
#define emulate_fpu_insn_stub(_bytes...) \
do { \
[-- Attachment #2: x86emul-FPU-constraints.patch --]
[-- Type: text/plain, Size: 1902 bytes --]
x86emul: simplify FPU handling asm() constraints
The memory clobber is rather harsh here. However, fic.exn_raised may be
modified as a side effect, so we need to let the compiler know that all
of "fic" may be changed (preventing it from moving around accesses to
the exn_raised field).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
--- a/xen/arch/x86/x86_emulate/x86_emulate.c
+++ b/xen/arch/x86/x86_emulate/x86_emulate.c
@@ -759,7 +759,7 @@ do {
})
struct fpu_insn_ctxt {
- uint8_t insn_bytes;
+ uint8_t insn_bytes; /* Must be first! */
int8_t exn_raised;
};
@@ -853,23 +853,21 @@ do {
"movb $2f-1f,%0 \n" \
"1: " _op " \n" \
"2: \n" \
- : "=m" (fic.insn_bytes) : : "memory" )
+ : "+m" (fic) )
#define emulate_fpu_insn_memdst(_op, _arg) \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
- : "=m" (fic.insn_bytes), "=m" (_arg) \
- : : "memory" )
+ : "+m" (fic), "=m" (_arg) )
#define emulate_fpu_insn_memsrc(_op, _arg) \
asm volatile ( \
"movb $2f-1f,%0 \n" \
"1: " _op " %1 \n" \
"2: \n" \
- : "=m" (fic.insn_bytes) \
- : "m" (_arg) : "memory" )
+ : "+m" (fic) : "m" (_arg) )
#define emulate_fpu_insn_stub(_bytes...) \
do { \
[-- Attachment #3: Type: text/plain, Size: 127 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes
2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
@ 2016-12-08 11:42 ` Andrew Cooper
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2016-12-08 11:42 UTC (permalink / raw)
To: Jan Beulich, xen-devel
On 08/12/16 11:35, Jan Beulich wrote:
> First of all there are a number of secondary encodings both Intel and
> AMD support, but which aren't formally documented. See e.g.
> www.sandpile.org/x86/opc_fpu.htm for inofficial documentation.
>
> Next there are a few more no-ops - instructions which served a purpose
> only on 8087 or 287.
>
> Further switch from fail_if() to raising of #UD in a couple of places
> (as the decoding of FPU opcodes should now be complete except where
> explicitly marked as todo).
>
> Also adjust a few comments.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Ping: [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints
2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
@ 2016-12-15 9:52 ` Jan Beulich
2016-12-15 11:36 ` Andrew Cooper
0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2016-12-15 9:52 UTC (permalink / raw)
To: xen-devel; +Cc: Andrew Cooper
>>> On 08.12.16 at 12:38, <JBeulich@suse.com> wrote:
> The memory clobber is rather harsh here. However, fic.exn_raised may be
> modified as a side effect, so we need to let the compiler know that all
> of "fic" may be changed (preventing it from moving around accesses to
> the exn_raised field).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Ping? (The v1 discussion had stalled, so at that point it seemed best
to simply re-send with the updates to the earlier patches.)
> --- a/xen/arch/x86/x86_emulate/x86_emulate.c
> +++ b/xen/arch/x86/x86_emulate/x86_emulate.c
> @@ -759,7 +759,7 @@ do {
> })
>
> struct fpu_insn_ctxt {
> - uint8_t insn_bytes;
> + uint8_t insn_bytes; /* Must be first! */
> int8_t exn_raised;
> };
>
> @@ -853,23 +853,21 @@ do {
> "movb $2f-1f,%0 \n" \
> "1: " _op " \n" \
> "2: \n" \
> - : "=m" (fic.insn_bytes) : : "memory" )
> + : "+m" (fic) )
>
> #define emulate_fpu_insn_memdst(_op, _arg) \
> asm volatile ( \
> "movb $2f-1f,%0 \n" \
> "1: " _op " %1 \n" \
> "2: \n" \
> - : "=m" (fic.insn_bytes), "=m" (_arg) \
> - : : "memory" )
> + : "+m" (fic), "=m" (_arg) )
>
> #define emulate_fpu_insn_memsrc(_op, _arg) \
> asm volatile ( \
> "movb $2f-1f,%0 \n" \
> "1: " _op " %1 \n" \
> "2: \n" \
> - : "=m" (fic.insn_bytes) \
> - : "m" (_arg) : "memory" )
> + : "+m" (fic) : "m" (_arg) )
>
> #define emulate_fpu_insn_stub(_bytes...) \
> do { \
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Ping: [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints
2016-12-15 9:52 ` Ping: " Jan Beulich
@ 2016-12-15 11:36 ` Andrew Cooper
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2016-12-15 11:36 UTC (permalink / raw)
To: Jan Beulich, xen-devel
On 15/12/16 09:52, Jan Beulich wrote:
>>>> On 08.12.16 at 12:38, <JBeulich@suse.com> wrote:
>> The memory clobber is rather harsh here. However, fic.exn_raised may be
>> modified as a side effect, so we need to let the compiler know that all
>> of "fic" may be changed (preventing it from moving around accesses to
>> the exn_raised field).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Ping? (The v1 discussion had stalled, so at that point it seemed best
> to simply re-send with the updates to the earlier patches.)
Sorry, yes.
Having considered this further (when considering my other exception
plans), I am now more convinced the code is already latently buggy, and
fixing that latent bug avoids the need for this.
exn_raised can be set at any point between a {get,put}_fpu(), not just
within these stubs. It therefore should already be accessed with a
volatile reference to avoid breaking C's model of the world, as the
memory clobbers here are already not sufficient.
Fixing the volatility of access to exn_raised will allow the safe
dropping of the memory clobbers, without further modifying the memory
operands.
I already have a patch to do similar improvements to the time init code
from my memory barrier series. I will see about getting that posted.
~Andrew
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-12-15 11:36 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-08 11:26 [PATCH v2 0/6] x86emul: FPU handling improvements Jan Beulich
2016-12-08 11:35 ` [PATCH v2 1/6] x86emul: extend / amend supported FPU opcodes Jan Beulich
2016-12-08 11:42 ` Andrew Cooper
2016-12-08 11:36 ` [PATCH v2 2/6] x86emul: simplify FPU source operand handling Jan Beulich
2016-12-08 11:36 ` [PATCH v2 3/6] x86emul: simplify FPU destination " Jan Beulich
2016-12-08 11:37 ` [PATCH v2 4/6] x86emul: reduce FPU handling code size Jan Beulich
2016-12-08 11:38 ` [PATCH v2 5/6] x86emul: avoid undefined behavior when dealing with 10-byte FPU operands Jan Beulich
2016-12-08 11:38 ` [PATCH v2 6/6] x86emul: simplify FPU handling asm() constraints Jan Beulich
2016-12-15 9:52 ` Ping: " Jan Beulich
2016-12-15 11:36 ` Andrew Cooper
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.